Pretraining

Overviews

See also Language Model - Overviews.

Key and Early Papers

For a history, see Section 2.4 of Qiu 2020 or the related work section of the GPT-2 paper.

Contextualized Pretrained Models

Papers sorted chronologically. For a large list of pre-trained models, see here.

Table of Large Models

List of popular models in chronological order. See also the list of Large Language Models.

Model | Year | Type | Parameters | Training Data | Objective | Public? | Notes | Link
BERT | 2018 | Enc | | | | | |
T5 | 2019 | Enc-Dec | 11B | C4 | | Yes | | github
BART | | Enc-Dec | | | | | |

Fine-Tuning Methods

Moved to Fine-Tuning.

Other Papers

Complex Pre-training Methods

Taxonomy of Pretraining Methods


Figure from Qiu 2020.


Figure from Liu 2020.


Figure from Qiu 2020.


Figure from Liu 2020. Key:

Properties of Pretrained Models

Pretraining Methodology

See also scaling laws.

Amount, Selection and Cleaning of Pretraining Data

Pretraining On An Academic Budget

Papers or projects where people have pretrained LLMs with academic compute budgets.

Software

Related Pages