
Pretraining

Overviews

Key and Early Papers

Contextualized Pretrained Models

Papers sorted chronologically. For a large list of pre-trained models, see here.

Table of Large Models

List of popular models in chronological order. See also the list of Large Language Models.

Model | Year | Type    | Parameters  | Training Data                   | Objective                           | Public? | Notes | Link
BERT  | 2018 | Encoder | 110M / 340M | BooksCorpus + English Wikipedia | MLM + NSP                           | Yes     | Base / Large sizes |
T5    | 2019 | Enc-Dec | up to 11B   | C4                              | Span corruption                     | Yes     |       | github
BART  | 2019 | Enc-Dec | 140M / 400M | Same data as RoBERTa            | Text infilling + sentence shuffling | Yes     |       |
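The span-corruption objective listed for T5 in the table above can be sketched in a few lines: contiguous spans are cut out of the input and replaced by sentinel tokens, and the target reconstructs only those spans, each prefixed by its sentinel. The sentinel naming (`<extra_id_N>`) follows T5's convention; the whitespace tokenization and the `span_corrupt` helper below are illustrative simplifications (real T5 uses SentencePiece tokens and samples span positions randomly).

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption on a token list.

    `spans` is a list of (start, length) pairs, sorted and
    non-overlapping. Each span is replaced in the input by a single
    sentinel token; the target lists each sentinel followed by the
    tokens it replaced, closed by one final sentinel.
    """
    inp, tgt, i, sid = [], [], 0, 0
    for start, length in spans:
        inp.extend(tokens[i:start])          # copy unmasked tokens
        sentinel = f"<extra_id_{sid}>"
        inp.append(sentinel)                 # one sentinel per span
        tgt.append(sentinel)
        tgt.extend(tokens[start:start + length])  # span goes to target
        i = start + length
        sid += 1
    inp.extend(tokens[i:])
    tgt.append(f"<extra_id_{sid}>")          # final sentinel ends target
    return " ".join(inp), " ".join(tgt)

tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens, [(2, 2), (8, 1)])
# inp: "Thank you <extra_id_0> me to your party <extra_id_1> week"
# tgt: "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"
```

The encoder sees the corrupted input and the decoder is trained to emit the target, so the model only pays sequence-generation cost for the masked-out spans.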

Fine-Tuning Methods

Moved to Fine-Tuning.

Other Papers

Complex Pre-training Methods

Taxonomy of Pretraining Methods


Figure from Qiu 2020.


Figure from Liu 2020.


Figure from Qiu 2020.


Figure from Liu 2020. Key:

  • LM: language modeling
  • MLM: masked language modeling
  • NSP: next sentence prediction
  • SOP: sentence order prediction
  • Discriminator (o/r): predict for each word whether it was replaced (r) or is original (o)
  • seq2seq LM: given a prefix of a sequence, predict the rest of the sequence
  • Span mask: predict masked words, where the masked words are contiguous (a span)
  • Text infilling: spans of words are replaced with a single mask token; the model must predict all the words in the masked span
  • Sentence shuffling: unshuffle a shuffled sequence of sentences
  • TLM (translation language modeling): tokens in both source and target sequences are masked, to learn cross-lingual associations
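The MLM objective in the key above hinges on how positions are corrupted. A minimal sketch of BERT-style corruption, using the 80/10/10 split from the BERT paper (80% replaced by a mask token, 10% by a random token, 10% left unchanged); the toy vocabulary and `mlm_corrupt` helper are illustrative, not any library's API:

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]  # toy vocabulary

def mlm_corrupt(tokens, mask_prob=0.15, seed=0):
    """Corrupt tokens for masked language modeling (BERT-style).

    Each position is selected with probability `mask_prob`. A selected
    token becomes [MASK] 80% of the time, a random vocabulary token
    10%, and stays unchanged 10%. Returns (corrupted, labels), where
    labels hold the original token at selected positions and None
    elsewhere; the loss is computed only where labels is not None.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)               # predict the original token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)       # 80%: mask token
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))  # 10%: random token
            else:
                corrupted.append(tok)        # 10%: keep unchanged
        else:
            labels.append(None)              # position not in the loss
            corrupted.append(tok)
    return corrupted, labels
```

The random-token and keep-unchanged branches exist so the model cannot rely on [MASK] always marking the positions it must predict.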

Properties of Pretrained Models

Pretraining Methodology

Amount, Selection and Cleaning of Pretraining Data

Pretraining On An Academic Budget

Papers or projects where people have pretrained LLMs with academic compute budgets.
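A quick way to sanity-check whether a run fits an academic budget is the common 6ND rule of thumb: training a dense transformer costs roughly 6 FLOPs per parameter per training token (forward plus backward pass). The hardware throughput and utilization numbers below are illustrative assumptions, not recommendations:

```python
def train_flops(n_params, n_tokens):
    """Approximate training compute via the standard 6*N*D estimate
    (6 FLOPs per parameter per token for a dense transformer)."""
    return 6 * n_params * n_tokens

def gpu_days(n_params, n_tokens, peak_flops_per_sec, utilization=0.4):
    """Wall-clock GPU-days at a given peak throughput, discounted by an
    assumed utilization fraction (real MFU varies widely by setup)."""
    seconds = train_flops(n_params, n_tokens) / (peak_flops_per_sec * utilization)
    return seconds / 86400

# Example: a 125M-parameter model on 30B tokens, on one hypothetical
# 100 TFLOP/s accelerator at 40% utilization -> about 6.5 GPU-days.
days = gpu_days(125_000_000, 30_000_000_000, 100e12, 0.4)
```

Estimates like this only bound the arithmetic cost; data loading, evaluation, and restarts typically add meaningful overhead on top.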

Software

  • GPT Neo: an open-source implementation of model- and data-parallel GPT-3-like models using the mesh-tensorflow library.
  • The Huggingface Transformers library provides a large number of pre-trained models; see the list in the github repo here.

Related Pages

nlp/pretraining.txt · Last modified: 2026/02/20 06:35 by jmflanig
