nlp:pretraining

For a history, see section 2.4 of [[https://arxiv.org/pdf/2003.08271.pdf|Qiu 2020]] or the related work in the [[https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf|GPT-2 paper]].
  * [[https://arxiv.org/pdf/1103.0398|Collobert et al 2011 - Natural Language Processing (almost) from Scratch]]
  * [[https://arxiv.org/pdf/1506.06726|Kiros et al 2015 - Skip-Thought Vectors]]
  * [[https://arxiv.org/pdf/1511.01432.pdf|Dai et al 2015 - Semi-supervised Sequence Learning]]
  * [[https://arxiv.org/pdf/1705.00108|Peters et al 2017 - Semi-supervised Sequence Tagging with Bidirectional Language Models]]
  * [[https://arxiv.org/pdf/1611.02683.pdf|Ramachandran et al 2017 - Unsupervised Pretraining for Sequence to Sequence Learning]]
  * [[https://arxiv.org/pdf/1802.05365.pdf|Peters et al 2018 - Deep Contextualized Word Representations]]

Papers sorted chronologically. For a large list of pre-trained models, see [[https://github.com/huggingface/transformers/tree/master/src/transformers/models|here]].
  * CoVe: [[https://arxiv.org/pdf/1708.00107.pdf|McCann et al 2017 - Learned in Translation: Contextualized Word Vectors]]
  * ULMFiT: [[https://arxiv.org/pdf/1801.06146|Howard & Ruder 2018 - Universal Language Model Fine-tuning for Text Classification]]
  * ELMo: [[https://arxiv.org/pdf/1802.05365.pdf|Peters et al 2018 - Deep Contextualized Word Representations]]
  * GPT: [[https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf|Radford et al 2018 - Improving Language Understanding by Generative Pre-Training]]

    * [[https://arxiv.org/pdf/2303.08774|OpenAI 2023 - GPT-4 Technical Report]]
    * [[https://arxiv.org/pdf/2305.10403.pdf|Google 2023 - PaLM 2 Technical Report]] Talks about scaling laws, etc.
    * [[https://arxiv.org/pdf/2309.16609|Bai et al 2023 - Qwen Technical Report]] Good information
    * [[https://arxiv.org/pdf/2401.02385|Zhang et al 2024 - TinyLlama: An Open-Source Small Language Model]]
    * [[https://arxiv.org/pdf/2401.12246.pdf|2024 - Orion-14B: Open-source Multilingual Large Language Models]]