nlp:pretraining

For a history, see section 2.4 of [[https://arxiv.org/pdf/2003.08271.pdf|Qiu 2020]] or the related work in the [[https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf|GPT-2 paper]].
  * [[https://arxiv.org/pdf/1103.0398|Collobert et al 2011 - Natural Language Processing (almost) from Scratch]]
  * [[https://arxiv.org/pdf/1506.06726|Kiros et al 2015 - Skip-Thought Vectors]]
  * [[https://arxiv.org/pdf/1511.01432.pdf|Dai et al 2015 - Semi-supervised Sequence Learning]]
  * [[https://arxiv.org/pdf/1705.00108|Peters et al 2017 - Semi-supervised Sequence Tagging with Bidirectional Language Models]]
  * [[https://arxiv.org/pdf/1611.02683.pdf|Ramachandran et al 2017 - Unsupervised Pretraining for Sequence to Sequence Learning]]
  * [[https://arxiv.org/pdf/1802.05365.pdf|Peters et al 2018 - Deep Contextualized Word Representations]]

Papers sorted chronologically. For a large list of pre-trained models, see [[https://github.com/huggingface/transformers/tree/master/src/transformers/models|here]].
  * CoVe: [[https://arxiv.org/pdf/1708.00107.pdf|McCann et al 2017 - Learned in Translation: Contextualized Word Vectors]]
  * ULMFiT: [[https://arxiv.org/pdf/1801.06146|Howard & Ruder 2018 - Universal Language Model Fine-tuning for Text Classification]]
  * ELMo: [[https://arxiv.org/pdf/1802.05365.pdf|Peters et al 2018 - Deep Contextualized Word Representations]]
  * GPT: [[https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf|Radford et al 2018 - Improving Language Understanding by Generative Pre-Training]]

    * [[https://arxiv.org/pdf/2303.08774|OpenAI 2023 - GPT-4 Technical Report]]
    * [[https://arxiv.org/pdf/2305.10403.pdf|Google 2023 - PaLM 2 Technical Report]] Talks about scaling laws, etc.
    * [[https://arxiv.org/pdf/2309.16609|Bai et al 2023 - Qwen Technical Report]] Good information
    * [[https://arxiv.org/pdf/2401.02385|Zhang et al 2024 - TinyLlama: An Open-Source Small Language Model]]
    * [[https://arxiv.org/pdf/2401.12246.pdf|2024 - Orion-14B: Open-source Multilingual Large Language Models]]