nlp:bert_and_friends [2023/07/06 00:22] (current) – jmflanig
====== BERT ======

===== Introductions to BERT =====
  * Paper: [[https://arxiv.org/pdf/1810.04805.pdf|Devlin et al 2018 - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]]
  * Blogs
  * Textbooks
    * [[https://web.stanford.edu/~jurafsky/slp3/11.pdf|SLP Ch 11]] (especially [[https://web.stanford.edu/~jurafsky/slp3/11.pdf#page=6|11.2]])
  * Training from scratch
    * [[https://aclanthology.org/2021.emnlp-main.831.pdf|Izsak et al 2021 - How to Train BERT with an Academic Budget]]
  * Retrospective Analysis
    * [[https://arxiv.org/pdf/2306.02870.pdf|Nityasya et al 2023 - On “Scientific Debt” in NLP: A Case for More Rigour in Language Model Pre-Training Research]]
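The pre-training references above revolve around BERT's masked-language-modelling objective: roughly 15% of input tokens are selected as prediction targets, and of those, 80% are replaced with ''[MASK]'', 10% with a random token, and 10% are left unchanged (Devlin et al 2018, §3.1). A minimal sketch of that masking step, with a toy vocabulary (not the authors' code):

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=0):
    """BERT-style MLM masking: of the selected positions,
    80% -> [MASK], 10% -> random token, 10% -> unchanged."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}  # position -> original token (the prediction targets)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                masked[i] = "[MASK]"
            elif r < 0.9:
                masked[i] = rng.choice(vocab)
            # else: keep the original token, but still predict it
    return masked, labels

# Toy corpus: a short sentence repeated to get a reasonable sample size.
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
tokens = ["the", "cat", "sat", "on", "the", "mat"] * 50
masked, labels = mask_tokens(tokens, vocab)
print(len(labels), "positions selected out of", len(tokens))
```

Note that the model must predict the original token even at the 10% of target positions left unchanged; this is what keeps the representations honest about the observed input.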
  
===== Extensions =====
  * [[https://arxiv.org/pdf/1905.05950.pdf|Tenney et al 2019 - BERT Rediscovers the Classical NLP Pipeline]]
  * [[https://arxiv.org/pdf/2002.12327.pdf|Rogers et al 2020 - A Primer in BERTology: What we know about how BERT works]]
  * [[https://twitter.com/lvwerra/status/1485301457813487619?s=21|2022 - Visualization of position embeddings in BERT and GPT-2 (Twitter)]]
  * [[https://arxiv.org/pdf/2203.06204.pdf|Papadimitriou et al 2022 - When classifying grammatical role, BERT doesn’t care about word order... except when it matters]]
  
===== Applications =====
===== Other Variants =====
  * [[https://arxiv.org/pdf/1909.05840.pdf|Shen et al 2019 - Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT]]
  * [[https://arxiv.org/pdf/1910.01108.pdf|Sanh et al 2019 - DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter]]
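Q-BERT above compresses BERT by quantizing weights to very low bit-widths, using Hessian information to choose per-layer precision. Setting the Hessian analysis aside, a minimal sketch of the underlying symmetric uniform quantization of a weight matrix (toy random weights standing in for a BERT layer; not the paper's implementation):

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Symmetric uniform quantization: map floats to signed integers
    in [-(2^(bits-1) - 1), 2^(bits-1) - 1] with a single scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_uniform(w, bits=4)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

With a single scale per tensor, the worst-case rounding error is half a quantization step (scale/2); Q-BERT's contribution is deciding, via Hessian spectra, which layers tolerate fewer bits.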
  
===== Related Pages =====
nlp/bert_and_friends.1658303110.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
