nlp:bert_and_friends [2021/12/14 08:45] – [Extensions] jmflanig
nlp:bert_and_friends [2023/07/06 00:22] (current) – [Introductions to BERT] jmflanig
====== BERT ======

===== Introductions to BERT =====
  * Paper: [[https://arxiv.org/pdf/1810.04805.pdf|Devlin et al 2018 - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]]
  * Blogs
    * [[https://www.analyticsvidhya.com/blog/2019/09/demystifying-bert-groundbreaking-nlp-framework/|Demystifying BERT: A Comprehensive Guide to the Groundbreaking NLP Framework]]
    * [[http://jalammar.github.io/illustrated-bert/|The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)]]
  * Textbooks
    * [[https://web.stanford.edu/~jurafsky/slp3/11.pdf|SLP Ch 11]] (especially [[https://web.stanford.edu/~jurafsky/slp3/11.pdf#page=6|11.2]])
  * Training from scratch
    * [[https://aclanthology.org/2021.emnlp-main.831.pdf|Izsak et al 2021 - How to Train BERT with an Academic Budget]]
  * Retrospective Analysis
    * [[https://arxiv.org/pdf/2306.02870.pdf|Nityasya et al 2023 - On “Scientific Debt” in NLP: A Case for More Rigour in Language Model Pre-Training Research]]
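The pre-training objective in Devlin et al 2018 corrupts 15% of input tokens: of the selected positions, 80% become ''[MASK]'', 10% a random token, and 10% are left unchanged, and the model must predict the originals. A minimal sketch of that corruption step (the token strings and vocabulary below are illustrative, not BERT's actual WordPiece vocabulary):

```python
import random

def mask_for_mlm(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masked-LM corruption (Devlin et al 2018, Sec. 3.1).

    Selects ~mask_prob of positions as prediction targets; of those,
    80% -> "[MASK]", 10% -> a random vocab token, 10% -> unchanged.
    Returns (corrupted, labels) where labels[i] is the original token
    at target positions and None elsewhere.
    """
    rng = rng or random.Random(0)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # not a prediction target
        labels[i] = tok                       # model must recover this token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = "[MASK]"           # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: random token
        # else: 10% keep the original token unchanged
    return corrupted, labels
```

Keeping 10% of targets unchanged means the model cannot treat every non-''[MASK]'' token as trustworthy, which is the paper's stated mitigation for the pre-train/fine-tune mismatch introduced by ''[MASK]''.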
  
===== Extensions =====
  * [[https://arxiv.org/pdf/2106.02736.pdf|Goyal et al 2021 - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis–Hastings]]
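Goyal et al treat the MLM as an implicit energy-based model and sample from it with Metropolis–Hastings. As a rough sketch of the accept/reject core, simplified to a symmetric proposal (the paper's proposal actually comes from the MLM itself and adds a proposal-ratio correction; the toy energy below is a stand-in, not an MLM):

```python
import math
import random

def metropolis_sample(energy, propose, x0, steps, rng=None):
    """Generic Metropolis sampler over discrete states.

    With a symmetric proposal, a move x -> x' is accepted with
    probability min(1, exp(E(x) - E(x'))), so low-energy states are
    visited more often, in proportion to exp(-E).
    """
    rng = rng or random.Random(0)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(steps):
        x_new = propose(x, rng)
        e_new = energy(x_new)
        if rng.random() < min(1.0, math.exp(e - e_new)):
            x, e = x_new, e_new               # accept the proposal
        samples.append(x)                     # rejected moves repeat x
    return samples
```

In the paper's setting the state is a full sentence, proposals come from re-sampling a masked position with the MLM, and the chain's stationary distribution exposes the energy model the MLM implicitly defines.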
  
===== Interpretation and Properties (BERTology) =====
Summary: [[https://arxiv.org/pdf/2002.12327.pdf|Rogers et al 2020 - A Primer in BERTology: What we know about how BERT works]].  See also [[ml:Neural Network Psychology]].
  * [[https://arxiv.org/pdf/1906.04341.pdf|Clark et al 2019 - What Does BERT Look At? An Analysis of BERT’s Attention]] Also points out that BERT uses attention to the [SEP] token as a no-op.
  * [[https://arxiv.org/pdf/1909.10430.pdf|Wiedemann et al 2019 - Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings]]
  * [[https://arxiv.org/pdf/1905.06316.pdf|Tenney et al 2019 - What do you learn from context? Probing for sentence structure in contextualized word representations]]
  * [[https://arxiv.org/pdf/1905.05950.pdf|Tenney et al 2019 - BERT Rediscovers the Classical NLP Pipeline]]
  * [[https://arxiv.org/pdf/2002.12327.pdf|Rogers et al 2020 - A Primer in BERTology: What we know about how BERT works]]
  * [[https://twitter.com/lvwerra/status/1485301457813487619?s=21|2022 - Visualization of position embeddings in BERT and GPT-2 (Twitter)]]
  * [[https://arxiv.org/pdf/2203.06204.pdf|Papadimitriou et al 2022 - When classifying grammatical role, BERT doesn’t care about word order... except when it matters]]
  
===== Applications =====
===== Other Variants =====
  * [[https://arxiv.org/pdf/1909.05840.pdf|Shen et al 2019 - Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT]]
  * [[https://arxiv.org/pdf/1910.01108.pdf|Sanh et al 2019 - DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter]]
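DistilBERT trains the smaller student against a BERT teacher with a triple loss: the MLM loss, a cosine loss on hidden states, and a Hinton-style soft-target distillation loss, i.e. cross-entropy between the student's and teacher's temperature-softened output distributions. A minimal sketch of that distillation term (function names and the temperature value are illustrative):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution."""
    m = max(x / T for x in logits)            # subtract max for stability
    exps = [math.exp(x / T - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened distribution against the
    teacher's soft targets, scaled by T^2 so gradient magnitudes stay
    comparable across temperatures."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -T * T * sum(pt * math.log(ps)
                        for pt, ps in zip(p_teacher, p_student))
```

The softening with T > 1 is what lets the student learn from the teacher's full output distribution (its "dark knowledge" about near-miss tokens) rather than only from the argmax label.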
  
===== Related Pages =====
nlp/bert_and_friends.1639471501.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
