====== Key Papers in NLP ======
//Work in progress.// This page aims to be a compact list of the key papers in NLP.

===== Overviews =====
  * Machine Translation
  * Dialog
  * Question Answering
  * Information Extraction
  * Deep Learning
    * Ruder's 2016 paper

===== Historical Background =====
  * Machine Translation
    * [[https://www.aclweb.org/anthology/J93-2003.pdf|Brown et al 1993 - The Mathematics of Statistical Machine Translation: Parameter Estimation]]
  * Parsing
    * Early work on Penn Treebank
    * ParsEval


===== Papers =====
Each paper is listed twice: once by broad area and once by topic.

=== By Broad Area ===

  * Methods
    * Attention: [[https://arxiv.org/pdf/1409.0473.pdf|Bahdanau et al 2014 - Neural Machine Translation by Jointly Learning to Align and Translate]]
    * Seq2seq: [[https://arxiv.org/pdf/1409.3215.pdf|Sutskever et al 2014 - Sequence to Sequence Learning with Neural Networks]]
    * BPE: [[https://arxiv.org/pdf/1508.07909.pdf|Sennrich et al 2016 - Neural Machine Translation of Rare Words with Subword Units]]
    * Transformer: [[https://arxiv.org/pdf/1706.03762.pdf|Vaswani et al 2017 - Attention Is All You Need]]
    * CRFs: [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers|Lafferty et al 2001 - Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data]]
    * Decoding
    * Neural features: [[https://arxiv.org/pdf/1603.01360.pdf|Lample et al 2016 - Neural Architectures for Named Entity Recognition]] [[https://www.aclweb.org/anthology/Q16-1023.pdf|Kiperwasser & Goldberg 2016 - Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations]]
    * BERT
    * GPT-2, GPT-3
  * Datasets
    * QA: SQuAD v1, v2
    * NLI: SNLI
    * Dialog
      * MultiWOZ: [[https://arxiv.org/pdf/1810.00278.pdf|Budzianowski et al 2018 - MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling]]
    * Information Extraction
      * Named Entity Recognition: [[https://aclanthology.org/W03-0419.pdf|Tjong Kim Sang & De Meulder 2003 - Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition]]
    * General Benchmarks
      * GLUE, SuperGLUE
      * MMLU: [[https://arxiv.org/pdf/2009.03300.pdf|Hendrycks et al 2020 - Measuring Massive Multitask Language Understanding]]
  * Evaluation and Ethics
    * BLEU: [[https://aclanthology.org/P02-1040.pdf|Papineni et al 2002 - BLEU: a Method for Automatic Evaluation of Machine Translation]]
    * Annotation artifacts
    * 2016 ethics paper
  * Deep Learning
    * Dropout
    * Batch and Layer Norm
    * Adam

=== By Topic ===

  * Machine Translation
    * Attention: [[https://arxiv.org/pdf/1409.0473.pdf|Bahdanau et al 2014 - Neural Machine Translation by Jointly Learning to Align and Translate]]
    * BPE: [[https://arxiv.org/pdf/1508.07909.pdf|Sennrich et al 2016 - Neural Machine Translation of Rare Words with Subword Units]]
    * Transformer: [[https://arxiv.org/pdf/1706.03762.pdf|Vaswani et al 2017 - Attention Is All You Need]]
  * Dialog
    * MultiWOZ: [[https://arxiv.org/pdf/1810.00278.pdf|Budzianowski et al 2018 - MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling]]
  * Question Answering (QA)
    * SQuAD v1, v2
  * Natural Language Inference (NLI)
    * SNLI
  * Vision and Language
  * Information Extraction (IE)
    * Named Entity Recognition (NER)
      * [[https://aclanthology.org/W03-0419.pdf|Tjong Kim Sang & De Meulder 2003 - Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition]]
      * [[https://arxiv.org/pdf/1603.01360.pdf|Lample et al 2016 - Neural Architectures for Named Entity Recognition]]
  * Methods
    * Seq2seq: [[https://arxiv.org/pdf/1409.3215.pdf|Sutskever et al 2014 - Sequence to Sequence Learning with Neural Networks]]
    * CRFs: [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers|Lafferty et al 2001 - Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data]]
    * Decoding
  * Pretraining, Language Models, and In-Context Learning
    * BERT
    * GPT-2, GPT-3
  * Syntactic Parsing
    * Dependency parsing
      * [[https://www.aclweb.org/anthology/Q16-1023.pdf|Kiperwasser & Goldberg 2016 - Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations]]
  * Semantic Parsing
  * Evaluation and Ethics
    * BLEU: [[https://aclanthology.org/P02-1040.pdf|Papineni et al 2002 - BLEU: a Method for Automatic Evaluation of Machine Translation]]
    * Annotation artifacts
    * 2016 ethics paper
  * Deep Learning
    * Dropout
    * Batch and Layer Norm
    * Adam

===== Related Pages =====
  * [[History of NLP]]