====== Question Answering ======

===== Overviews =====

Best overview: [[https://arxiv.org/ftp/arxiv/papers/2001/2001.01582.pdf|Baradaran et al 2020 - A Survey on Machine Reading Comprehension Systems]].

  * [[https://arxiv.org/pdf/1809.08267.pdf|Gao et al 2018 - Neural Approaches to Conversational AI]] (contains a chapter on QA)
  * **[[https://arxiv.org/ftp/arxiv/papers/2001/2001.01582.pdf|Baradaran et al 2020 - A Survey on Machine Reading Comprehension Systems]]**
  * [[https://arxiv.org/pdf/2010.00389.pdf|Thayaparan et al 2020 - A Survey on Explainability in Machine Reading Comprehension]]
  * **[[https://arxiv.org/pdf/2107.12708.pdf|Rogers et al 2022 - QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension]]** [[https://dl.acm.org/doi/pdf/10.1145/3560260|ACM version (better)]]

===== Demos =====

  * [[https://demo.allennlp.org/reading-comprehension/transformer-qa|AllenNLP - RoBERTa QA Model Online Demo]]

===== Key Papers =====

  * Early papers
    * [[https://aclanthology.org/W00-0603.pdf|Riloff & Thelen 2000 - A Rule-based Question Answering System for Reading Comprehension Tests]] (cited by the SQuAD 1.0 paper)
  * [[https://arxiv.org/pdf/1606.02858v2.pdf|Chen et al 2016 - A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task]]
  * [[https://arxiv.org/pdf/1606.05250.pdf|Rajpurkar et al 2016 - SQuAD: 100,000+ Questions for Machine Comprehension of Text]]
  * BiDAF model
  * [[https://arxiv.org/pdf/1806.03822.pdf|Rajpurkar et al 2018 - Know What You Don't Know: Unanswerable Questions for SQuAD]] (SQuAD 2.0 paper)
  * [[https://aclanthology.org/2020.emnlp-main.550.pdf|Karpukhin et al 2020 - Dense Passage Retrieval for Open-Domain Question Answering]]
  * [[https://arxiv.org/pdf/2404.06283|Basmov et al 2024 - LLMs’ Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements]]

====== Topics ======

===== General QA Papers =====

  * [[https://arxiv.org/pdf/1601.01705.pdf|Andreas et al 2016 - Learning to Compose Neural Networks for Question Answering]]
  * [[https://arxiv.org/pdf/2404.06283|Basmov et al 2024 - LLMs’ Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements]]

===== Explanation And Implicit Reasoning Papers =====

  * [[https://arxiv.org/pdf/2004.14648.pdf|Chen & Durrett 2020 - Robust Question Answering Through Sub-part Alignment]]
  * [[https://arxiv.org/pdf/2009.06354.pdf|Lamm et al 2020 - QED: A Framework and Dataset for Explanations in Question Answering]]
  * [[https://arxiv.org/pdf/2010.00389.pdf|Thayaparan et al 2020 - A Survey on Explainability in Machine Reading Comprehension]]
  * [[https://arxiv.org/pdf/2101.02235.pdf|Geva et al 2021 - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies]]
  * [[https://arxiv.org/pdf/2104.08661.pdf|Dalvi et al 2021 - Explaining Answers with Entailment Trees]]

===== QA with Attribution =====

  * [[https://arxiv.org/pdf/2212.08037|Bohnet et al 2022 - Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models]]

===== Robust Question Answering =====

  * [[https://arxiv.org/pdf/2004.14648.pdf|Chen & Durrett 2020 - Robust Question Answering Through Sub-part Alignment]]

===== Open-Domain Question Answering =====

  * [[https://arxiv.org/pdf/1906.00300.pdf|Lee et al 2019 - Latent Retrieval for Weakly Supervised Open Domain Question Answering]]
  * **DPR model: [[https://aclanthology.org/2020.emnlp-main.550.pdf|Karpukhin et al 2020 - Dense Passage Retrieval for Open-Domain Question Answering]]**
  * [[https://arxiv.org/pdf/2012.12624.pdf|Lee et al 2021 - Learning Dense Representations of Phrases at Scale]]

===== Multi-hop Reasoning =====

  * [[https://arxiv.org/pdf/1809.09600.pdf|Yang et al 2018 - HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering]]
  * [[https://arxiv.org/pdf/1903.00161.pdf|Dua et al 2019 - DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs]]
  * [[https://arxiv.org/pdf/1906.02916.pdf|Min et al 2019 - Multi-hop Reading Comprehension through Question Decomposition and Rescoring]] Decomposes questions into simpler questions, answers them, and then rescores the answers.
  * [[https://arxiv.org/pdf/2012.00893.pdf|Sun et al 2020 - PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text]]

===== Multi-Span QA =====

Span-based QA datasets like SQuAD require the answer to be a single contiguous span. Multi-span QA relaxes this restriction so that questions like "What are the types of Turing machines?" can be answered with multiple spans from the context passage.

  * [[https://arxiv.org/pdf/1909.13375.pdf|Segal et al 2019 - A Simple and Effective Model for Answering Multi-span Questions]]

===== Yes/No Questions =====

  * [[https://arxiv.org/pdf/1905.10044.pdf|Clark et al 2019 - BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions]]
  * [[https://aclanthology.org/2022.naacl-main.79.pdf|Sulem et al 2022 - Yes, No or IDK: The Challenge of Unanswerable Yes/No Questions]]

===== Long-Form QA =====

  * [[https://aclanthology.org/P19-1346.pdf|Fan et al 2019 - ELI5: Long Form Question Answering]]

===== Knowledge-Grounded QA =====

See [[nlp:knowledge-enhanced_methods#Knowledge-Grounded Question Answering]].

===== Commonsense QA =====

See [[nlp:commonsense_reasoning#Commonsense Question Answering]].
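The multi-span setting described under //Multi-Span QA// above can be made concrete with a small decoding sketch. Segal et al 2019 cast multi-span QA as sequence tagging over context tokens; the tokens and BIO tags below are invented for illustration, and the decoder simply collects each maximal tagged run as one answer span. This is a minimal sketch of the decoding step, not the full model.

```python
# Minimal sketch: decoding multiple answer spans from per-token BIO tags
# (the tagging formulation of multi-span QA). Tokens/tags are toy examples.

def decode_spans(tokens, tags):
    """Collect every maximal run of B/I-tagged tokens as one answer span."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                 # a new span starts here
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:   # continue the currently open span
            current.append(token)
        else:                          # "O" (or a stray "I") closes any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["Turing", "machines", "include", "deterministic", "machines",
          "and", "non-deterministic", "machines", "."]
tags   = ["O", "O", "O", "B", "I", "O", "B", "I", "O"]
print(decode_spans(tokens, tags))
# → ['deterministic machines', 'non-deterministic machines']
```

In a real system the tags would come from a token classification head; the contiguous-span (SQuAD-style) setting is the special case where at most one run is tagged.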
===== Selective QA =====

  * [[https://www.aclweb.org/anthology/2020.findings-emnlp.370.pdf|2020 - No Answer is Better Than Wrong Answer: A Reflection Model for Document Level Machine Reading Comprehension]] Top of the Natural Questions leaderboard.
  * [[https://www.aclweb.org/anthology/2020.acl-main.503.pdf|2020 - Selective Question Answering under Domain Shift]]
  * [[https://arxiv.org/pdf/1904.04792.pdf|2019 - Quizbowl: The Case for Incremental Question Answering]]

===== Domain Shift =====

  * [[https://www.aclweb.org/anthology/2020.acl-main.503.pdf|Selective Question Answering under Domain Shift]]

===== Domain Adaptation =====

See the related work in [[https://arxiv.org/pdf/2203.08926.pdf|Yue 2022]] and [[https://arxiv.org/pdf/2210.10861.pdf|Yue 2022]], and also [[https://scholar.google.com/citations?user=VKKAfwMAAAAJ&hl=en|Arafat Sultan's publications on QA]].

=== Synthetic Question Generation ===

See also [[Question Generation]].

  * [[https://aclanthology.org/D17-1090.pdf|Duan et al 2017 - Question Generation for Question Answering]]
  * [[https://arxiv.org/pdf/1706.02027.pdf|Tang et al 2017 - Question Answering and Question Generation as Dual Tasks]]
  * [[https://aclanthology.org/2020.emnlp-main.439.pdf|Shakeri et al 2020 - End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems]] Generates QA pairs in the target domain using a large pre-trained encoder-decoder model fine-tuned on the dataset.
  * [[https://arxiv.org/pdf/2012.01414.pdf|Reddy et al 2020 - End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training]]
  * **[[https://arxiv.org/pdf/2010.12776.pdf|Chen et al 2020 - Improved Synthetic Training for Reading Comprehension]]**
  * [[https://arxiv.org/pdf/2010.16021.pdf|2020 - CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering]]
  * [[https://arxiv.org/pdf/2109.07954.pdf|Lyu et al 2021 - Improving Unsupervised Question Answering via Summarization-Informed Question Generation]]
  * [[https://arxiv.org/pdf/2203.08926.pdf|Yue et al 2022 - Synthetic Question Value Estimation for Domain Adaptation of Question Answering]]
  * [[https://arxiv.org/pdf/2204.09248.pdf|Reddy et al 2022 - Synthetic Target Domain Supervision for Open Retrieval QA]]
  * [[https://arxiv.org/pdf/2210.10861.pdf|Yue et al 2022 - QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation]]

===== Cross-Lingual =====

  * [[https://aclanthology.org/2022.mia-1.9.pdf|Agarwal et al 2022 - Zero-shot Cross-lingual Open Domain Question Answering]]

===== Unsupervised QA =====

  * [[https://arxiv.org/pdf/2109.07954.pdf|Lyu et al 2021 - Improving Unsupervised Question Answering via Summarization-Informed Question Generation]]

===== Evaluation =====

  * [[https://aclanthology.org/2021.acl-long.346.pdf|Rodriguez et al 2021 - Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?]]

====== Datasets ======

See [[https://arxiv.org/pdf/2107.12708.pdf|Rogers et al 2022 - QA Dataset Explosion]].
See also [[http://nlpprogress.com/english/question_answering.html|NLP Progress - Question Answering]] and [[https://docs.google.com/spreadsheets/d/1gWDy7-rfT0efhmFF42fR9cPpfqZDKl07q1JxrRqGEVI/edit#gid=0|Geetanjali's QA Datasets Spreadsheet]].

  * **CNN/Daily Mail Reading Comprehension**
    * Large-scale cloze-style QA dataset constructed from news articles and their summaries
    * Paper: [[https://arxiv.org/pdf/1506.03340.pdf|Hermann et al 2015 - Teaching Machines to Read and Comprehend]]
    * [[https://arxiv.org/pdf/1606.02858v2.pdf|Chen et al 2016 - A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task]]
    * See [[https://github.com/danqi/rc-cnn-dailymail]]
  * **SQuAD**: [[https://arxiv.org/pdf/1606.05250.pdf|Rajpurkar et al 2016 - SQuAD: 100,000+ Questions for Machine Comprehension of Text]]
    * SQuAD 2.0: [[https://arxiv.org/pdf/1806.03822.pdf|Rajpurkar et al 2018 - Know What You Don’t Know: Unanswerable Questions for SQuAD]]
  * **NewsQA**: [[https://arxiv.org/pdf/1611.09830.pdf|Trischler et al 2016 - NewsQA: A Machine Comprehension Dataset]]
  * **NarrativeQA**: [[https://arxiv.org/pdf/1712.07040.pdf|Kočiský et al 2017 - The NarrativeQA Reading Comprehension Challenge]]
  * **AI2 Reasoning Challenge (ARC)**: [[https://arxiv.org/pdf/1803.05457.pdf|Clark et al 2018 - Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge]]
    * [[https://arxiv.org/pdf/1806.00358.pdf|Boratko et al 2019 - A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset]]
  * **MathQA**: [[https://arxiv.org/pdf/1905.13319.pdf|Amini et al 2019 - MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms]]
  * **Math Questions**
    * [[https://www.aclweb.org/anthology/S19-2153.pdf|Hopkins et al 2019 - SemEval 2019 Task 10: Math Question Answering]]
    * [[https://arxiv.org/pdf/2102.01065.pdf|Liu et al 2021 - Can Small and Synthetic Benchmarks Drive Modeling Innovation? A Retrospective Study of Question Answering Modeling Approaches]]
  * **COPA: Choice of Plausible Alternatives** (Gordon et al., 2012): Asking about either a plausible cause or a plausible result, among two alternatives, of a certain event expressed in a simple sentence. (Summary from [[https://arxiv.org/pdf/2004.05483.pdf|Shwartz 2020]])
  * **CommonSenseQA: Commonsense Question Answering** (Talmor et al., 2019): General questions about concepts from ConceptNet. To increase the challenge, the distractors are related to the target concept either by a relationship in ConceptNet or as suggested by crowdsourcing workers. (Summary from [[https://arxiv.org/pdf/2004.05483.pdf|Shwartz 2020]])
  * **MC-TACO: Multiple Choice Temporal Commonsense** (Zhou et al., 2019): Questions about temporal aspects of events such as ordering, duration, frequency, and typical time. The distractors were selected in an adversarial way using BERT. (Summary from [[https://arxiv.org/pdf/2004.05483.pdf|Shwartz 2020]])
  * **Social IQa: Social Interaction Question Answering** (Sap et al., 2019b): Questions regarding social interactions, based on the ATOMIC dataset (Sap et al., 2019a). Contexts describe social interactions and questions refer to one of a few aspects (e.g. the subject’s motivation, following actions, etc.). The answers were crowdsourced. (Summary from [[https://arxiv.org/pdf/2004.05483.pdf|Shwartz 2020]])
  * **PIQA: Physical Interaction Question Answering** (Bisk et al., 2020): Questions regarding physical commonsense knowledge. Contexts are goals derived from an instruction website, typically involving less prototypical uses of everyday objects (e.g., using a bottle to separate eggs). The answers were crowdsourced, and an adversarial filtering algorithm was used to remove annotation artifacts. (Summary from [[https://arxiv.org/pdf/2004.05483.pdf|Shwartz 2020]])
  * **WinoGrande** (Sakaguchi et al., 2020): A large-scale version of WSC that exhibits less bias thanks to adversarial filtering and the use of placeholders instead of pronouns. As opposed to WSC, which was curated by experts, WinoGrande was crowdsourced with a carefully designed approach that produces diverse examples which are trivial for humans. (Summary from [[https://arxiv.org/pdf/2004.05483.pdf|Shwartz 2020]])
  * **HybridQA**: [[https://arxiv.org/pdf/2004.07347.pdf|Chen et al 2020 - HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data]]
  * **UnifiedQA**: [[https://arxiv.org/pdf/2005.00700.pdf|Khashabi et al 2020 - UnifiedQA: Crossing Format Boundaries With a Single QA System]]
  * Open Domain
    * **Natural Questions**
      * Stats: ~300k QA pairs
      * [[https://ai.google.com/research/NaturalQuestions/visualization|Dataset viewer]]
      * [[https://github.com/google-research/language/tree/master/language/question_answering/bert_joint|BERT baseline]] Paper: [[https://arxiv.org/pdf/1901.08634.pdf|Alberti et al 2019 - A BERT Baseline for the Natural Questions]]
      * System Papers
        * [[https://arxiv.org/pdf/2004.14560.pdf|2020 - RikiNet: Reading Wikipedia Pages for Natural Question Answering]]
        * [[https://arxiv.org/pdf/2105.04241.pdf|Zemlyanskiy et al 2021 - ReadTwice: Reading Very Large Documents with Memories]]
    * **MS MARCO**: [[https://arxiv.org/pdf/1611.09268.pdf|Bajaj et al 2016 - MS MARCO: A Human Generated MAchine Reading COmprehension Dataset]]
      * List of QA datasets from the MS MARCO paper: {{media:qa_datasets.png}}
    * **QReCC**: [[https://arxiv.org/pdf/2010.04898.pdf|Anantha et al 2020 - Open-Domain Question Answering Goes Conversational via Question Rewriting]]
    * **AmbigQA**: [[https://arxiv.org/pdf/2004.10645.pdf|Min et al 2020 - AmbigQA: Answering Ambiguous Open-domain Questions]]
    * **QuAC**
      * System Papers
        * [[https://dl.acm.org/doi/pdf/10.1145/3331184.3331341|Qu et al 2019 - BERT with History Answer Embedding for Conversational Question Answering]]
  * Biomedical Domain
    * **BioASQ**
    * **[[https://github.com/deepset-ai/COVID-QA|CovidQA]]**: [[https://aclanthology.org/2020.nlpcovid19-acl.18.pdf|Paper 1]], [[https://arxiv.org/pdf/2004.11339v1.pdf|Paper 2 (older)]]
  * Product Domain (Product-related Question Answering - PQA)
    * **Amazon-PQA** and **AmazonPQSim**: [[https://aclanthology.org/2021.naacl-main.23.pdf|Rozen et al 2021 - Answering Product-Questions by Utilizing Questions from Other Contextually Similar Products]] Worked with yes/no questions only. Datasets available [[https://registry.opendata.aws/|here]] (search for [[https://registry.opendata.aws/amazon-pqa/|Amazon-PQA]] or AmazonPQSim). The dataset includes free-form (not extractive) answers as well as yes/no questions.
  * Research Paper Domain
    * **Qasper**: [[https://arxiv.org/pdf/2105.03011.pdf|Dasigi et al 2021 - A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers]] This dataset is really hard. It contains everything: abstractive, multi-span extractive, yes/no, and unanswerable questions, all drawn from a very long context.
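Several entries above (the DPR paper under //Key Papers//, and ORQA and DensePhrases under //Open-Domain Question Answering//) share one core idea: embed the question and every passage into a common vector space and rank passages by inner product. A minimal sketch of that ranking step, using toy 3-dimensional vectors as stand-ins for real encoder outputs (not DPR itself, which additionally trains the two encoders and uses approximate nearest-neighbor search):

```python
# Hedged sketch of the scoring step behind dense retrievers such as DPR:
# rank passages by inner-product similarity to the question embedding.
# The 3-d vectors here are toy stand-ins for real encoder outputs.

def rank_passages(question_vec, passage_vecs):
    """Return passage indices sorted by inner product with the question (highest first)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scores = [dot(question_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

q = [0.9, 0.1, 0.0]                  # toy question embedding
passages = [
    [0.1, 0.8, 0.1],                 # off-topic passage
    [0.8, 0.2, 0.1],                 # relevant passage
    [0.0, 0.1, 0.9],                 # off-topic passage
]
print(rank_passages(q, passages))    # relevant passage (index 1) ranks first
```

At scale the exhaustive loop is replaced by a maximum inner-product search index (e.g. FAISS, which the DPR paper uses), but the scoring function is the same.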
====== Resources ======

===== Slides =====

  * [[https://docs.google.com/presentation/d/1IO8W6Ion_f5oykBPCnHLm8PcowP2kSiXKS_KaInOER0/edit#slide=id.p|Geetanjali's slides]] and [[https://docs.google.com/document/d/1NU9rPFoQqZX_xGG8UHEPo3btde0XzwLSBmc3HRHNrOw/edit#heading=h.vardnhz443vd|here]]

===== People =====

  * [[https://scholar.google.com/citations?user=xCYHonIAAAAJ&hl=en|Jonathan Berant]]
  * [[https://scholar.google.com/citations?user=8ys-38kAAAAJ&hl=en|William Cohen]]
  * [[https://scholar.google.com/citations?user=SfKdzrUAAAAJ&hl=en|Matt Gardner]]
  * [[https://scholar.google.com/citations?user=ezllEwMAAAAJ&hl=en|Edward Grefenstette]]
  * [[https://scholar.google.com/citations?user=pouyVyUAAAAJ&hl=en|Percy Liang]]
  * [[https://scholar.google.com/citations?user=1zmDOdwAAAAJ&hl=en|Chris Manning]]
  * [[https://scholar.google.com/citations?user=yILa1y0AAAAJ&hl=en|Andrew McCallum]]
  * [[https://scholar.google.com/citations?user=FABZCeAAAAAJ&hl=en|Hwee Tou Ng]]
  * [[https://scholar.google.com/citations?user=9Yd716IAAAAJ&hl=en|Chitta Baral]]

===== Related Pages =====

  * [[Question Generation]]
  * [[Visual Question Answering]]