Question Answering
Overviews
Best overview: Baradaran et al 2020 - A Survey on Machine Reading Comprehension Systems.
- Gao et al 2018 - Neural Approaches to Conversational AI (contains a chapter on QA)
Demos
Key Papers
- Early papers
- Riloff & Thelen 2000 - A Rule-based Question Answering System for Reading Comprehension Tests (Cited by the SQuAD 1.0 paper)
- BiDAF model
Topics
General QA Papers
Explanation And Implicit Reasoning Papers
QA with Attribution
Robust Question Answering
Open-Domain Question Answering
Multi-hop Reasoning
- Min et al 2019 - Multi-hop Reading Comprehension through Question Decomposition and Rescoring. Decomposes questions into simpler sub-questions, answers them, and then rescores the resulting answers.
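The decompose/answer/rescore loop can be sketched as follows; `decompose`, `single_hop_qa`, and `rescore` are hypothetical stand-ins for the learned components in Min et al 2019, and the `[ANS]` placeholder convention is illustrative:

```python
def multi_hop_answer(question, context, decompose, single_hop_qa, rescore):
    """Answer a multi-hop question by decomposition (sketch, not the paper's exact method).

    decompose(question) -> list of candidate decompositions, each a list of
        simpler sub-questions; later sub-questions may contain a placeholder
        such as [ANS] to be filled with the previous sub-answer.
    single_hop_qa(sub_question, context) -> answer string.
    rescore(question, sub_questions, answer) -> float score for the candidate.
    """
    candidates = []
    for sub_questions in decompose(question):
        answer = None
        for sub_q in sub_questions:
            if answer is not None:
                # Chain the previous sub-answer into the next sub-question
                sub_q = sub_q.replace("[ANS]", answer)
            answer = single_hop_qa(sub_q, context)
        candidates.append((rescore(question, sub_questions, answer), answer))
    # Return the answer from the highest-scoring decomposition
    return max(candidates, key=lambda t: t[0])[1]
```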
Multi-Span QA
Span-based QA datasets like SQuAD require that the answer span is contiguous. Multi-Span QA relaxes this restriction so that questions like “What are the type of Turing Machines?” can be answered with multiple spans from the context passage.
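One common way to pose multi-span extraction is as BIO sequence tagging over the passage tokens; a minimal decoder for that scheme (the B/I/O tag names are the standard BIO convention, not tied to a specific paper):

```python
def decode_spans(tokens, tags):
    """Collect all answer spans from per-token BIO tags (B, I, O)."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                  # start of a new answer span
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:    # continuation of the open span
            current.append(token)
        else:                           # O, or a stray I with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

For example, tagging the passage "There are deterministic and non-deterministic Turing machines" yields two answer spans rather than one contiguous span.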
Yes/No Questions
Long-Form QA
Knowledge-Grounded QA
Commonsense QA
Selective QA
- 2020 - No Answer is Better Than Wrong Answer: A Reflection Model for Document Level Machine Reading Comprehension. Topped the Natural Questions leaderboard.
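Selective QA lets a system abstain rather than return a wrong answer. A minimal sketch of threshold-based abstention and the resulting coverage/risk trade-off (the threshold and confidence scores are illustrative, not from the paper):

```python
def selective_answer(answer, confidence, threshold=0.5):
    """Return the model's answer only if its confidence clears the
    threshold; otherwise abstain (predict no answer)."""
    return answer if confidence >= threshold else None

def coverage_and_risk(predictions, threshold=0.5):
    """predictions: list of (confidence, is_correct) pairs.

    coverage = fraction of questions answered;
    risk     = error rate among the answered questions.
    """
    answered = [(c, ok) for c, ok in predictions if c >= threshold]
    if not answered:
        return 0.0, 0.0
    coverage = len(answered) / len(predictions)
    risk = sum(1 for _, ok in answered if not ok) / len(answered)
    return coverage, risk
```

Sweeping the threshold traces out a risk-coverage curve: a higher threshold answers fewer questions but makes fewer errors on those it does answer.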
Domain Shift
Domain Adaptation
See the related work in Yue 2022, and also Arafat Sultan's publications on QA.
Synthetic Question Generation
See also Question Generation.
- Shakeri et al 2020 - End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems. Generates QA pairs in the target domain using a large pre-trained encoder-decoder model fine-tuned on the dataset.
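The generate-then-filter pipeline behind this line of work can be sketched as below; `generate_qa` and `answer_question` are hypothetical stand-ins for the fine-tuned encoder-decoder and an existing QA model, and the round-trip consistency filter is a common choice in this literature, not necessarily the exact criterion used by Shakeri et al 2020:

```python
def synthesize_qa_pairs(passages, generate_qa, answer_question):
    """Generate synthetic QA training pairs for a target domain (sketch).

    generate_qa(passage) -> list of (question, answer) pairs proposed by a
        fine-tuned encoder-decoder model (hypothetical interface).
    answer_question(question, passage) -> answer from an existing QA model.

    Keeps only pairs that pass a round-trip consistency check: the QA
    model's answer must match the generated answer.
    """
    kept = []
    for passage in passages:
        for question, answer in generate_qa(passage):
            if answer_question(question, passage) == answer:
                kept.append((passage, question, answer))
    return kept
```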
Cross-Lingual
Unsupervised QA
Evaluation
Datasets
See Rogers et al 2022 - QA Dataset Explosion. See also NLP Progress - Question Answering and Geetanjali's QA Datasets Spreadsheet.
- CNN/Daily Mail Reading Comprehension
- Large-scale cloze-style QA dataset constructed from news articles and their summaries
- AI2 Reasoning Challenge (ARC): Clark et al 2018 - Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
- Math Questions
- COPA: Choice of Plausible Alternatives (Gordon et al., 2012): Asking about either a plausible cause or a plausible result, among two alternatives, of a certain event expressed in a simple sentence. (Summary from Shwartz 2020)
- CommonSenseQA: commonsense Question Answering (Talmor et al., 2019): General questions about concepts from ConceptNet. To increase the challenge, the distractors are related to the target concept either by a relationship in ConceptNet or as suggested by crowdsourcing workers. (Summary from Shwartz 2020)
- MC-TACO: Multiple Choice Temporal commonsense (Zhou et al., 2019): Questions about temporal aspects of events such as ordering, duration, frequency, and typical time. The distractors were selected in an adversarial way using BERT. (Summary from Shwartz 2020)
- Social IQa: Social Interaction Question Answering (Sap et al., 2019b): Questions regarding social interactions, based on the ATOMIC dataset (Sap et al., 2019a). Contexts describe social interactions and questions refer to one of a few aspects (e.g. the subject’s motivation, following actions, etc.). The answers were crowdsourced. (Summary from Shwartz 2020)
- PIQA: Physical Interaction Question Answering (Bisk et al., 2020): Questions regarding physical commonsense knowledge. Contexts are goals derived from an instruction website, typically involving less prototypical uses of everyday objects (e.g., using a bottle to separate eggs). The answers were crowdsourced, and an adversarial filtering algorithm was used to remove annotation artifacts. (Summary from Shwartz 2020)
- WinoGrande (Sakaguchi et al., 2020): A large scale version of WSC that exhibits less bias thanks to adversarial filtering and use of placeholders instead of pronouns. As opposed to WSC that was curated by experts, WinoGrande was crowdsourced with a carefully designed approach that produces diverse examples which are trivial for humans. (Summary from Shwartz 2020)
- Open Domain
- Natural Questions
- Stats: ~300k QA pairs
- System Papers
- MS MARCO: Bajaj et al 2016 - MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. List of QA datasets from the MS MARCO paper:

- QuAC
- System Papers
- Biomedical Domain
- BioASQ
- Product Domain (Product-related Question Answering - PQA)
- Amazon-PQA and AmazonPQSim: Rozen et al 2021 - Answering Product-Questions by Utilizing Questions from Other Contextually Similar Products. Works with yes/no questions only. Datasets available here (search for Amazon-PQA or AmazonPQSim). The dataset includes free-form (not extractive) answers and yes/no questions.
- Research Paper Domain
- Qasper: Dasigi et al 2021 - A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers. This dataset is really hard. It contains everything: abstractive, multi-span extractive, yes/no, and unanswerable questions, all drawn from a very long context.
Resources
Slides
People
Related Pages
nlp/question_answering.txt · Last modified: 2025/05/13 19:46 by jmflanig