====== Discourse Analysis ======

===== Introductions and Overviews =====
  * Introductions
    * [[https://web.stanford.edu/~jurafsky/slp3/22.pdf|S&LP - Ch 22]]
  * Overviews
    * [[https://apps.dtic.mil/sti/pdfs/ADA601542.pdf|2013 - Multiparticipant chat analysis: A survey]]
    * PDTB
      * [[https://aclanthology.org/J14-4007.pdf|Prasad et al 2014 - Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation]]

===== Discourse Parsing =====
  * [[https://www.aclweb.org/anthology/P02-1047.pdf|Marcu & Echihabi 2002 - An Unsupervised Approach to Recognizing Discourse Relations]] The first discourse parser
  * [[https://www.aclweb.org/anthology/D14-1220.pdf|Li et al 2014 - Recursive Deep Models for Discourse Parsing]]
  * [[https://www.aclweb.org/anthology/D15-1109.pdf|Afantenos et al 2015 - Discourse parsing for multi-party chat dialogues]]
  * [[https://www.aclweb.org/anthology/K15-2001.pdf|Xue et al 2015 - The CoNLL-2015 Shared Task on Shallow Discourse Parsing]]
  * [[https://www.aclweb.org/anthology/N16-1013.pdf|Perret et al 2016 - Integer Linear Programming for Discourse Parsing]] Good intro to discourse parsing
  * [[https://www.aclweb.org/anthology/E17-1028.pdf|Braud et al 2017 - Cross-lingual RST Discourse Parsing]]
  * [[https://www.aclweb.org/anthology/J17-3005.pdf|Stab & Gurevych 2017 - Parsing Argumentation Structures in Persuasive Essays]] Sec. 2.5 contains an overview of related work in discourse parsing. [[https://arxiv.org/pdf/1604.07370.pdf|arXiv version]].
  * [[https://www.aclweb.org/anthology/J18-2001.pdf|Morey et al 2018 - A Dependency Perspective on RST Discourse Parsing and Evaluation]]
  * [[https://www.aclweb.org/anthology/P19-1410.pdf|Lin et al 2019 - A Unified Linear-Time Framework for Sentence-Level Discourse Parsing]]

=== For Dialog Systems ===
  * **[[https://aclanthology.org/J93-4004.pdf|Moore & Paris 1993 - Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information]]** Old paper, but very good. Discusses how RST could be used in a dialog system planner. Contains a mapping between desired belief states in the listener and RST relations (Table 1, p. 20).
  * [[https://dl.acm.org/doi/pdf/10.3115/1118253.1118288|Stent 2000 - Rhetorical Structure in Dialog]]
  * [[https://arxiv.org/pdf/1907.03975.pdf|Ma et al 2019 - Implicit Discourse Relation Identification for Open-domain Dialogue]]
    * [[https://github.com/derekmma/dialogue-discourse-relation]] (annotated corpus)
    * [[https://github.com/jfainberg/self_dialogue_corpus]] (full corpus) ([[https://arxiv.org/pdf/1809.06641.pdf|paper]])

==== RST ====
  * Linguistic theory; see also [[https://en.wikipedia.org/wiki/Rhetorical_structure_theory|Wikipedia - Rhetorical Structure Theory]]
  * [[https://aclanthology.org/P84-1076.pdf|Mann 1984 - Discourse Structures for Text Generation]] Discusses RST and says "The descriptive portion of RST has been developed over the past two years by Sandra Thompson and me, with major contributions by Christian Matthiassen and Barbara Fox" (footnote 1).
  * [[https://www.sfu.ca/rst/pdfs/Mann_Thompson_1987.pdf|Mann & Thompson 1987 - Rhetorical Structure Theory: A Theory of Text Organization]]
  * [[https://www.sfu.ca/rst/05bibliographies/bibs/Mann_Thompson_1988.pdf|Mann & Thompson 1988 - Rhetorical Structure Theory: Toward a functional theory of text organization]] The paper that is usually cited for RST

=== RST-DT ===
  * Dataset
    * Paper: [[https://aclanthology.org/W01-1605.pdf|Carlson et al 2001 - Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory]] Introduces the RST Discourse Treebank, which is annotated with full RST discourse structures.
    * LDC: [[https://catalog.ldc.upenn.edu/LDC2002T07|RST Discourse Treebank]]
    * Website: [[https://www.isi.edu/~marcu/discourse/Corpora.html]]
    * Guideline: [[https://www.lti.cs.cmu.edu/sites/default/files/Guideline%20and%20Flowchart%20for%20Rhetorical%20Structure%20Theory%20Annotation_2.pdf|RST Annotation Guideline]] and [[http://www.sfu.ca/~mtaboada/docs/research/RST_Annotation_Guidelines.pdf|another guideline]]
  * [[https://www.sfu.ca/rst/|RST Website]]
  * Annotation tool: [[https://corpling.uis.georgetown.edu/rstweb/info/|rstWeb]] (web-based)
  * Systems
    * [[https://www.aclweb.org/anthology/N16-1013.pdf|Perret et al 2016 - Integer Linear Programming for Discourse Parsing]] Good intro to discourse parsing
    * [[https://www.aclweb.org/anthology/D17-1136.pdf|Morey et al 2017 - How much progress have we made on RST discourse parsing? A replication study of recent results on the RST-DT]]
    * [[https://arxiv.org/pdf/1905.05682.pdf|Lin et al 2019 - A Unified Linear-Time Framework for Sentence-Level Discourse Parsing]] [[https://github.com/ntunlpsg/UnifiedParser_RST|github]]

=== Other RST Datasets ===
  * [[https://www.irit.fr/STAC/corpus.html|STAC Corpus]]
  * [[http://www.sfu.ca/~mtaboada/SFU_Review_Corpus.html|SFU Review Corpus]]

==== Penn Discourse Treebank (PDTB) ====
See also [[https://www.seas.upenn.edu/~pdtb/publications.shtml|PDTB Publications]]. The PDTB is a shallow discourse representation, in contrast to the RST-DT. According to [[https://www.aclweb.org/anthology/N16-1013.pdf|Perret et al 2016]], "the PDTB does not provide full discourse structures for texts," whereas the RST-DT does.
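
One way to picture the difference is as a data-structure contrast. The sketch below is only an illustration: the class names ''ShallowRelation'' and ''TreeNode'', the three-unit toy text, and the relation labels on the tree are all made up for this page and are not taken from either corpus or from any parser's API. It assumes a shallow annotation can be viewed as a flat list of independent relations, while a full structure is a single tree whose leaves span the whole text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShallowRelation:
    """A single relation between two argument spans, with no links to other relations."""
    arg1: str
    arg2: str
    sense: str
    connective: Optional[str] = None  # None stands for an implicit relation


@dataclass
class TreeNode:
    """A node in a discourse tree: leaves carry text, internal nodes carry a relation."""
    relation: Optional[str] = None
    text: Optional[str] = None
    children: List["TreeNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the text units covered by this node, left to right."""
        if not self.children:
            return [self.text] if self.text is not None else []
        covered: List[str] = []
        for child in self.children:
            covered.extend(child.leaves())
        return covered


# Toy text split into three units (invented for illustration).
units = ["The plan failed", "because funding ran out.", "The team regrouped."]

# Shallow view: a flat list of relations, each stored on its own.
shallow = [
    ShallowRelation(units[0], units[1], "Contingency.Cause.Reason", "because"),
    ShallowRelation(f"{units[0]} {units[1]}", units[2], "Temporal.Asynchronous.Precedence"),
]

# Full-structure view: one tree that nests the first relation inside the second.
# The labels "Sequence" and "Cause" are placeholders, not corpus labels.
tree = TreeNode(
    relation="Sequence",
    children=[
        TreeNode(
            relation="Cause",
            children=[TreeNode(text=units[0]), TreeNode(text=units[1])],
        ),
        TreeNode(text=units[2]),
    ],
)

print(len(shallow), "independent relations")
print("tree covers whole text:", tree.leaves() == units)
```

In the flat list nothing records that the first relation sits inside an argument of the second; in the tree that nesting is explicit, which is the sense in which only the second view is a "full" structure.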
  * Dataset
    * [[https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3.9607&rep=rep1&type=pdf|Miltsakaki et al 2004 - The Penn Discourse Treebank]]
    * [[https://www.seas.upenn.edu/~pdtb/papers/nodalida.pdf|Webber et al 2005 - A Short Introduction to the Penn Discourse TreeBank]]
    * [[http://www.lrec-conf.org/proceedings/lrec2008/pdf/754_paper.pdf|Prasad et al 2008 - The Penn Discourse TreeBank 2.0]]
    * **[[https://aclanthology.org/J14-4007.pdf|Prasad et al 2014 - Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation]]** 30 pages, very good
    * [[https://catalog.ldc.upenn.edu/docs/LDC2019T05/introduction.html|2019 - PDTB 3.0 Introduction]]
    * [[https://catalog.ldc.upenn.edu/docs/LDC2019T05/PDTB3-Annotation-Manual.pdf|Webber et al 2019 - The Penn Discourse Treebank 3.0 Annotation Manual]]
  * Systems
    * [[https://www.aclweb.org/anthology/D09-1036.pdf|Lin et al 2009 - Recognizing Implicit Discourse Relations in the Penn Discourse Treebank]]
    * [[https://arxiv.org/pdf/1011.0835.pdf|Lin et al 2010 - A PDTB-Styled End-to-End Discourse Parser]]
    * [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.299.8795&rep=rep1&type=pdf|Lin et al 2012 - A PDTB-Styled End-to-End Discourse Parser]] ([[https://github.com/WING-NUS/pdtb-parser|parser]]) The parser that [[https://www.aclweb.org/anthology/J17-3005.pdf|Stab & Gurevych 2017]] used.
    * [[https://aclanthology.org/W15-4612.pdf|Biran & McKeown 2015 - PDTB Discourse Parsing as a Tagging Task: The Two Taggers Approach]]
  * Extensions
    * [[https://arxiv.org/pdf/2010.06294.pdf|Liang et al 2020 - Extending Implicit Discourse Relation Recognition to the PDTB-3]]

{{media:pdtb_sense_hierarchy.png?500}}\\
PDTB sense hierarchy from [[https://aclanthology.org/J14-4007.pdf|Prasad et al 2014]]. There are three levels; most systems predict level 2 senses but not the finer-grained level 3 senses.
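
As a rough illustration of what "doing level 2 senses" can mean in practice, the sketch below collapses a hierarchical sense label to its first two levels. It assumes senses are written as dot-separated strings such as ''Contingency.Cause.Reason''; that string format and the helper name ''truncate_sense'' are assumptions made for this example, not something prescribed by the PDTB papers linked above.

```python
def truncate_sense(sense: str, level: int = 2, sep: str = ".") -> str:
    """Keep only the first `level` components of a hierarchical sense label."""
    if level < 1:
        raise ValueError("level must be >= 1")
    parts = sense.split(sep)
    # Labels that are already coarser than `level` are returned unchanged.
    return sep.join(parts[:level])


# Example labels at level 3, level 2, and level 1 (assumed dot-separated format).
labels = [
    "Contingency.Cause.Reason",
    "Comparison.Contrast",
    "Expansion",
]

level2 = [truncate_sense(label) for label in labels]
print(level2)
```

Collapsing labels this way gives a coarser classification target; whether a label that is annotated only at level 1 should be kept or dropped when evaluating at level 2 is a separate decision that this sketch does not make.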
==== Other Datasets ====
  * [[https://www.iso.org/standard/76443.html|ISO 24617-2]] ([[https://www.sis.se/api/document/preview/80026438/|preview]])
    * Paper: [[http://www.lrec-conf.org/proceedings/lrec2012/pdf/530_Paper.pdf|Bunt et al 2012 - ISO 24617-2: A Semantically-based Standard for Dialogue Annotation]]
    * [[https://aclweb.org/mirror/ijcnlp11/downloads/tutorial/tu4_paper.pdf|Bunt 2011 - Guidelines for using ISO standard 24617-2]]
    * [[https://dialogbank.lsv.uni-saarland.de/wp-content/uploads/2015/12/ISO24617-2_Annotation_Guidelines2017.pdf|Bunt 2017 - Guidelines for using ISO standard 24617-2]] [[https://drive.google.com/file/d/1lKi3dEjMRaushCzmYl7v3Kk7s0TkYOsj/view|Jon's annotated version]]
    * [[https://pure.uvt.nl/ws/portalfiles/portal/28904525/ISO24617_2_Annotation_Guidelines_Ticc_report.pdf|Bunt 2019 - Guidelines for using ISO standard 24617-2]]
  * **DialogBank** [[https://dialogbank.lsv.uni-saarland.de/|website]]
    * [[https://aclanthology.org/L16-1503.pdf|Bunt et al 2016 - The DialogBank]]
    * [[https://link.springer.com/content/pdf/10.1007/s10579-018-9436-9.pdf|Bunt et al 2019 - The DialogBank: Dialogues with Interoperable Annotations]] Longer paper, great
  * STAC Corpus
    * [[https://aclanthology.org/L16-1432.pdf|Asher et al 2016 - Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus]]

===== Unsupervised Discourse Parsing =====
  * [[https://www.aclweb.org/anthology/P02-1047.pdf|Marcu & Echihabi 2002 - An Unsupervised Approach to Recognizing Discourse Relations]] The first discourse parser
  * [[https://www.aclweb.org/anthology/2020.tacl-1.15.pdf|Nishida & Nakayama 2020 - Unsupervised discourse constituency parsing using Viterbi EM]] Rivals supervised methods. Doesn't seem to cite [[https://www.aclweb.org/anthology/P02-1047.pdf|Marcu & Echihabi 2002]]!
===== Discourse Coherence =====

==== Document-Level Coherence ====
  * [[https://www.aclweb.org/anthology/N16-1167.pdf|Mesgar & Strube 2016 - Lexical Coherence Graph Modeling Using Word Embeddings]] Uses a readability ranking task.
  * [[https://www.aclweb.org/anthology/W18-5023.pdf|Lai & Tetreault 2018 - Discourse Coherence in the Wild: A Dataset, Evaluation and Methods]] Introduces the Grammarly Corpus of Discourse Coherence of 1,000 documents with human evaluation of coherence (at the document level).

==== Coherence in Dialog ====
  * [[https://www.aclweb.org/anthology/W08-0127.pdf|Gandhe & Traum 2008 - An Evaluation Understudy for Dialogue Coherence Models]] Evaluates measures of dialog coherence against human judgements.
  * [[https://www.aclweb.org/anthology/2020.sigdial-1.21.pdf|Cervone & Riccardi 2020 - Is this Dialogue Coherent? Learning from Dialogue Acts and Entities]] Introduces the Switchboard Coherence (SWBD-Coh) corpus ([[https://github.com/alecervi/switchboard-coherence-corpus|dataset]]), a dataset of human-human spoken dialogues from the Switchboard corpus annotated with human coherence ratings for each turn.

===== Conversation Disentanglement =====
Disentangling interleaved conversation threads (for example in multiparty dialogs).
  * [[https://aclanthology.org/P11-1118.pdf|Elsner & Charniak 2011 - Disentangling Chat with Local Coherence Models]]
  * [[https://arxiv.org/pdf/1810.11118.pdf|Kummerfeld et al 2018 - A Large-Scale Corpus for Conversation Disentanglement]]
  * [[https://arxiv.org/pdf/2010.11080.pdf|Yu & Joty 2020 - Online Conversation Disentanglement with Pointer Networks]]

===== Linguistic Topics in Discourse Analysis =====
  * [[https://en.wikipedia.org/wiki/Cohesion_(linguistics)|Textual cohesion (linguistics topic)]]. There are metrics for this (ask Pranav about this).
  * Centering theory. See [[https://aclanthology.org/J04-3003.pdf|this paper]].
===== Applications =====
  * Summarization
    * [[https://arxiv.org/pdf/2104.08400.pdf|Chen & Yang 2021 - Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs]]

===== Related Pages =====
  * [[Dialog]]
  * [[Topic Detection]]