====== Natural Language Generation ======

===== Overviews =====
  * Introduction: [[https://jlab.soe.ucsc.edu/nlp-wiki/lib/exe/fetch.php?media=book:eisenstein-nlp-notes.pdf#page=475|Eisenstein p. 457]]
  * [[https://arxiv.org/pdf/1703.09902.pdf|Gatt & Krahmer 2017 - Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation]] Now outdated; covers traditional, non-neural methods.
  * [[https://arxiv.org/pdf/2010.04389.pdf|Yu et al 2020 - A Survey of Knowledge-Enhanced Text Generation]]

===== Data-to-Text =====
Data-to-text generation produces text from structured data, such as tables of numbers. A typical example is generating human-readable weather reports from the numbers predicted by weather simulations. An example dataset is [[https://github.com/harvardnlp/boxscore-data|RotoWire]] ([[https://arxiv.org/pdf/1707.08052.pdf|paper]]).
  * [[https://aclanthology.org/W07-2315.pdf|Reiter 2007 - An Architecture for Data-to-Text Systems]]
  * [[https://core.ac.uk/download/pdf/188246925.pdf|Belz 2008 - Automatic Generation of Weather Forecast Texts Using Comprehensive Probabilistic Generation-Space Models]]
  * [[https://arxiv.org/pdf/1809.00582.pdf|Puduppully et al 2018 - Data-to-Text Generation with Content Selection and Planning]] A good baseline; used as the baseline in the Workshop on Neural Generation and Translation (WNGT).
  * [[https://arxiv.org/pdf/2205.11055.pdf|Zhang et al 2022 - TempLM: Distilling Language Models into Template-Based Generators]]

===== Meaning-to-Text =====
This is generation from a meaning representation, such as [[Abstract Meaning Representation|AMR]] or slot-value pairs; it is the inverse of [[semantic parsing]]. See also [[Abstract Meaning Representation#Generation|AMR - Generation]].
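As a rough illustration of generation from slots and values, here is a minimal template-based realizer, offered as a sketch under stated assumptions rather than as any published method. It assumes a meaning representation in the bracketed ''slot[value]'' style used by the E2E dataset; the slot names handled, the phrase templates, and the helper functions ''parse_mr'' and ''realize'' are hypothetical and chosen only for illustration. Practical systems learn or plan this mapping instead of relying on a fixed template table.

```python
import re

# Matches one "slot[value]" pair; assumes E2E-style bracketed meaning representations.
MR_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")

# Hypothetical per-slot phrase templates (illustrative only, not taken from any dataset or paper).
TEMPLATES = {
    "food": "serving {} food",
    "priceRange": "with {} prices",
    "area": "in the {}",
}


def parse_mr(mr: str) -> dict[str, str]:
    """Parse 'slot[value], slot[value], ...' into an insertion-ordered dict of slot -> value."""
    return {slot: value.strip() for slot, value in MR_PATTERN.findall(mr)}


def realize(mr: str) -> str:
    """Turn a slot-value meaning representation into one sentence using fixed templates."""
    slots = parse_mr(mr)
    name = slots.pop("name", "This place")
    eat_type = slots.pop("eatType", "restaurant")
    # Choose the indefinite article from the first letter (a crude heuristic).
    article = "an" if eat_type and eat_type[0].lower() in "aeiou" else "a"
    parts = [f"{name} is {article} {eat_type}"]
    for slot, value in slots.items():
        # Slots without a template are skipped, i.e. silently omitted from the output.
        if slot in TEMPLATES:
            parts.append(TEMPLATES[slot].format(value))
    return " ".join(parts) + "."


if __name__ == "__main__":
    print(realize("name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate]"))
```

With these templates, ''name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate]'' is realized as "The Eagle is a coffee shop serving French food with moderate prices." Slots with no template (for example ''familyFriendly'') are dropped without warning, a simple instance of the content-omission errors that NLG evaluation tries to detect.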
===== Controllable Text Generation =====
  * Overviews
    * [[https://lilianweng.github.io/posts/2021-01-02-controllable-text-generation/|Controllable Neural Text Generation]]

===== Evaluation =====
Survey: [[https://arxiv.org/pdf/2006.14799.pdf|Celikyilmaz et al 2020 - Evaluation of Text Generation: A Survey]]
  * [[https://aclanthology.org/E06-1040.pdf|Belz & Reiter 2006 - Comparing Automatic and Human Evaluation of NLG Systems]]
  * [[https://www.aclweb.org/anthology/N19-1169.pdf|Hashimoto et al 2019 - Unifying Human and Statistical Evaluation for Natural Language Generation]]
  * [[https://www.aclweb.org/anthology/W19-8644.pdf|Dusek et al 2019 - Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)]]
  * [[https://arxiv.org/pdf/2004.04696.pdf|Sellam et al 2020 - BLEURT: Learning Robust Metrics for Text Generation]] (ACL 2020)
  * [[https://arxiv.org/pdf/2102.01672.pdf|Gehrmann et al 2021 - The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics]] ([[https://gem-benchmark.com/|Website]])
  * [[https://aclanthology.org/2021.tacl-1.87.pdf|Freitag et al 2021 - Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation]] Uses the Multidimensional Quality Metrics (MQM) framework. An MT paper; its MQM-based human evaluation is used in the [[https://www2.statmt.org/wmt24/metrics-task.html|WMT]] metrics task.
  * [[https://arxiv.org/pdf/2107.01294.pdf|Dou et al 2021 - Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text]]
  * [[https://arxiv.org/pdf/2406.07935|Ruan et al 2024 - Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation]]

===== Historical Papers =====
Papers from a while ago.
  * [[https://aclanthology.org/C10-1012.pdf|Bohnet et al 2010 - Broad Coverage Multilingual Deep Sentence Generation with a Stochastic Multi-Level Realizer]]

===== Datasets =====
See also [[http://nlpprogress.com/english/data_to_text_generation.html|NLP progress - Generation]] and [[https://aclweb.org/aclwiki/Data_sets_for_NLG|ACL Wiki - Datasets for NLG]].
  * AMR dataset
  * E2E dataset: [[https://github.com/tuetschek/e2e-dataset|github]] or [[http://www.macs.hw.ac.uk/InteractionLab/E2E/|here]] ([[https://arxiv.org/pdf/1706.09254.pdf|paper]])
  * [[https://arxiv.org/pdf/1810.00278.pdf|MultiWOZ]] Data-to-text generation dataset
  * RotoWire
  * WikiBio
  * WebNLG
  * [[https://nlds.soe.ucsc.edu/viggo|ViGGO Dataset]], created at UCSC ([[https://arxiv.org/pdf/1910.12129.pdf|paper]])

===== Shared Tasks =====
  * [[https://arxiv.org/pdf/1910.13299.pdf|Hayashi et al 2019 - Findings of the Third Workshop on Neural Generation and Translation]]

===== People =====
  * [[https://scholar.google.com/citations?user=trwwiW4AAAAJ&hl=en|Anya Belz]]
  * [[https://scholar.google.com/citations?user=Ns0YuP0AAAAJ&hl=en|Ehud Reiter]]

===== Related Pages =====
  * [[Seq2seq|Sequence to Sequence Models]]
  * [[Watermarking]]