====== Graphical Models ======

Graphical models (or probabilistic graphical models, PGMs) are a sub-area of machine learning and statistics. PGMs are a framework for representing the independence assumptions among random variables in a probability distribution. Broadly, the study of PGMs includes the study of algorithms for learning and inference in these complex probability distributions. PGMs have applications in machine learning, statistics, natural language processing, speech recognition, computer vision, robotics, and other areas. Topics include Bayesian Networks, Hidden Markov Models (HMMs), Conditional Random Fields (CRFs), Markov Random Fields (MRFs), Variational Inference, and [[bayesian_methods#Bayesian nonparametrics]].

===== Overviews =====

  * Graphical Models
    * [[https://direct.mit.edu/books/edited-volume/3811/chapter-standard/125067/Graphical-Models-in-a-Nutshell|Koller et al 2007 - Graphical Models in a Nutshell]] (book chapter)
  * Deep Latent Variable Models
    * Paper: [[https://arxiv.org/pdf/1812.06834.pdf|Kim et al 2018 - A Tutorial on Deep Latent Variable Models of Natural Language]]

===== Models =====

  * Bayesian Networks
  * Markov Random Fields
    * [[https://www.youtube.com/watch?v=iBQkZdPHlCs|Bert Huang's Video]]
  * Factor Graphs

===== Inference =====

  * Belief Propagation
    * [[http://helper.ipam.ucla.edu/publications/gss2013/gss2013_11344.pdf|Book chapter]]
    * [[http://www.cs.cmu.edu/~mgormley/courses/10418/slides/lecture9-bp.pdf|Matt Gormley's slides]]
    * [[https://www.youtube.com/watch?v=meBWAboEWQk|Bert Huang's Video]] **Discusses the relation between BP and Lagrangian relaxation at the end.**
  * Markov Chain Monte Carlo (MCMC)
  * Variational Inference
    * [[https://www.youtube.com/watch?v=smfWKhDcaoA|Topic Models: Variational Inference for Latent Dirichlet Allocation (video)]]
    * [[https://arxiv.org/pdf/1601.00670.pdf|Blei et al 2016 - Variational Inference: A Review for Statisticians]]
    * Great description in [[https://arxiv.org/pdf/1603.00788.pdf#page=4|Kucukelbir 2016]] (see section 2.2)
    * Great video: [[https://www.youtube.com/watch?v=Dv86zdWjJKQ|Blei - Variational Inference: Foundations and Innovations]] (nice overview at ~10:00)

===== Old Papers =====

  * [[http://web.cs.iastate.edu/~honavar/factorgraphs.pdf|Kschischang et al 1998 - Factor Graphs and the Sum-Product Algorithm]] The paper that introduced factor graphs
  * [[https://www.aclweb.org/anthology/P05-1045.pdf|Finkel et al 2005 - Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling]]

===== Recent Papers =====

  * [[https://arxiv.org/pdf/1505.04406.pdf|Bach et al 2017 - Hinge-Loss Markov Random Fields and Probabilistic Soft Logic]]
  * **[[https://arxiv.org/pdf/2010.12048.pdf|Chiang & Riley 2020 - Factor Graph Grammars]]** Introduces a new kind of graphical model (factor graph grammars) that is more expressive than plate notation or dynamic graphical models. It is expressive enough to represent CFG parsing as a graphical model. Very cool.
  * [[https://www.jmlr.org/papers/volume21/18-856/18-856.pdf|Al-Shedivat et al 2020 - Contextual Explanation Networks]]

===== Interesting NLP Deep Learning + PGM Papers =====

See also recent advances in [[nlp:HMM|HMMs]] and [[Conditional Random Field|CRFs]].
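Many of the papers in this section pair a learned (often neural) scoring function with classical PGM inference. As a concrete reference point for the inference half, here is a minimal sketch (hypothetical code, not taken from any paper listed here) of exact sum-product message passing (forward-backward) on a chain-structured MRF:

```python
import numpy as np

def chain_marginals(unary, pairwise):
    """Exact node marginals for a chain MRF via sum-product.

    unary:    (T, K) unnormalized node potentials
    pairwise: (K, K) unnormalized edge potentials, shared across edges
    Returns a (T, K) array of normalized per-node marginals.
    """
    T, K = unary.shape
    fwd = np.zeros((T, K))  # messages arriving from the left
    bwd = np.zeros((T, K))  # messages arriving from the right
    fwd[0] = unary[0]
    for t in range(1, T):
        fwd[t] = unary[t] * (fwd[t - 1] @ pairwise)
        fwd[t] /= fwd[t].sum()  # rescale to avoid underflow
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):
        bwd[t] = pairwise @ (unary[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    marg = fwd * bwd
    return marg / marg.sum(axis=1, keepdims=True)
```

On trees the same two-pass scheme remains exact; on loopy graphs the messages are instead iterated until (hopefully) convergence, which is the loopy BP used by several of the papers above.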
  * [[https://arxiv.org/pdf/1702.00887.pdf|Kim et al 2017 - Structured Attention Networks]]
  * [[https://aclanthology.org/C18-1142.pdf|Bahuleyan et al 2018 - Variational Attention for Sequence-to-Sequence Models]]
  * [[https://www.aclweb.org/anthology/K18-1001.pdf|Thai et al 2018 - Embedded-State Latent Conditional Random Fields for Sequence Labeling]]
  * [[http://proceedings.mlr.press/v80/kaiser18a/kaiser18a.pdf|Kaiser et al 2018 - Fast Decoding in Sequence Models Using Discrete Latent Variables]]
  * [[https://arxiv.org/pdf/1906.07880.pdf|Wang et al 2019 - Second-Order Semantic Dependency Parsing with End-to-End Neural Networks]] Uses loopy BP and variational inference
  * [[https://arxiv.org/pdf/1902.04094.pdf|Wang & Cho 2019 - BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model]] WARNING: There is a mistake in this paper; [[https://sites.google.com/site/deepernn/home/blog/amistakeinwangchoberthasamouthanditmustspeakbertasamarkovrandomfieldlanguagemodel|it's not an MRF]]
  * [[https://www.aclweb.org/anthology/2020.emnlp-main.406.pdf|Gao & Gormley 2020 - Training for Gibbs Sampling on Conditional Random Fields with Neural Scoring Factors]] (basically adapted [[https://www.aclweb.org/anthology/P05-1045.pdf|Finkel et al 2005]] to the neural era)
  * [[https://arxiv.org/pdf/2002.07233.pdf|Lee et al 2020 - On the Discrepancy between Density Estimation and Sequence Generation]] Uses latent variables for fast non-autoregressive generation
  * [[http://proceedings.mlr.press/v119/srivastava20a/srivastava20a.pdf|Srivastava et al 2020 - Robustness to Spurious Correlations via Human Annotations]]
  * [[https://arxiv.org/pdf/2106.02736.pdf|Goyal et al 2021 - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis–Hastings]]
  * [[https://arxiv.org/pdf/2105.15021.pdf|Yang et al 2021 - Neural Bi-Lexicalized PCFG Induction]] Uses a Bayesian network to describe their model
  * [[https://arxiv.org/pdf/2406.06950|Hou et al 2024 - A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation]]

===== Recent NLP Papers that Use PGMs =====

  * [[https://aclanthology.org/2021.acl-long.346.pdf|Rodriguez et al 2021 - Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?]]
  * **Belief Propagation**
    * [[https://aclanthology.org/P11-1048.pdf|Auli & Lopez 2011 - A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing]]
    * [[https://aclanthology.org/W13-4069.pdf|Lee 2013 - Structured Discriminative Model For Dialog State Tracking]]
    * [[https://aclanthology.org/2020.findings-emnlp.142.pdf|Chen et al 2020 - Neural Dialogue State Tracking with Temporally Expressive Networks]]
    * [[https://aclanthology.org/P19-1382.pdf|Bjerva et al 2019 - Uncovering Probabilistic Implications in Typological Knowledge Bases]]
  * **MCMC and Sampling**
    * [[https://aclanthology.org/D12-1101.pdf|Singh et al 2012 - Monte Carlo MCMC: Efficient Inference by Approximate Sampling]]
    * **[[https://aclanthology.org/D18-1405.pdf|Ma & Collins 2018 - Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency]]**
    * [[https://aclanthology.org/N18-1085.pdf|Lin & Eisner 2018 - Neural Particle Smoothing for Sampling from Conditional Sequence Models]]
    * [[https://aclanthology.org/2020.aacl-main.21.pdf|Wang et al 2020 - Neural Gibbs Sampling for Joint Event Argument Extraction]]
    * [[https://aclanthology.org/2020.acl-main.196.pdf|Logan et al 2020 - On Importance Sampling-Based Evaluation of Latent Language Models]]
    * [[https://www.aclweb.org/anthology/2020.emnlp-main.406.pdf|Gao & Gormley 2020 - Training for Gibbs Sampling on Conditional Random Fields with Neural Scoring Factors]] (basically adapted [[https://www.aclweb.org/anthology/P05-1045.pdf|Finkel et al 2005]] to the neural era)
    * [[https://arxiv.org/pdf/2106.02736.pdf|Goyal & Dyer 2021 - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis–Hastings]]
  * **Variational Inference**
    * [[https://aclanthology.org/P19-1186.pdf|Lee et al 2019 - Semi-supervised Stochastic Multi-Domain Learning using Variational Inference]]
    * [[https://aclanthology.org/2020.acl-main.367.pdf|Emerson 2020 - Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics]]
  * **Other Papers**
    * [[https://arxiv.org/pdf/2306.05836.pdf|Jin et al 2023 - Can Large Language Models Infer Causation from Correlation?]]

===== Courses, Tutorials, and Overview Papers =====

  * Sometimes PGMs are covered in the UCSC course [[https://courses.soe.ucsc.edu/courses/cse290c|CSE 290C]] (when [[https://courses.soe.ucsc.edu/courses/cmps290c/Fall15/01|Lise Getoor teaches it]])
  * **Course at CMU**: Probabilistic Graphical Models [[https://www.cs.cmu.edu/~epxing/Class/10708-20/index.html|Spring 2020]] [[https://www.cs.cmu.edu/~epxing/Class/10708-20/lectures.html|Lectures with videos]] [[https://www.cs.cmu.edu/~epxing/Class/10708/|2014 (with videos and scribe notes)]]
  * Stanford course: [[https://ermongroup.github.io/cs228/|CS 228 - Probabilistic Graphical Models]]
  * **Matt Gormley's course at CMU**: [[https://www.cs.cmu.edu/~mgormley/courses/10418/|10418]] (with videos)
  * **Best overview tutorial:** [[https://kuleshov.github.io/cs228-notes/|CS228 Lecture Notes]]
  * [[https://users.soe.ucsc.edu/~niejiazhong/slides/murphy.pdf|Tutorial on Probabilistic Graphical Models]]
  * [[https://linqs.soe.ucsc.edu/sites/default/files/papers/koller-book07.pdf|Book Chapter: Graphical Models in a Nutshell]]
  * Paper: [[paper:A Tutorial on Deep Latent Variable Models of Natural Language]]

===== Related Pages =====

  * [[Bayesian Methods]] Bayesian methods often use techniques from graphical models, such as MCMC and variational inference, as well as representing the likelihood and prior as a graphical model
  * [[Conditional Random Field]]
  * [[Probabilistic Logic]]
  * [[nlp:Topic Modeling]]
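As a companion to the MCMC and Gibbs-sampling papers collected above, here is a minimal, hypothetical Gibbs sampler for a binary grid MRF (Ising-style, with 0/1 states and energy E(x) = -Σ h_i x_i - J Σ x_i x_j over 4-neighbor edges). It illustrates the basic pattern shared by those papers: resample one variable at a time from its conditional given the Markov blanket.

```python
import numpy as np

def gibbs_ising(h, J, n_sweeps, rng):
    """Gibbs sampling for a binary (0/1) grid MRF.

    h: (R, C) unary field; J: coupling strength between 4-neighbors.
    Returns the per-site mean state, averaged over one sample per sweep.
    """
    R, C = h.shape
    x = rng.integers(0, 2, size=(R, C))  # random initial configuration
    totals = np.zeros((R, C))
    for _ in range(n_sweeps):
        for i in range(R):
            for j in range(C):
                # Sum the states of the 4-neighbors (the Markov blanket).
                s = 0.0
                if i > 0:
                    s += x[i - 1, j]
                if i < R - 1:
                    s += x[i + 1, j]
                if j > 0:
                    s += x[i, j - 1]
                if j < C - 1:
                    s += x[i, j + 1]
                # Conditional log-odds of x[i,j] = 1 given its blanket.
                p1 = 1.0 / (1.0 + np.exp(-(h[i, j] + J * s)))
                x[i, j] = rng.random() < p1
        totals += x
    return totals / n_sweeps
```

In the papers above, the hand-specified conditional here is replaced by one derived from a learned (often neural) energy function, but the sweep structure is the same; a burn-in period and thinning are usually added before averaging.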