====== Graphical Models ======
Graphical models (or probabilistic graphical models, PGMs) are a sub-area of machine learning and statistics.  PGMs are a framework for representing the independence assumptions among random variables in a probability distribution.  Broadly, the study of PGMs includes the study of algorithms for learning and inference in these complex probability distributions.  PGMs have applications in machine learning, statistics, natural language processing, speech recognition, computer vision, robotics, and other areas.  Topics include Bayesian Networks, Hidden Markov Models (HMMs), Conditional Random Fields (CRFs), Markov Random Fields (MRFs), Variational Inference, and [[bayesian_methods#Bayesian nonparametrics]].
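The core idea above (a graph encodes a factorization of a joint distribution, which makes inference tractable) can be sketched with a toy Bayesian network.  The network structure and all probabilities below are purely illustrative, not taken from any reference on this page:

```python
from itertools import product

# Toy Bayesian network Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass,
# encoding the factorization P(R, S, W) = P(R) P(S|R) P(W|R, S).
# All numbers are made up for illustration.
P_R = {True: 0.2, False: 0.8}
P_S_given_R = {True: {True: 0.01, False: 0.99},
               False: {True: 0.40, False: 0.60}}
P_W_given_RS = {(True, True): {True: 0.99, False: 0.01},
                (True, False): {True: 0.80, False: 0.20},
                (False, True): {True: 0.90, False: 0.10},
                (False, False): {True: 0.00, False: 1.00}}

def joint(r, s, w):
    """P(R=r, S=s, W=w) via the chain-rule factorization the graph encodes."""
    return P_R[r] * P_S_given_R[r][s] * P_W_given_RS[(r, s)][w]

# Exact inference by enumeration: P(Rain=True | WetGrass=True).
numerator = sum(joint(True, s, True) for s in (True, False))
evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
posterior = numerator / evidence
print(round(posterior, 3))  # → 0.358
```

Enumeration is exponential in the number of variables; the algorithms surveyed below (belief propagation, MCMC, variational inference) exist precisely to avoid it.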

===== Overviews =====
  * Graphical Models
    * [[https://direct.mit.edu/books/edited-volume/3811/chapter-standard/125067/Graphical-Models-in-a-Nutshell|Koller et al 2007 - Graphical Models in a Nutshell]] (book chapter)
  * Deep Latent Variable Models
    * [[https://arxiv.org/pdf/1812.06834.pdf|Kim et al 2018 - A Tutorial on Deep Latent Variable Models of Natural Language]]

===== Models =====
  * [[https://arxiv.org/pdf/1505.04406.pdf|Bach et al 2017 - Hinge-Loss Markov Random Fields and Probabilistic Soft Logic]]
  * **[[https://arxiv.org/pdf/2010.12048.pdf|Chiang & Riley 2020 - Factor Graph Grammars]]**  Introduces a new kind of graphical model (factor graph grammars) that is more expressive than plate notation or dynamic graphical models.  It is expressive enough to represent CFG parsing as a graphical model.  Very cool.
  * [[https://www.jmlr.org/papers/volume21/18-856/18-856.pdf|Al-Shedivat et al 2020 - Contextual Explanation Networks]]

===== Interesting NLP Deep Learning + PGM Papers =====
See also recent advances in [[nlp:HMM|HMMs]] and [[Conditional Random Field|CRFs]].
  * [[https://arxiv.org/pdf/1702.00887.pdf|Kim et al 2017 - Structured Attention Networks]]
  * [[https://aclanthology.org/C18-1142.pdf|Bahuleyan et al 2018 - Variational Attention for Sequence-to-Sequence Models]]
  * [[https://www.aclweb.org/anthology/K18-1001.pdf|Thai et al 2018 - Embedded-State Latent Conditional Random Fields for Sequence Labeling]]
  * [[http://proceedings.mlr.press/v80/kaiser18a/kaiser18a.pdf|Kaiser et al 2018 - Fast Decoding in Sequence Models Using Discrete Latent Variables]]
  * [[https://arxiv.org/pdf/2002.07233.pdf|Lee et al 2020 - On the Discrepancy between Density Estimation and Sequence Generation]] Uses latent variables for fast non-autoregressive generation
  * [[http://proceedings.mlr.press/v119/srivastava20a/srivastava20a.pdf|Srivastava et al 2020 - Robustness to Spurious Correlations via Human Annotations]]
  * [[https://arxiv.org/pdf/2106.02736.pdf|Goyal et al 2021 - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis–Hastings]]
  * [[https://arxiv.org/pdf/2105.15021.pdf|Yang et al 2021 - Neural Bi-Lexicalized PCFG Induction]] Uses a Bayesian network to describe their model
  * [[https://arxiv.org/pdf/2406.06950|Hou et al 2024 - A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation]]

===== Recent NLP Papers that Use PGMs =====
    * [[https://aclanthology.org/2020.findings-emnlp.142.pdf|Chen et al 2020 - Neural Dialogue State Tracking with Temporally Expressive Networks]]
    * [[https://aclanthology.org/P19-1382.pdf|Bjerva et al 2019 - Uncovering Probabilistic Implications in Typological Knowledge Bases]]
  * **MCMC and Sampling**
    * [[https://aclanthology.org/D12-1101.pdf|Singh et al 2012 - Monte Carlo MCMC: Efficient Inference by Approximate Sampling]]
    * **[[https://aclanthology.org/D18-1405.pdf|Ma & Collins 2018 - Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency]]**
    * [[https://aclanthology.org/N18-1085.pdf|Lin & Eisner 2018 - Neural Particle Smoothing for Sampling from Conditional Sequence Models]]
    * [[https://aclanthology.org/2020.aacl-main.21.pdf|Wang et al 2020 - Neural Gibbs Sampling for Joint Event Argument Extraction]]
    * [[https://aclanthology.org/2020.acl-main.196.pdf|Logan et al 2020 - On Importance Sampling-Based Evaluation of Latent Language Models]]
    * [[https://www.aclweb.org/anthology/2020.emnlp-main.406.pdf|Gao & Gormley 2020 - Training for Gibbs Sampling on Conditional Random Fields with Neural Scoring Factors]] (basically adapted [[https://www.aclweb.org/anthology/P05-1045.pdf|Finkel et al 2005]] to the neural era)
    * [[https://arxiv.org/pdf/2106.02736.pdf|Goyal et al 2021 - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis–Hastings]]
  * **Variational Inference**
    * [[https://aclanthology.org/P19-1186.pdf|Lee et al 2019 - Semi-supervised Stochastic Multi-Domain Learning using Variational Inference]]
    * [[https://aclanthology.org/2020.acl-main.367.pdf|Emerson 2020 - Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics]]
  * **Other Papers**
    * [[https://arxiv.org/pdf/2306.05836.pdf|Jin et al 2023 - Can Large Language Models Infer Causation from Correlation?]]
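
Several of the sampling papers above build on the basic Gibbs update: repeatedly resample each variable from its conditional distribution given the current values of the rest.  A minimal sketch on a toy two-variable MRF with a single agreement factor; the potential and its parameter are illustrative assumptions, not taken from any paper listed:

```python
import math
import random

# Toy MRF over x, y in {0, 1} with one factor exp(THETA * [x == y]):
# configurations where x and y agree get weight e^THETA, the rest weight 1.
THETA = 1.5  # illustrative choice

def p_one_given(other):
    """P(var = 1 | other variable) under the single agreement factor."""
    w1 = math.exp(THETA * (1 == other))
    w0 = math.exp(THETA * (0 == other))
    return w1 / (w0 + w1)

def gibbs_agreement_rate(n_samples, burn_in=1000, seed=0):
    """Estimate P(x == y) by alternately resampling x and y (one sweep each)."""
    rng = random.Random(seed)
    x, y = 0, 1
    agree = 0
    for t in range(burn_in + n_samples):
        x = int(rng.random() < p_one_given(y))
        y = int(rng.random() < p_one_given(x))
        if t >= burn_in:
            agree += int(x == y)
    return agree / n_samples

# Exact value: 2 e^THETA / (2 e^THETA + 2) = e^1.5 / (e^1.5 + 1) ≈ 0.8176.
print(gibbs_agreement_rate(50000))
```

Each conditional update leaves the target distribution invariant, so after burn-in the samples estimate marginals of the joint; the neural papers above keep this scheme but replace the hand-written potential with a learned scoring function.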
  
===== Courses, Tutorials, and Overview Papers =====
  * Sometimes PGMs are covered in the UCSC course [[https://courses.soe.ucsc.edu/courses/cse290c|CSE 290C]] (when [[https://courses.soe.ucsc.edu/courses/cmps290c/Fall15/01|Lise Getoor teaches it]])
  * **Course at CMU**: Probabilistic Graphical Models [[https://www.cs.cmu.edu/~epxing/Class/10708-20/index.html|Spring 2020]] [[https://www.cs.cmu.edu/~epxing/Class/10708-20/lectures.html|Lectures with videos]] [[https://www.cs.cmu.edu/~epxing/Class/10708/|2014 (with videos and scribe notes)]]
  * Stanford course: [[https://ermongroup.github.io/cs228/|CS 228 - Probabilistic Graphical Models]]
  * **Matt Gormley's course at CMU**: [[https://www.cs.cmu.edu/~mgormley/courses/10418/|10418]] (with videos)
  * **Best overview tutorial:** [[https://kuleshov.github.io/cs228-notes/|CS228 Lecture Notes]]