======= Bayesian Methods ======= Bayesian methods are a sub-field of statistics (Bayesian statistics) and are often used in machine learning. Bayesian methods place a prior over the space of models (the prior belief) and, after observing the data, update that belief to obtain a posterior over the space of models (the posterior belief). The posterior can then be used to make predictions, quantify uncertainty, and so on. (This is in contrast to frequentist methods, which are "designed to create procedures with certain frequency guarantees (consistency, coverage, minimaxity etc)" ([[http://www.stat.cmu.edu/~larry/=sml/nonparbayes.pdf|Nonparametric Bayesian Methods]], Chapter 8).) Bayesian methods often have very good frequentist properties, and often perform well in practice. There is recent interest in combining Bayesian methods with deep learning; for examples, see the [[http://bayesiandeeplearning.org/|Bayesian Deep Learning Workshop at NeurIPS]]. Bayesian methods such as Bayesian optimization are also used in [[hyperparameter tuning]].
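As a minimal concrete illustration of the prior-to-posterior update described above, here is a sketch using the standard Beta–Bernoulli conjugate pair (the function name and the uniform Beta(1, 1) prior are illustrative choices, not from any source on this page):

```python
def beta_bernoulli_posterior(alpha, beta, data):
    """Conjugate update: a Beta(alpha, beta) prior over a coin's
    heads-probability, combined with Bernoulli observations, yields
    a Beta posterior whose parameters are simple counts."""
    heads = sum(data)
    tails = len(data) - heads
    return alpha + heads, beta + tails

# Uniform prior Beta(1, 1); observe 8 heads and 2 tails.
a, b = beta_bernoulli_posterior(1.0, 1.0, [1] * 8 + [0] * 2)
posterior_mean = a / (a + b)  # 9 / 12 = 0.75
```

The posterior mean (0.75) is pulled slightly toward the prior mean (0.5) relative to the raw frequency of heads (0.8), which is exactly the regularizing effect of the prior belief.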
===== Bayesian Papers in Machine Learning and NLP ===== * [[https://aclanthology.org/N07-1018.pdf|Johnson et al 2007 - Bayesian Inference for PCFGs via Markov Chain Monte Carlo]] * [[https://aclanthology.org/D07-1031.pdf|Johnson 2007 - Why doesn’t EM find good HMM POS-taggers?]] ===== Bayesian Neural Networks ===== ==== Overviews ==== * Blog post: [[https://jorisbaan.nl/2021/03/02/introduction-to-bayesian-deep-learning.html|Intro to Bayesian Deep Learning]] * [[https://arxiv.org/pdf/2001.10995.pdf|Wilson 2020 - The Case for Bayesian Deep Learning]] * [[https://arxiv.org/pdf/2007.06823.pdf|Jospin et al 2020 - Hands-on Bayesian Neural Networks – a Tutorial for Deep Learning Users]] * [[https://arxiv.org/pdf/2006.12024.pdf|Goan & Fookes 2020 - Bayesian Neural Networks: An Introduction and Survey]] * [[https://arxiv.org/pdf/2006.01490.pdf|Charnock et al 2020 - Bayesian Neural Networks]] (Draft book chapter) ==== Papers ==== * [[https://dl.acm.org/doi/pdf/10.1145/3296957.3173212|Cai et al 2018 - VIBNN: Hardware Acceleration of Bayesian Neural Networks]] * [[https://arxiv.org/pdf/1901.02731.pdf|Shridhar et al 2019 - A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference]] * [[https://papers.nips.cc/paper/2020/file/0b1ec366924b26fc98fa7b71a9c249cf-Paper.pdf|He et al 2020 - Bayesian Deep Ensembles via the Neural Tangent Kernel]] * [[https://arxiv.org/pdf/2002.04033.pdf|Karaletsos & Bui 2020 - Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights]] ===== Variational Bayes ===== ==== Overviews ==== * [[https://arxiv.org/pdf/2103.01327.pdf|Tran et al 2021 - A Practical Tutorial on Variational Bayes]] ==== Papers ==== * [[https://arxiv.org/pdf/1705.03439.pdf|Wang & Blei 2017 - Frequentist Consistency of Variational Bayes]] ===== Bayesian Nonparametrics ===== Bayesian nonparametric methods are a sub-area of Bayesian statistics, and are also commonly used in machine learning.
They were very popular in NLP before deep learning took over around 2014. ==== Overviews ==== * [[https://groups.seas.harvard.edu/courses/cs281/papers/orbanz-teh-2010.pdf|Orbanz & Teh 2010 - Bayesian Nonparametric Models]] * [[http://www.stat.cmu.edu/~larry/=sml/nonparbayes|Wasserman - Nonparametric Bayesian Methods]] (notes from a statistics course) ==== Resources ==== * Talks (with videos) * Talk by Michael Jordan: [[http://videolectures.net/icml05_jordan_dpcrp/|Dirichlet Processes, Chinese Restaurant Processes, and All That]] * Yee Whye Teh's talks: [[http://videolectures.net/mlss2011_teh_nonparametrics/|Video]], [[http://videolectures.net/mlss09uk_teh_nbm/|Another video]]. Slides: [[https://www.stats.ox.ac.uk/~teh/teaching/npbayes/mlss2011F.pdf|Introduction to Bayesian Nonparametrics]] * [[http://www.gatsby.ucl.ac.uk/~porbanz/npb-tutorial.html|Peter Orbanz's Resources on Bayesian Nonparametrics]] * [[https://en.wikipedia.org/wiki/Dirichlet_process|Wikipedia - Dirichlet process]] * [[https://en.wikipedia.org/wiki/Chinese_restaurant_process|Wikipedia - Chinese restaurant process]] * [[https://en.wikipedia.org/wiki/Pitman%E2%80%93Yor_process|Wikipedia - Pitman–Yor process]] ==== Papers ==== * [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.6905&rep=rep1&type=pdf|Blei & Jordan 2004 - Variational Inference for Dirichlet Process Mixtures]] * [[https://proceedings.neurips.cc/paper/2005/file/4b21cf96d4cf612f239a6c322b10c8fe-Paper.pdf|Goldwater et al 2005 - Interpolating Between Types and Tokens by Estimating Power-Law Generators]] "We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language."
* [[http://www.stats.ox.ac.uk/~teh/research/compling/hpylm.pdf|Teh 2006 - A Bayesian Interpretation of Interpolated Kneser-Ney]] * [[https://proceedings.neurips.cc/paper/2006/file/62f91ce9b820a491ee78c108636db089-Paper.pdf|Johnson et al 2006 - Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models]] * [[https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.6544&rep=rep1&type=pdf|Teh et al 2006 - Hierarchical Dirichlet Processes]] See also [[https://en.wikipedia.org/wiki/Hierarchical_Dirichlet_process|Wikipedia - Hierarchical Dirichlet process]] and [[https://www.cs.cmu.edu/~epxing/Class/10708/scribe_notes/scribe_note_lecture20.pdf|Viswanathan & Faruqui 2014]] * [[https://aclanthology.org/P12-1046.pdf|Shindo et al 2012 - Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing]] Achieved 92.4 F1 on the PTB and was state of the art until [[https://arxiv.org/pdf/1602.07776.pdf|Dyer et al 2016]] surpassed it. * [[https://aclanthology.org/N13-1140.pdf|Chahuneau et al 2013 - Knowledge-Rich Morphological Priors for Bayesian Language Models]] Combines a finite-state guesser with Bayesian nonparametrics. ==== Papers Combining Deep Learning and Bayesian Nonparametrics ==== * [[https://arxiv.org/pdf/2001.00689.pdf|Lee et al 2020 - A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning]] * [[http://bayesiandeeplearning.org/2016/papers/BDL_20.pdf|Nalisnick et al 2016 - Approximate Inference for Deep Latent Gaussian Mixtures]] ===== People ===== * [[https://scholar.google.com/citations?user=8OYE6iEAAAAJ&hl=en|David Blei]] * [[https://scholar.google.com/citations?user=_ZxvlzoAAAAJ&hl=en|Sharon Goldwater]] * [[https://scholar.google.com.au/citations?user=Z_kok3sAAAAJ&hl=en|Mark Johnson]] * [[https://scholar.google.com/citations?user=yxUduqMAAAAJ&hl=en|Michael Jordan]] * [[https://scholar.google.com/citations?user=y-nUzMwAAAAJ&hl=en|Yee Whye Teh]] ===== Related Pages ===== * [[Graphical Models]] Bayesian methods often rely on inference techniques associatedated with graphical models, such as MCMC and variational inference, and the likelihood and prior can themselves be represented as a graphical model
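As a small runnable illustration of the Chinese restaurant process linked under Resources above, here is a sketch of drawing a random partition from a CRP (the function name, the concentration parameter ''alpha'', and the fixed seed are illustrative choices, not taken from any of the papers listed):

```python
import random

def crp_sample(n, alpha, seed=0):
    """Sample a random partition of n customers from a Chinese
    restaurant process with concentration parameter alpha.
    Customer i joins existing table k with probability
    count[k] / (i + alpha), or opens a new table with
    probability alpha / (i + alpha)."""
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers at table k
    assignment = []   # assignment[i] = table index of customer i
    for i in range(n):
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, count in enumerate(tables):
            acc += count
            if r < acc:
                tables[k] += 1
                assignment.append(k)
                break
        else:
            # r fell in the final alpha-sized slice: open a new table.
            tables.append(1)
            assignment.append(len(tables) - 1)
    return assignment, tables

assignment, tables = crp_sample(100, alpha=1.0)
```

The number of occupied tables grows roughly logarithmically in n for fixed alpha, which is the "rich get richer" clustering behavior that makes the CRP (and the Dirichlet process it represents) useful as a nonparametric prior over partitions.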