====== Infinite Neural Networks ======

Infinite neural networks are neural networks with an infinite number of hidden units (infinite width) or an infinite number of layers (infinite depth).

===== Overviews =====

  * Neural Tangent Kernel
    * [[https://rajatvd.github.io/NTK/|Understanding the Neural Tangent Kernel (blog post)]]
    * [[https://lilianweng.github.io/posts/2022-09-08-ntk/|Some Math behind Neural Tangent Kernel (blog post)]]
    * [[https://arxiv.org/pdf/2007.15801.pdf|Lee et al 2020 - Finite Versus Infinite Neural Networks: an Empirical Study]]

===== Papers =====

  * Unbounded Depth NNs: [[https://proceedings.mlr.press/v162/nazaret22a/nazaret22a.pdf|Nazaret & Blei 2022 - Variational Inference for Infinitely Deep Neural Networks]]

==== Neural Tangent Kernel ====

See also related work [[https://github.com/google/neural-tangents#references|here]] and [[https://github.com/google/neural-tangents/wiki/Overparameterized-Neural-Networks:-Theory-and-Empirics|here]].

  * [[https://www.cs.toronto.edu/~radford/ftp/pin.pdf|Neal 1994 - Priors for Infinite Networks]] ([[https://www.cs.toronto.edu/~radford/pin.abstract.html|other versions]])
  * [[https://papers.nips.cc/paper/1996/file/ae5e3ce40e0404a45ecacaaf05e5f735-Paper.pdf|Williams 1996 - Computing with infinite networks]]
  * [[https://arxiv.org/pdf/1711.00165.pdf|Lee et al 2017 - Deep Neural Networks as Gaussian Processes]]
  * [[https://arxiv.org/pdf/1806.07572.pdf|Jacot et al 2018 - Neural Tangent Kernel: Convergence and Generalization in Neural Networks]]
  * [[https://arxiv.org/pdf/1902.06720.pdf|Lee et al 2019 - Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent]]
  * [[https://arxiv.org/pdf/1904.11955.pdf|Arora et al 2019 - On Exact Computation with an Infinitely Wide Neural Net]]
  * **[[https://arxiv.org/pdf/1910.08013.pdf|Aitchison 2019 - Why bigger is not always better: on finite and infinite neural networks]]**
  * [[https://arxiv.org/pdf/1912.13053.pdf|Xiao et al 2019 - Disentangling Trainability and Generalization in Deep Neural Networks]]
  * [[https://arxiv.org/pdf/1912.02803.pdf|Novak et al 2019 - Neural Tangents: Fast and Easy Infinite Neural Networks in Python]] ([[https://github.com/google/neural-tangents|github]], [[https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/neural_tangents_cookbook.ipynb|Colab notebook]])
  * [[https://papers.nips.cc/paper/2020/file/0b1ec366924b26fc98fa7b71a9c249cf-Paper.pdf|He et al 2020 - Bayesian Deep Ensembles via the Neural Tangent Kernel]]
  * [[https://arxiv.org/pdf/2001.07301.pdf|Sohl-Dickstein et al 2020 - On the Infinite Width Limit of Neural Networks with a Standard Parameterization]]
  * [[https://arxiv.org/pdf/2006.14548.pdf|Yang 2020 - Tensor Programs II: Neural Tangent Kernel for Any Architecture]]
  * [[https://arxiv.org/pdf/2011.14522.pdf|Yang & Hu 2020 - Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks]]
  * [[https://arxiv.org/pdf/2105.03703.pdf|Yang & Littwin 2021 - Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics]]
  * [[https://arxiv.org/pdf/2203.03466.pdf|Yang et al 2022 - Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer]]
  * **[[https://arxiv.org/pdf/2007.15801.pdf|Lee et al 2020 - Finite Versus Infinite Neural Networks: an Empirical Study]]**
  * [[https://arxiv.org/abs/2010.01092|Liu et al 2020 - On the linearity of large non-linear models: when and why the tangent kernel is constant]]
  * [[https://arxiv.org/pdf/2012.00152.pdf|Domingos 2020 - Every Model Learned by Gradient Descent Is Approximately a Kernel Machine]]
  * [[https://arxiv.org/pdf/2206.08720.pdf|Novak et al 2022 - Fast Finite Width Neural Tangent Kernel]] ([[https://youtu.be/8MWOhYg89fY?t=10984|video]], [[https://github.com/google/neural-tangents|github]], [[https://colab.research.google.com/github/google/neural-tangents/blob/main/notebooks/empirical_ntk_fcn.ipynb|code example]])
  * **[[https://openreview.net/pdf?id=tUMr0Iox8XW|Yang et al 2022 - Efficient Computation of Deep Nonlinear Infinite-Width Neural Networks that Learn Features]]**

===== Notes =====

Jeff's thoughts: Although the objective functions used to train finite neural networks are usually non-convex, for infinitely wide networks (an infinite number of hidden units) they are usually convex. Intuitively, an infinitely wide network can be viewed as a linear combination of all possible hidden units, one basis function per point in parameter space; the output is then linear in the combination weights, and a convex loss composed with a linear model is convex.

===== Software =====

  * [[https://github.com/google/neural-tangents|Neural Tangents]] ([[https://arxiv.org/pdf/1912.02803.pdf|paper]], [[https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/neural_tangents_cookbook.ipynb|Colab notebook]])

===== People =====

  * [[https://scholar.google.com/citations?user=Xz4RAJkAAAAJ&hl=en|Greg Yang]]
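The infinite-width limit behind the notes above (and behind Neal 1994 / Williams 1996 in the reading list) can be checked numerically. The following is a minimal NumPy sketch, not taken from any of the listed papers: it compares the empirical output covariance of a one-hidden-layer ReLU network against the analytic arc-cosine kernel that arises as the width goes to infinity. The function names (`arccos_kernel`, `empirical_kernel`) are illustrative, not a library API.

```python
import numpy as np

# For f(x) = (1/sqrt(n)) * sum_i v_i * relu(w_i . x), with v_i and w_i i.i.d.
# standard normal, the covariance E[f(x) f(y)] converges as n -> infinity to
#   k(x, y) = (||x|| ||y|| / (2*pi)) * (sin(theta) + (pi - theta) * cos(theta)),
# where theta is the angle between x and y.

def arccos_kernel(x, y):
    """Analytic infinite-width kernel of a one-hidden-layer ReLU network."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

def empirical_kernel(x, y, width, seed=0):
    """Average relu(w . x) * relu(w . y) over `width` random hidden units.

    This equals the covariance of f(x) and f(y) over the random output
    weights v_i, and converges to arccos_kernel(x, y) by the law of
    large numbers as the width grows."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((width, len(x)))  # one row per hidden unit
    return np.mean(np.maximum(w @ x, 0.0) * np.maximum(w @ y, 0.0))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(arccos_kernel(x, y))  # 1/(2*pi) ~ 0.1592 for orthogonal unit vectors
for width in (100, 10_000, 1_000_000):
    print(width, empirical_kernel(x, y, width))
```

The empirical estimate fluctuates noticeably at width 100 and settles near the analytic value by width 1,000,000, which is the sense in which a single very wide random network already behaves like a draw from the limiting Gaussian process.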