Neural Network Architectures

Overviews

Feedforward Networks

  • GLU (Gated Linear Unit; sometimes classed as an activation function, but structurally it is a small feedforward architecture). Variants: Shazeer (2020) finds that ReGLU and SwiGLU work well.
  • Capsule networks (also used in a CNN-type architecture)
  • Sparsely-Gated Mixture-of-Experts (Shazeer et al. 2017). Used to greatly scale up the number of parameters with only a sub-linear increase in computation, since each input activates only a few experts. Many parallel feedforward "expert" networks are gated by a separate routing network; the paper reports over 1000x improvements in model capacity.
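The SwiGLU variant mentioned above can be written in a few lines. A minimal sketch (dimension names and the random toy weights are illustrative, not from Shazeer 2020):

```python
import numpy as np

def swish(x):
    # Swish / SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W, V, W2):
    # FFN_SwiGLU(x) = (Swish(x W) * (x V)) W2, per Shazeer 2020
    return (swish(x @ W) * (x @ V)) @ W2

# Toy dimensions; d_model and d_ff are arbitrary here
rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
x = rng.standard_normal((2, d_model))
W = rng.standard_normal((d_model, d_ff))
V = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
y = swiglu_ffn(x, W, V, W2)   # shape (2, 8)
```

Note the gating: one projection (`x @ V`) is multiplied elementwise by the activated other projection, which is what distinguishes GLU-family blocks from a plain two-layer FFN.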
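The sparse gating idea can be sketched as follows. This is a toy dense-loop version, assuming top-k routing without the noise term from the paper; function names and the demo experts are made up for illustration:

```python
import numpy as np

def top_k_gate(logits, k=2):
    # Keep the k largest gate logits per row, softmax over them, zero the rest
    idx = np.argsort(logits, axis=-1)[..., ::-1][..., :k]
    masked = np.full_like(logits, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(logits, idx, axis=-1), axis=-1)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, gate_W, experts, k=2):
    # A real implementation only evaluates the selected experts (that is
    # where the compute savings come from); this dense loop is for clarity.
    gates = top_k_gate(x @ gate_W, k=k)        # (batch, n_experts)
    out = np.zeros_like(x)
    for i, expert in enumerate(experts):
        out += gates[:, i:i + 1] * expert(x)
    return out

# Toy demo: three "experts" that just rescale their input
x = np.array([[0.5, -0.2]])
gate_W = np.array([[1.0, 0.0, -1.0],
                   [0.0, 1.0, 0.5]])
experts = [lambda h, c=c: c * h for c in (1.0, 2.0, 3.0)]
y = moe_forward(x, gate_W, experts)            # shape (1, 2)
```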

Connections

  • ReZero. Similar to residual connections, but the residual branch is scaled by a trainable parameter that is initialized to zero, so every block starts out as the identity map.

Sequence Networks

See also State-Space Models.

Tree Networks

Graph Networks

See also Wu et al 2019 - A Comprehensive Survey on Graph Neural Networks and Graph Neural Networks.

  • Graph convolution networks
  • Graph transformers
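A single graph convolution layer in the Kipf & Welling (2017) form can be sketched as below; the tiny path graph and weights are illustrative only:

```python
import numpy as np

def gcn_layer(A, H, W):
    # H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), where A is the adjacency
    # matrix; the added identity gives each node a self-loop, and the
    # degree normalization keeps feature magnitudes stable.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# 3-node path graph, 2 features per node
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3, 2)
W = np.eye(2)
out = gcn_layer(A, H, W)   # shape (3, 2)
```

Each layer mixes every node's features with its neighbors', so stacking k layers aggregates information from the k-hop neighborhood.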

Activation Functions

See also the table in Wikipedia's Activation functions.

Comparisons:
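As a quick reference alongside the Wikipedia table, three of the most common activations can be sketched directly (the GELU below is the standard tanh approximation, not the exact Gaussian CDF form):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of GELU (Hendrycks & Gimpel 2016)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def swish(x):
    # Swish / SiLU: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

xs = np.linspace(-3.0, 3.0, 7)
# All three go to 0 for large negative x and approach x for large
# positive x; they differ mainly in smoothness near zero.
```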

Set and Pooling Networks

Memory Architectures

RNN Cells

See also Wikipedia - Recurrent Neural Networks and Yu et al 2019 - A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures.

  • Feedforward network (Elman network)
  • Feedforward network with residual connections (with careful tuning, reportedly performs as well as LSTMs)
  • LSTM
    • Forget gate
    • Peephole connections
  • GRU (has been shown to underperform the LSTM cell in some comparisons)
  • Minimal Gated Unit (MGU)
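The standard LSTM cell with a forget gate (no peephole connections) can be sketched as below; the gate stacking order and toy shapes are a convention chosen here, not a fixed standard:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, Wx, Wh, b):
    # One LSTM step. Wx: (d_in, 4*d), Wh: (d, 4*d), b: (4*d,);
    # the four gates are stacked along the last axis as [i, f, g, o].
    z = x @ Wx + h @ Wh + b
    d = h.shape[-1]
    i = sigmoid(z[..., 0 * d:1 * d])   # input gate
    f = sigmoid(z[..., 1 * d:2 * d])   # forget gate
    g = np.tanh(z[..., 2 * d:3 * d])   # candidate cell state
    o = sigmoid(z[..., 3 * d:4 * d])   # output gate
    c_new = f * c + i * g              # cell state: gated copy + gated write
    h_new = o * np.tanh(c_new)         # hidden state exposed to the next layer
    return h_new, c_new

# Toy shapes: input dim 3, hidden dim 2
rng = np.random.default_rng(1)
d_in, d = 3, 2
Wx = rng.standard_normal((d_in, 4 * d))
Wh = rng.standard_normal((d, 4 * d))
b = np.zeros(4 * d)
x = rng.standard_normal((1, d_in))
h = np.zeros((1, d))
c = np.zeros((1, d))
h_new, c_new = lstm_cell(x, h, c, Wx, Wh, b)
```

The GRU and MGU differ mainly in merging or dropping gates: the GRU ties the input and forget gates together and has no separate cell state, and the MGU reduces further to a single gate.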

Position Embeddings

Attention Mechanisms

See also the Attention Mechanisms page.

Neurosymbolic Networks

Dynamic Neural Networks

Miscellaneous Architectures

ml/nn_architectures.1691646427.txt.gz · Last modified: 2023/08/10 05:47 by jmflanig
