ml:nn_architectures

ml:nn_architectures [2024/05/24 23:05] jmflanig → [2025/03/25 07:34] (current) – [Sequence Networks] jmflanig
  * [[https://arxiv.org/pdf/2105.03824.pdf|FNet]] A faster, attention-free Transformer architecture based on Fourier transforms
  * [[https://arxiv.org/pdf/2305.10991.pdf|Anthe: Less is More! A slim architecture for optimal language translation]]
  * [[https://arxiv.org/pdf/2305.13048|RWKV (Receptance Weighted Key Value) Network]] Information is passed across positions through a positional weight decay that gates it, allowing parallel training like a Transformer with the more efficient inference of an RNN
  * [[https://arxiv.org/pdf/2307.08621.pdf|RetNet (Retentive Network)]]
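The decay-gated mixing the RWKV entry describes can be sketched as a per-channel recurrence. This is a toy NumPy sketch, not the paper's implementation: it follows the ''k''/''v''/''w''/''u'' naming from the paper but omits receptance gating, token shift, and the learned projections.

```python
import numpy as np

def wkv_mix(k, v, w, u):
    """Toy RWKV-style 'WKV' mixing over a sequence.

    Each output is a weighted average of values, where the weight of a
    past value at distance d decays as exp(-d * w) (per-channel decay
    rate w >= 0), and the current token gets a bonus weight u.
    """
    T, C = k.shape
    out = np.empty((T, C))
    num = np.zeros(C)  # running decayed sum of exp(k_i) * v_i
    den = np.zeros(C)  # running decayed sum of exp(k_i)
    for t in range(T):
        cur = np.exp(u + k[t])              # bonus-weighted current token
        out[t] = (num + cur * v[t]) / (den + cur)
        decay = np.exp(-w)                  # one step of positional decay
        num = decay * num + np.exp(k[t]) * v[t]
        den = decay * den + np.exp(k[t])
    return out
```

Because the whole history is carried in the fixed-size state ''(num, den)'', inference is O(1) per token like an RNN, while the same decayed sums can be computed in parallel across positions at training time.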
  
  
===== Matrices =====
Various representations of matrices, such as sparse or low-dimensional ones.
  * Tensor networks
  * [[https://arxiv.org/pdf/2106.09685|LoRA]] Low-Rank Adaptation: freezes the pretrained weights and trains only a low-rank additive update
  * [[https://arxiv.org/pdf/2204.00595|Monarch Matrices]]
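LoRA's low-rank reparameterization can be illustrated in a few lines. A hedged sketch: the factor names and the alpha/r scaling follow the paper, but dropout and the choice of which layers to adapt are omitted.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Linear layer with a LoRA low-rank update: y = x W^T + (alpha/r) x A^T B^T.

    W (d_out, d_in) is the frozen pretrained weight; only the low-rank
    factors A (r, d_in) and B (d_out, r), with r << min(d_in, d_out),
    are trained. B starts at zero, so training begins exactly at the
    pretrained behaviour.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

The trainable parameter count drops from d_out * d_in to r * (d_in + d_out), and after training the update B A can be merged back into W, so inference costs nothing extra.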
  
ml/nn_architectures.1716591914.txt.gz · Last modified: 2024/05/24 23:05 by jmflanig
