===== Sequence Networks =====
  * [[https://arxiv.org/pdf/2105.03824.pdf|FNet]] A faster, attention-free Transformer architecture based on Fourier transforms (a minimal mixing-layer sketch follows this list)
  * [[https://arxiv.org/pdf/2305.10991.pdf|Anthe: Less is More! A slim architecture for optimal language translation]]
  * [[https://arxiv.org/pdf/2305.13048|RWKV (Receptance Weighted Key Value) Network]] Information is passed across positions through a positional weight decay that gates it. Allows parallel training like a Transformer while keeping efficient, RNN-like inference (see the recurrence sketch below)
  * [[https://arxiv.org/pdf/2307.08621.pdf|RetNet (Retentive Network)]]
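
To make the FNet entry concrete, here is a minimal sketch of its token-mixing step: the paper replaces the self-attention sublayer with a 2D Fourier transform over the sequence and hidden dimensions, keeping only the real part. The function name and NumPy framing below are illustrative, not from the paper.

<code python>
import numpy as np

def fnet_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style token mixing for x of shape (seq_len, d_model).

    A 2D FFT mixes information along both the sequence and hidden
    dimensions; keeping the real part leaves a real-valued tensor for
    the (unchanged) feed-forward sublayer that follows in the block.
    No attention weights and no learned parameters are involved.
    """
    return np.fft.fft2(x).real

# Example: mix an 8-token sequence with a 16-dimensional hidden state.
x = np.random.randn(8, 16)
y = fnet_mixing(x)  # same shape as x
</code>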
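
Similarly, here is a minimal sketch of the RWKV time-mixing ("WKV") recurrence described above, run in its RNN-like inference mode: each channel carries a running, exponentially decayed sum of past values, and the decay w is the positional weight decay that gates information across positions. The receptance gate and the numerical-stability rewriting from the paper are omitted, and the function name is illustrative.

<code python>
import numpy as np

def wkv_recurrent(k, v, w, u):
    """Simplified RWKV WKV recurrence.

    k, v : (T, C) key and value sequences
    w    : (C,) per-channel decay rate (state shrinks by e^{-w} per step)
    u    : (C,) bonus applied to the current token
    Returns a (T, C) output computed sequentially with O(C) state,
    which is what gives RWKV its cheap, RNN-like inference.
    """
    T, C = k.shape
    num = np.zeros(C)  # running sum of e^{k_i} * v_i, decayed each step
    den = np.zeros(C)  # matching normalizer, decayed the same way
    out = np.zeros((T, C))
    for t in range(T):
        # The current token enters with an extra bonus u before normalization.
        cur = np.exp(u + k[t])
        out[t] = (num + cur * v[t]) / (den + cur)
        # Decay the state, then absorb the current token into it.
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out

# Example: a 12-step sequence with 4 channels.
T, C = 12, 4
k, v = np.random.randn(T, C), np.random.randn(T, C)
out = wkv_recurrent(k, v, w=np.full(C, 0.5), u=np.zeros(C))
</code>

During training the same quantity can be computed for all positions in parallel (the Transformer-like mode mentioned in the bullet); the sequential form above is the one used at inference time.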
  
===== Tree Networks =====