===== Sequence Networks =====
  * [[https://arxiv.org/pdf/2105.03824.pdf|FNet]] A faster, attention-free Transformer architecture based on Fourier transforms (a minimal mixing-layer sketch follows this list)
  * [[https://arxiv.org/pdf/2305.10991.pdf|Anthe: Less is More! A slim architecture for optimal language translation]]
  * [[https://arxiv.org/pdf/2305.13048|RWKV (Receptance Weighted Key Value) Network]] Information is passed across positions through a positional weight decay that gates it. Allows parallel training like a Transformer while keeping efficient, RNN-like inference (see the recurrence sketch below)
  * [[https://arxiv.org/pdf/2307.08621.pdf|RetNet (Retentive Network)]]
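
To make the FNet entry concrete, here is a minimal sketch of its token-mixing step: the paper replaces the self-attention sublayer with a 2D Fourier transform over the sequence and hidden dimensions, keeping only the real part. The function name and NumPy framing below are illustrative, not from the paper.

<code python>
import numpy as np

def fnet_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style token mixing for x of shape (seq_len, d_model).

    A 2D FFT mixes information along both the sequence and hidden
    dimensions; keeping the real part leaves a real-valued tensor for
    the (unchanged) feed-forward sublayer that follows in the block.
    No attention weights and no learned parameters are involved.
    """
    return np.fft.fft2(x).real

# Example: mix an 8-token sequence with a 16-dimensional hidden state.
x = np.random.randn(8, 16)
y = fnet_mixing(x)  # same shape as x
</code>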
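
Similarly, here is a minimal sketch of the RWKV time-mixing ("WKV") recurrence described above, run in its RNN-like inference mode: each channel carries a running, exponentially decayed sum of past values, and the decay w is the positional weight decay that gates information across positions. The receptance gate and the numerical-stability rewriting from the paper are omitted, and the function name is illustrative.

<code python>
import numpy as np

def wkv_recurrent(k, v, w, u):
    """Simplified RWKV WKV recurrence.

    k, v : (T, C) key and value sequences
    w    : (C,) per-channel decay rate (state shrinks by e^{-w} per step)
    u    : (C,) bonus applied to the current token
    Returns a (T, C) output computed sequentially with O(C) state,
    which is what gives RWKV its cheap, RNN-like inference.
    """
    T, C = k.shape
    num = np.zeros(C)  # running sum of e^{k_i} * v_i, decayed each step
    den = np.zeros(C)  # matching normalizer, decayed the same way
    out = np.zeros((T, C))
    for t in range(T):
        # The current token enters with an extra bonus u before normalization.
        cur = np.exp(u + k[t])
        out[t] = (num + cur * v[t]) / (den + cur)
        # Decay the state, then absorb the current token into it.
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out

# Example: a 12-step sequence with 4 channels.
T, C = 12, 4
k, v = np.random.randn(T, C), np.random.randn(T, C)
out = wkv_recurrent(k, v, w=np.full(C, 0.5), u=np.zeros(C))
</code>

During training the same quantity can be computed for all positions in parallel (the Transformer-like mode mentioned in the bullet); the sequential form above is the one used at inference time.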
  
===== Tree Networks =====