====== Attention Mechanisms ======

===== Key Papers =====
  * [[https://
  * **[[https://
  * [[https://
  * **LSH Attention** (sketch after this list)
    * [[https://
  * **Linearized Attention** (see the linearized-attention sketch below)
    * [[https://
    * [[https://
    * Random Feature Attention: [[https://
  * **[[https://
    * Early related work: [[https://
  * Single-Headed Gated Attention (SHGA): [[https://
  * **Sparse Attention** (see the sliding-window sketch below)
    * Longformer
    * BigBird
    * Hierarchical Attention Transformers (HAT): [[https://
  * **MoE Sparse Attention**
    * [[https://
    * [[https://
    * [[https://
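The sketch below illustrates the bucketing idea behind the **LSH Attention** entry above. It is a minimal sketch, assuming NumPy, shared queries and keys, a single hash round, and a dense mask kept only for readability; the function name and parameters are illustrative and not taken from any of the linked papers' code.

<code python>
import numpy as np

def lsh_attention(QK, V, n_buckets=4, seed=0):
    """Illustrative sketch: attend only among tokens hashed to the same bucket."""
    d = QK.shape[1]
    rng = np.random.default_rng(seed)
    # Random-rotation hashing: bucket = argmax over the concatenation [xR, -xR].
    R = rng.normal(size=(d, n_buckets // 2))
    proj = QK @ R
    buckets = np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)
    # Dense scores for clarity; mask out pairs that fall in different buckets.
    scores = QK @ QK.T / np.sqrt(d)
    same_bucket = buckets[:, None] == buckets[None, :]
    scores = np.where(same_bucket, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
QK, V = rng.normal(size=(2, 8, 4))   # toy input: 8 tokens, dimension 4
print(lsh_attention(QK, V).shape)    # (8, 4)
</code>

The sub-quadratic cost in the actual method comes from sorting tokens by bucket and attending within fixed-size chunks; the dense mask here only shows the bucketing pattern.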
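The **Linearized Attention** and Random Feature Attention entries share one trick: replace the softmax kernel with a feature map phi so attention becomes phi(Q) (phi(K)^T V), which is linear in sequence length. The following is a minimal sketch assuming NumPy and the simple elu(x) + 1 feature map; random-feature methods substitute a randomized phi but rely on the same associativity.

<code python>
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, an elementwise positive feature map.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linearized_attention(Q, K, V, eps=1e-6):
    """Illustrative sketch: softmax(Q K^T) V ~ phi(Q) (phi(K)^T V) / normalizer."""
    Qp = elu_feature_map(Q)               # (n, d)
    Kp = elu_feature_map(K)               # (n, d)
    KV = Kp.T @ V                         # (d, d_v), computed once for all queries
    Z = Qp @ Kp.sum(axis=0)               # (n,) per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))        # toy input: 8 tokens, dimension 4
print(linearized_attention(Q, K, V).shape)  # (8, 4)
</code>

Because phi(K)^T V is a fixed-size matrix, the cost is O(n d^2) rather than O(n^2 d), which is the point of this family of methods.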
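The **Sparse Attention** entries restrict which pairs of tokens attend to each other. The sketch below shows the sliding-window pattern used by Longformer's local attention (and as one component of BigBird), assuming NumPy; it uses a dense mask for clarity and omits the global and random tokens those models add.

<code python>
import numpy as np

def sliding_window_attention(Q, K, V, window=2):
    """Illustrative sketch: each token attends only within `window` positions."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                          # dense (n, n) scores
    idx = np.arange(n)
    too_far = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(too_far, -1e9, scores)               # mask distant pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))             # toy input: 8 tokens, dimension 4
print(sliding_window_attention(Q, K, V).shape)   # (8, 4)
</code>

Real implementations compute only the banded scores, so memory stays O(n · window) instead of O(n^2).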
===== Papers =====
===== Related Pages =====
  * [[ml:nn_architectures|Neural Network Architectures]]
  * [[Transformers]]