Attention Mechanisms
Overviews
Summary of Attention Mechanisms
Key Papers
- Graves 2013 - Generating Sequences With Recurrent Neural Networks Uses an alignment mechanism for handwriting generation that is similar to the attention mechanism. The Deep Learning Book (p. 415, end of Ch 10) says: “The idea of attention mechanisms for neural networks was introduced even earlier, in the context of handwriting generation (Graves, 2013), with an attention mechanism that was constrained to move only forward in time through the sequence.”
- Bahdanau et al 2014 - Neural Machine Translation by Jointly Learning to Align and Translate The paper that started it all: introduced the attention mechanism for neural machine translation and kicked off the deep learning revolution in NLP. Essentially the paper that got neural machine translation to actually work.
- Luong et al 2015 - Effective Approaches to Attention-based Neural Machine Translation This paper introduced dot-product (multiplicative) attention; see the scoring-function sketch after this list, which contrasts it with Bahdanau's additive attention.
- Cheng et al 2016 - Long Short-Term Memory-Networks for Machine Reading This paper introduced self-attention, there called intra-attention (see section 3.2 on Long Short-Term Memory-Networks, LSTMNs). See Fig 1 for a picture.
- Parikh et al 2016 - A Decomposable Attention Model for Natural Language Inference This paper uses intra-attention from Cheng et al 2016 and, according to the Transformer paper, was the inspiration for self-attention in the Transformer.
- Lin et al 2017 - A Structured Self-Attentive Sentence Embedding Introduces the term “self-attention,” which the authors say is slightly different from Cheng et al's intra-attention (see the self-attention sketch after this list).
- Single-Headed Gated Attention (SHGA): Ma et al 2022 - Mega: Moving Average Equipped Gated Attention Shows that single-headed gated attention can simulate multi-head attention, and is more expressive (see section 3.3 and Theorem 1).
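
To make the two classic scoring functions concrete, here is a minimal NumPy sketch of Bahdanau-style additive attention and Luong-style dot-product attention for a single decoder query attending over encoder states. The shapes, parameter names (W_q, W_k, v), and the single-query framing are illustrative simplifications, not taken from either paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(query, keys, values, W_q, W_k, v):
    """Bahdanau-style (additive) attention, simplified.

    query:  (d_q,)    decoder state
    keys:   (T, d_k)  encoder states
    values: (T, d_v)  usually the same encoder states
    W_q: (d_a, d_q), W_k: (d_a, d_k), v: (d_a,)  learned parameters (names illustrative)
    """
    # score_t = v^T tanh(W_q q + W_k k_t)
    scores = np.tanh(W_q @ query + keys @ W_k.T) @ v   # (T,)
    weights = softmax(scores)                           # alignment weights over positions
    context = weights @ values                          # (d_v,) weighted sum of values
    return context, weights

def dot_product_attention(query, keys, values):
    """Luong-style (dot-product) attention: score_t = q . k_t."""
    scores = keys @ query        # (T,)
    weights = softmax(scores)
    context = weights @ values
    return context, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, d_a = 5, 8, 6
    q = rng.normal(size=d)
    K = rng.normal(size=(T, d))
    ctx, w = dot_product_attention(q, K, K)
    ctx2, w2 = additive_attention(q, K, K,
                                  rng.normal(size=(d_a, d)),
                                  rng.normal(size=(d_a, d)),
                                  rng.normal(size=d_a))
    print(w.round(3), w2.round(3))   # both weight vectors sum to 1
```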
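And a minimal sketch of self-attention (intra-attention), where queries, keys, and values are all projections of the same sequence, so every position attends over every position. The learned projections and the 1/sqrt(d_k) scaling follow the Transformer formulation rather than Cheng et al's LSTMN; names and shapes are again illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over one sequence.

    X: (T, d_model); W_q, W_k: (d_model, d_k); W_v: (d_model, d_v).
    Returns the attended sequence (T, d_v) and the (T, T) weight matrix,
    where row i is token i's attention distribution over all positions.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot product, as in the Transformer
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_model, d_k = 4, 8, 8
    X = rng.normal(size=(T, d_model))
    out, w = self_attention(X,
                            rng.normal(size=(d_model, d_k)),
                            rng.normal(size=(d_model, d_k)),
                            rng.normal(size=(d_model, d_k)))
    print(out.shape, w.sum(axis=-1).round(3))  # each row of weights sums to 1
```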
Papers
Related Pages
