====== LSTMs ======

===== Bi-directional LSTMs =====

Prior to Transformer models, Bi-LSTMs with max pooling were a standard baseline architecture.
From [[https://|Refinement Encoders]]:

<blockquote>
[[https://]] ranging from LSTM, BiLSTM and intra-attention to convolutional neural
networks and the performance of these architectures on NLI tasks. They show that,
out of these models, BiLSTM with max pooling achieves the strongest results not
only in NLI but also in many other NLP tasks requiring sentence-level meaning
representations. They also show that their model trained on NLI data achieves
strong performance on various transfer learning tasks.
</blockquote>

With tweaks, BiLSTM-based encoders can outperform Transformer models.
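
The max-pooling step itself is simple: concatenate the forward and backward hidden states at each timestep, then take the element-wise maximum over time to get a fixed-size sentence vector. A minimal NumPy sketch (the hidden-state values below are toy numbers, not the output of a trained LSTM):

```python
import numpy as np

def bilstm_max_pool(forward_states, backward_states):
    """Sentence embedding via max pooling over BiLSTM hidden states.

    forward_states, backward_states: (T, H) arrays holding the
    per-timestep hidden states of the forward and backward passes.
    Returns a (2H,) vector: concatenate the two directions at each
    timestep, then take the element-wise max over the T timesteps.
    """
    h = np.concatenate([forward_states, backward_states], axis=1)  # (T, 2H)
    return h.max(axis=0)                                           # (2H,)

# Toy example: T=3 timesteps, H=2 hidden units per direction.
fw = np.array([[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]])
bw = np.array([[0.7, 0.0], [0.2, 0.6], [0.8, 0.1]])
emb = bilstm_max_pool(fw, bw)
print(emb)  # [0.5 0.9 0.8 0.6]
```

Because each output dimension is the max over all timesteps, the resulting vector has a fixed size regardless of sentence length, which is what makes it usable as a sentence-level representation.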
===== Resources =====
  * [[https://