====== LSTMs ======

===== Bi-directional LSTMs =====

Prior to Transformer models, Bi-LSTMs with max pooling were a standard baseline architecture.
From [[https://|Refinement Encoders]]:

<blockquote>
[[https://]] ranging from LSTM, BiLSTM and intra-attention to convolutional neural
networks and the performance of these architectures on NLI tasks. They show that,
out of these models, BiLSTM with max pooling achieves the strongest results not
only in NLI but also in many other NLP tasks requiring sentence-level meaning
representations. They also show that their model trained on NLI data achieves
strong performance on various transfer learning tasks.
</blockquote>

With tweaks, BiLSTM-based encoders can outperform Transformer models.
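
The max-pooling step itself is simple: concatenate the forward and backward hidden states at each timestep, then take the element-wise maximum over time to get a fixed-size sentence vector. A minimal NumPy sketch (the hidden-state values below are toy numbers, not the output of a trained LSTM):

```python
import numpy as np

def bilstm_max_pool(forward_states, backward_states):
    """Sentence embedding via max pooling over BiLSTM hidden states.

    forward_states, backward_states: (T, H) arrays holding the
    per-timestep hidden states of the forward and backward passes.
    Returns a (2H,) vector: concatenate the two directions at each
    timestep, then take the element-wise max over the T timesteps.
    """
    h = np.concatenate([forward_states, backward_states], axis=1)  # (T, 2H)
    return h.max(axis=0)                                           # (2H,)

# Toy example: T=3 timesteps, H=2 hidden units per direction.
fw = np.array([[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]])
bw = np.array([[0.7, 0.0], [0.2, 0.6], [0.8, 0.1]])
emb = bilstm_max_pool(fw, bw)
print(emb)  # [0.5 0.9 0.8 0.6]
```

Because each output dimension is the max over all timesteps, the resulting vector has a fixed size regardless of sentence length, which is what makes it usable as a sentence-level representation.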
===== Resources =====
  * [[https://