LSTMs
Bi-directional LSTMs
Prior to Transformer models, Bi-LSTMs with max-pooling were a standard baseline architecture for sentence embeddings. From Talman et al. 2018 - Sentence Embeddings in NLI with Iterative Refinement Encoders:
Conneau et al. (2017) explore multiple different sentence embedding architectures ranging from LSTM, BiLSTM and intra-attention to convolutional neural networks and the performance of these architectures on NLI tasks. They show that, out of these models, BiLSTM with max pooling achieves the strongest results not only in NLI but also in many other NLP tasks requiring sentence level meaning representations. They also show that their model trained on NLI data achieves strong performance on various transfer learning tasks.
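To make the quoted architecture concrete, here is a minimal PyTorch sketch of a BiLSTM sentence encoder with max-pooling over time, in the spirit of Conneau et al. (2017). The class name, dimensions, and padding convention are illustrative assumptions, not code from either paper.

```python
import torch
import torch.nn as nn


class BiLSTMMaxPoolEncoder(nn.Module):
    """Illustrative BiLSTM + max-pooling sentence encoder (assumed details)."""

    def __init__(self, vocab_size: int, embed_dim: int = 300, hidden_dim: int = 512):
        super().__init__()
        # Index 0 is assumed to be the padding token.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # bidirectional=True concatenates forward and backward states,
        # so each time step yields a 2 * hidden_dim vector.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2*hidden)
        # Mask padded positions so they never win the max.
        mask = (token_ids != 0).unsqueeze(-1)         # (batch, seq_len, 1)
        states = states.masked_fill(~mask, float("-inf"))
        # Max-pool over the time dimension: one fixed-size embedding per sentence.
        return states.max(dim=1).values               # (batch, 2*hidden)


# Usage: embed a batch of two padded token-id sequences.
encoder = BiLSTMMaxPoolEncoder(vocab_size=10_000)
batch = torch.tensor([[5, 42, 7, 0, 0], [9, 3, 17, 28, 2]])
print(encoder(batch).shape)  # torch.Size([2, 1024])
```

The max over time steps is what gives a fixed-size vector regardless of sentence length; masking before the pool keeps padding from contributing.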
With tweaks borrowed from the Transformer, such as multi-head attention and layer normalization, LSTM-based models can match or even outperform Transformer models on machine translation. See Chen et al. 2018 - The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation.
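One such tweak described by Chen et al. is layer normalization applied per gate inside the LSTM cell to stabilize training. Below is a minimal sketch of that idea; the cell class and all its details are my assumptions for illustration, not their implementation.

```python
import torch
import torch.nn as nn


class LayerNormLSTMCell(nn.Module):
    """LSTM cell with per-gate layer normalization (illustrative sketch)."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        # One affine map produces all four gate pre-activations at once.
        self.linear = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        # A separate LayerNorm per gate, applied before each nonlinearity.
        self.norms = nn.ModuleList([nn.LayerNorm(hidden_dim) for _ in range(4)])

    def forward(self, x, state):
        h, c = state
        # Split pre-activations into input, forget, cell, and output gates.
        pre = self.linear(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        i, f, g, o = (norm(p) for norm, p in zip(self.norms, pre))
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


# Usage: one recurrent step over a batch of 8 inputs.
cell = LayerNormLSTMCell(input_dim=32, hidden_dim=64)
x = torch.randn(8, 32)
h0 = c0 = torch.zeros(8, 64)
h1, (h_next, c_next) = cell(x, (h0, c0))
print(h1.shape)  # torch.Size([8, 64])
```

Normalizing each gate's pre-activation keeps the recurrent dynamics in a well-scaled regime, which is one reason deep LSTM stacks trained with such tweaks remain competitive.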