Language Models

Traditional definition of a language model (LM): a language model is a probability distribution over sentences, that is, it assigns probabilities to sentences. In practice, language models typically compute either the probability of the next word given the preceding words (autoregressive language models) or, for masked language models, the probability of a word given its surrounding context.
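To make the definition concrete, here is a minimal sketch of an autoregressive language model: a toy bigram model trained on a made-up three-sentence corpus, with no smoothing. It scores a sentence with the chain rule, P(w_1 ... w_n) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1 ... w_{n-1}), under the simplifying (bigram) assumption that each history can be truncated to the previous word. The corpus and all function names are hypothetical and for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; <s> and </s> mark sentence start and end.
CORPUS = ["the cat sat", "the dog sat", "the cat ran"]

def tokenize(sentence):
    return ["<s>"] + sentence.split() + ["</s>"]

def train_bigram(sentences):
    """Maximum-likelihood estimate of P(w_i | w_{i-1}) from bigram counts (no smoothing)."""
    counts = defaultdict(Counter)
    for s in sentences:
        tokens = tokenize(s)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }

def next_word_prob(model, prev, word):
    """P(word | prev); unseen pairs get probability 0.0."""
    return model.get(prev, {}).get(word, 0.0)

def sentence_prob(model, sentence):
    """Chain rule, with each history truncated to the previous word."""
    tokens = tokenize(sentence)
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= next_word_prob(model, prev, cur)
    return p

model = train_bigram(CORPUS)
print(next_word_prob(model, "the", "cat"))  # about 0.667 (2/3)
print(sentence_prob(model, "the cat sat"))  # about 0.333 (1 * 2/3 * 1/2 * 1)
print(sentence_prob(model, "the dog ran"))  # 0.0: "dog ran" never occurs
```

Each of the three training sentences receives probability 1/3 and every other word sequence receives 0, so the probabilities sum to 1: the next-word model defines a probability distribution over sentences, as in the traditional definition.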

Note: unlike autoregressive language models, masked language models usually can't be used to compute the probability of a sentence, and so they aren't really “language models” in the traditional sense.
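To illustrate why masked-LM predictions do not directly give a sentence probability, here is a small sketch, assuming a made-up joint distribution over two-word sentences and an idealized masked model that computes each conditional P(w_i | all other words) exactly from that joint (a real masked LM only approximates these). The names `joint`, `masked_prob`, and `product_of_masked` are hypothetical. Multiplying the masked conditionals for every position neither reproduces the joint nor sums to 1.

```python
from itertools import product

# Hypothetical true joint distribution over two-word "sentences" (w1, w2).
joint = {
    ("a", "x"): 0.4,
    ("a", "y"): 0.1,
    ("b", "x"): 0.2,
    ("b", "y"): 0.3,
}
first_words = ["a", "b"]
second_words = ["x", "y"]

def masked_prob(sentence, position):
    """Idealized masked-LM prediction: P(w_position | all other words)."""
    def with_word(w):
        s = list(sentence)
        s[position] = w
        return tuple(s)
    vocab = first_words if position == 0 else second_words
    # Normalize over every possible filler for the masked slot.
    denom = sum(joint[with_word(w)] for w in vocab)
    return joint[tuple(sentence)] / denom

def product_of_masked(sentence):
    """Naively multiply the masked conditionals for every position."""
    p = 1.0
    for i in range(len(sentence)):
        p *= masked_prob(sentence, i)
    return p

for s in product(first_words, second_words):
    print(s, joint[s], round(product_of_masked(s), 4))
print(round(sum(product_of_masked(s) for s in product(first_words, second_words)), 4))
```

For ("a", "x") the product is about 0.533 while the true probability is 0.4, and the four products sum to about 1.167 (7/6), so they do not form a distribution over sentences. An autoregressive model avoids this because each factor conditions only on earlier words, which is exactly the chain-rule factorization of the joint.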

To experiment with an autoregressive or masked language model, see the online demos under Software and Demos below.

Overviews

Papers

Large Language Models

See also Ecosystem Graphs for a more complete list.

This is a list of large, GPT-style autoregressive LMs. See also pretraining for another list of large language models and GPT-3 alternatives.

| Model | Year | Parameters | Training Data | Public? | Link |
| GPT | 2018 | | BooksCorpus | Yes | github, HuggingFace |
| GPT-2 | 2019 | 1.5B | WebText (closed; see Datasets below) | Yes | github, HuggingFace |
| GPT-3 | 2020 | 175B | CommonCrawl, WebText2, Books 1&2, Wikipedia | API | OpenAI cookbook |
| MoE | 2021 | 1.1T (13B) | CC100, CC-News, CC-Stories, OpenWebText, BookCorpus, Wikipedia | Yes | github, HuggingFace |
| Gopher | 2021 | 280B | MassiveText | No | blog |
| Megatron-Turing NLG | 2022 | 530B | Pile, CommonCrawl, RealNews, CC-Stories | Researcher access | blog1, blog2, github |
| Chinchilla | 2022 | 70B | MassiveText | No | blog |
| GPT-NeoX-20B | 2022 | 20B | Pile | Yes | github |
| Jurassic-1 | 2022 | 178B | | API | AI21 studio |
| YaLM 100B | 2022 | 100B | Pile + lots of Russian text | Yes | github, HuggingFace |
| PaLM | 2022 | 540B | Social media, web, books, GitHub, Wikipedia | No? | blog |
| OPT | 2022 | 66B, 175B | Pile subset: CommonCrawl, OpenWebText2, Gutenberg, Wikipedia | Yes | demo, models |
| UL2 | 2022 | 20B | | Yes | blog, github |
| BLOOM | 2022 | 176B | Multilingual BigScienceCorpus (paper) | Yes | HuggingFace, demo |
| GLM-130B | 2022 | 130B | Pile, Chinese WudaoCorpora, more | Yes | github |
| Galactica | 2022 | 120B | Scientific papers, code, reference material, prompts | Yes | github, HuggingFace |
| ChatGPT | 2022 | ? | | API | demo, ShareGPT |
| LLaMA | 2023 | 65B | CommonCrawl, C4, GitHub, Wikipedia, Books3, arXiv, StackExchange | Yes | blog, github |
| GPT-4 | 2023 | ? | ? (multi-modal) | API | website |
| Alpaca | 2023 | 7B | 52k instructions from Self-Instruct with text-davinci-003 | Yes | github, demo |
| Vicuna | 2023 | 7B/13B | (Chatbot) | Yes | github, demo |
| Koala | 2023 | 13B | | Yes | github, demo |
| StackLLaMA | 2023 | 7B | | Yes | demo |
| LIMA | 2023 | 65B | | | |
| PaLM 2 | 2023 | 14.7B | | API | website, api |
| Llama 2 | 2023 | 70B | | Yes | website, blog |
| Mistral 7B, Mixtral 8x7B | 2023 | 7B | | Yes, API | |
| Orca 2 | 2023 | | | | |
| OLMo | 2024 | 7B | Dolma | Yes, open data | blog, github, HuggingFace |
| Gemma | 2024 | 7B, 2B | | Yes | blog |
| Jamba | 2024 | 52B | | Yes | blog, HuggingFace |
| OpenELM | 2024 | 1.1B | | Yes | |

Abilities and Analysis of LLMs

Origin of Capabilities

Evaluation of LLMs and Benchmarks

Questions and Critiques of LLMs

Adapting Language Models

To Domains

To Other Languages

Temporal Language Modeling

Extracting Knowledge from Language Models

Knowledge Editing

Personalization

LLM Personality and Writing Style

Detecting Generated Text

Adversarial Attacks

Steering

Applications

Theoretical and Foundational Papers

Emergent Abilities

Acceleration and Efficiency

Miscellaneous

Concept or Semantic LLMs

Consciousness of LLMs

Historical Papers

Historical papers that may or may not be applicable today.

Datasets

Software and Demos

Last modified: 2025/07/18 06:24 by jmflanig