Prompting and In-Context Learning
Overviews
- Tutorials, Courses, Slides and Guides
  - Guides
    - Prompt Engineering Guide (this one is pretty good)
  - Slides
    - UMass Amherst: Prompt-based learning
  - Github: BREX's Prompt Engineering Guide
  - Course: learnprompting.org
Prompting Language Models
Zero-shot
Few-shot aka In-Context Learning
- Schick & Schütze 2020 - Few-Shot Text Generation with Natural Language Instructions. Introduces GenPET, prompting for natural language generation.
- Schick & Schütze 2021 - Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. Introduces PET; pre-dates GPT-3.
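In-context learning conditions a frozen model on a handful of input-output demonstrations placed directly in the prompt, followed by the query. A minimal sketch of few-shot prompt construction (the sentiment template and labels here are illustrative, not taken from any of the papers above):

```python
def build_few_shot_prompt(demonstrations, query, instruction=""):
    """Format (input, label) demonstrations plus a query into one prompt string."""
    parts = [instruction] if instruction else []
    for text, label in demonstrations:
        parts.append(f"Review: {text}\nSentiment: {label}")
    # The query repeats the template but leaves the label slot empty
    # for the model to complete.
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

demos = [
    ("A wonderful, heartfelt film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_few_shot_prompt(demos, "An instant classic.",
                               instruction="Classify the sentiment of each review.")
print(prompt)
```

The same construction scales to the many-shot regime simply by passing more demonstrations, up to the model's context limit.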
Many-Shot In-Context Learning
Prompting with a large context of many shots.
Soft-Prompting, etc
- See Soft-prompting overview on p.3 of Zhao & Schütze 2021
- P-Tuning: Liu et al 2021 - GPT Understands, Too. Zhao & Schütze 2021 find this method to be the best. (Not to be confused with the prefix-tuning of Li and Liang 2021.)
- Prompt Tuning: Lester et al 2021 - The Power of Scale for Parameter-Efficient Prompt Tuning. Can be seen as a "simplification of the recently proposed 'prefix tuning' of Li and Liang (2021)".
- Zhao & Schütze 2021 - Discrete and Soft Prompting for Multilingual Models They find that soft prompting with an LSTM like Liu et al 2021 is best, both for English and cross-lingually.
- Vu et al 2022 - SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer - Multi-task, uses a library of learned soft prompts
Prompt tuning can converge more slowly than fine-tuning. See the figure below.

Figure from Su et al 2022. See also figures 6-8 from Ding et al 2022.
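Soft prompting replaces discrete prompt tokens with trainable continuous vectors prepended to the model's input embeddings, while the model weights stay frozen. A shape-level NumPy sketch of that mechanism (real implementations such as Lester et al 2021 train the soft-prompt matrix by backpropagation through the frozen model; the dimensions here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, prompt_len, seq_len = 16, 5, 8

# Trainable soft-prompt matrix: the only parameters updated during tuning.
soft_prompt = rng.normal(scale=0.02, size=(prompt_len, d_model))

# Frozen token embeddings for one input sequence
# (a stand-in for the model's embedding lookup).
token_embeddings = rng.normal(size=(seq_len, d_model))

# The model consumes [soft prompt; input] as a single embedding sequence.
model_input = np.concatenate([soft_prompt, token_embeddings], axis=0)
print(model_input.shape)  # (13, 16)
```

Prefix-tuning differs mainly in injecting such vectors at every layer rather than only at the input.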
Prompt Design / Prompt Engineering
See Prompt Engineering.
Calibration and Scoring
Data-Augmentation Prompting
Chain of Thought Prompting
See also Reasoning Chains.
- Overviews
- Wei et al 2022 - Chain of Thought Prompting Elicits Reasoning in Large Language Models Introduced chain of thought prompting
- Kojima et al 2022 - Large Language Models are Zero-Shot Reasoners Introduced the prompt “Let's think step by step.”
- Wang et al 2022 - Self-Consistency Improves Chain of Thought Reasoning in Language Models Sample multiple chain of thought reasonings, and take the majority vote for the answer
- Yao et al 2022 - ReAct: Synergizing Reasoning and Acting in Language Models - The basis of LangChain
- Tree of Thought and Tree Search
- Yasunaga et al 2023 - Large Language Models as Analogical Reasoners. Adds the scaffold "# Instruction: ## Recall relevant exemplars: ## Solve the initial problem:" to the prompt, which helps more than "Let's think step by step."
- Chen et al 2024 - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models Masks the CoT to get better results
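Self-consistency (Wang et al 2022) samples several chains of thought at nonzero temperature, extracts the final answer from each, and returns the majority answer. The model sampling and answer parsing are elided here; the aggregation step itself is just a mode:

```python
from collections import Counter

def self_consistency(sampled_answers):
    """Majority vote over final answers extracted from sampled reasoning chains."""
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Answers parsed from, e.g., five sampled chains of thought (illustrative values).
print(self_consistency(["18", "18", "17", "18", "26"]))  # prints 18
```

The vote is over final answers only, so chains that reach the right answer by different reasoning paths reinforce each other.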
Cross-lingual Prompting
Miscellaneous Prompting Papers
- Scao & Rush 2021 - How Many Data Points is a Prompt Worth? Prompts are very helpful in small-data regimes, and are worth hundreds of data points.
Chained or Tool-based Prompting
For an overview, see Tool Learning Papers.
- Overviews
- Yao et al 2022 - ReAct: Synergizing Reasoning and Acting in Language Models. This kind of thing is implemented in LangChain
- Uses RapidAPI
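ReAct-style prompting interleaves model-generated reasoning ("Thought"), tool calls ("Action"), and tool results ("Observation") in one growing transcript until the model emits a final answer. A toy loop with a stubbed model and a single lookup tool (all names and the `Action: tool[arg]` format are illustrative, not a real LangChain API):

```python
def react_loop(model, tools, question, max_steps=5):
    """Alternate model generation and tool execution until 'Finish:' is emitted."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # e.g. "Thought: ...\nAction: lookup[France]"
        transcript += step + "\n"
        if step.startswith("Finish:"):
            return step.removeprefix("Finish:").strip()
        if "Action:" in step:
            # Parse "Action: name[arg]" and run the named tool.
            name, _, arg = step.split("Action:")[1].strip().partition("[")
            observation = tools[name](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return None

# Stubbed model: looks the fact up once, then answers.
facts = {"France": "Paris"}
def fake_model(transcript):
    if "Observation:" in transcript:
        return "Finish: Paris"
    return "Thought: I should look this up.\nAction: lookup[France]"

print(react_loop(fake_model, {"lookup": facts.get}, "Capital of France?"))  # prints Paris
```

Real implementations replace `fake_model` with an LLM call and register many tools; the transcript doubles as the few-shot prompt for the next step.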
Prompt Compression
Retrieval-Based Methods (Retrieval-Augmented)
Data Contamination Issues
See also Membership Inference.
- Overviews
- GSM1k: Zhang et al 2024 - A Careful Examination of Large Language Model Performance on Grade School Arithmetic Re-evaluates GSM8K with a new dataset
Dependence on Number of Examples
Comparison to Fine-Tuning
Analysis of In-Context-Learning
Datasets
- Datasets with Prompts for Evaluating Language Models
- PromptSource: github Bach et al 2022 - PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts 2,000 prompts for 170 datasets
- BIG-Bench: github Srivastava et al 2022 - Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Growing list of user-submitted tasks. Contains languages other than English
- Super-NaturalInstructions: Wang et al 2022 - SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks. 1,600+ tasks spanning 76 task types and 55 languages.
- BIG-Bench-Hard
- LM-Evaluation Harness: github
Software
- LangChain: framework for building applications with prompting (chaining prompts, tool use, etc.). Based on Yao et al 2022 - ReAct: Synergizing Reasoning and Acting in Language Models.
Talks and Lectures
People
Related Pages
nlp/prompting.1748480339.txt.gz · Last modified: 2025/05/29 00:58 by jmflanig