====== Prompting and In-Context Learning ====== ===== Overviews ===== * [[https://arxiv.org/pdf/2107.13586.pdf|Liu et al 2021 - Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing]] * [[https://arxiv.org/pdf/2301.00234.pdf|Dong et al 2022 - A Survey on In-context Learning]] * [[https://arxiv.org/pdf/2212.09597.pdf|Qiao et al 2022 - Reasoning with Language Model Prompting: A Survey]] Very good * [[https://arxiv.org/pdf/2402.07927|Sahoo et al 2024 - A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications]] Not that great * **Tutorials, Courses, Slides and Guides** * Guides * [[https://www.promptingguide.ai/|Prompt Engineering Guide]] This one is pretty good * Slides * UMass Amherst: [[https://people.cs.umass.edu/~miyyer/cs685/slides/prompt_learning.pdf|Prompt-based learning]] * Stanford: [[https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf|Prompting, Instruction Finetuning, and RLHF]] * Blog: [[https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/|Lil'log Prompt Engineering]] * Github: [[https://github.com/brexhq/prompt-engineering|BREX's Prompt Engineering Guide]] * Github: [[https://github.com/dair-ai/Prompt-Engineering-Guide|DAIR AI's Prompt Engineering Guide]] * Course: [[https://learnprompting.org/docs/intro|learnprompting.org]] ===== Prompting Language Models ===== ==== Zero-shot ==== * [[https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf|Radford et al 2019 - Language Models Are Unsupervised Multitask Learners]] [[https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf|old link]] GPT-2 * [[https://arxiv.org/pdf/2109.01652.pdf|Wei et al 2021 - Finetuned Language Models Are Zero-Shot Learners]] * [[https://arxiv.org/pdf/2212.09865.pdf|Lyu et al 2022 - Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations]] 
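The zero-shot setup above can be sketched in a few lines: the task is posed as a natural-language instruction with no demonstrations, and the model's completion is read off as the answer. A minimal sketch — `query_model` is a hypothetical placeholder for whatever LLM completion API is in use, not part of any of the papers above:

```python
def build_zero_shot_prompt(instruction: str, text: str) -> str:
    """Format a zero-shot prompt: task instruction plus input, no demonstrations."""
    return f"{instruction}\n\nInput: {text}\nAnswer:"


def query_model(prompt: str) -> str:
    # Placeholder: in practice this would call an LLM completion endpoint.
    raise NotImplementedError


prompt = build_zero_shot_prompt(
    "Classify the sentiment of the input as 'positive' or 'negative'.",
    "The movie was a delight from start to finish.",
)
print(prompt)
```

Few-shot prompting (next section) differs only in that worked input/output demonstrations are inserted between the instruction and the final input.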
==== Few-shot aka In-Context Learning ==== * [[https://arxiv.org/pdf/2009.07118.pdf|Schick & Schütze 2020 - It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners]] * [[https://arxiv.org/pdf/2012.11926.pdf|Schick & Schütze 2020 - Few-Shot Text Generation with Natural Language Instructions]] GenPET, prompting for natural language generation * **[[https://arxiv.org/pdf/2001.07676.pdf|Schick & Schütze 2021 - Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference]]** Introduces PET, pre-dates GPT-3 * [[https://arxiv.org/pdf/2005.14165.pdf|Brown et al 2020 - Language Models are Few-Shot Learners]] GPT-3 * [[https://arxiv.org/pdf/2012.15723.pdf|Gao et al 2021 - Making Pre-trained Language Models Better Few-shot Learners]] ==== Many-Shot In-Context Learning ==== Prompting with a large context of many shots. * [[https://arxiv.org/pdf/2404.11018|Agarwal et al 2024 - Many-Shot In-Context Learning]] ==== Soft-Prompting, etc ==== * See Soft-prompting overview on p.3 of [[https://aclanthology.org/2021.emnlp-main.672.pdf|Zhao & Schütze 2021]] * [[https://arxiv.org/pdf/2010.15980.pdf|Shin et al 2020 - AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts]] * **P-Tuning**: [[https://arxiv.org/pdf/2103.10385.pdf|Liu et al 2021 - GPT Understands, Too]] [[https://aclanthology.org/2021.emnlp-main.672.pdf|Zhao 2021]] finds this method to be the best.
* [[https://arxiv.org/pdf/2104.06599.pdf|Qin & Eisner 2021 - Learning How to Ask: Querying LMs with Mixtures of Soft Prompts]] * **Prompt Tuning**: [[https://arxiv.org/pdf/2104.08691.pdf|Lester et al 2021 - The Power of Scale for Parameter-Efficient Prompt Tuning]] Can be seen as a "simplification of the recently proposed “prefix tuning” of Li and Liang (2021)" * [[https://aclanthology.org/2021.emnlp-main.672.pdf|Zhao & Schütze 2021 - Discrete and Soft Prompting for Multilingual Models]] They find that soft prompting with an LSTM like [[https://arxiv.org/pdf/2103.10385.pdf|Liu et al 2021]] is best, both for English and cross-lingually. * [[https://arxiv.org/pdf/2110.07602.pdf|Liu et al 2021 - P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks]] * [[https://arxiv.org/pdf/2111.06719.pdf|Su et al 2021 - On Transferability of Prompt Tuning for Natural Language Processing]] * [[https://arxiv.org/pdf/2112.08348.pdf|Khashabi et al 2021 - Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts]] * [[https://arxiv.org/pdf/2201.08670.pdf|Tang et al 2022 - Context-Tuning: Learning Contextualized Prompts for Natural Language Generation]] * [[https://aclanthology.org/2022.acl-long.346.pdf|Vu et al 2022 - SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer]] - Multi-task, uses a library of learned soft prompts Prompt tuning can be slower than fine-tuning. See the figure below.\\ {{nlp:media:fine-tuning_vs_p-tuning.png?0x150}}\\ Figure from [[https://aclanthology.org/2022.naacl-main.290.pdf|Su et al 2022]]. See also figures 6-8 from [[https://arxiv.org/pdf/2203.06904.pdf|Ding et al 2022]]. ==== Prompt Design / Prompt Engineering ==== See [[Prompt Engineering]]. 
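The core mechanism shared by the soft-prompting papers above can be illustrated concretely. A minimal numpy sketch in the spirit of prompt tuning (Lester et al 2021) — all names and shapes here are illustrative, not code from any of these papers: a small matrix of "virtual token" embeddings is prepended to the frozen input embeddings, and only those k×d parameters receive gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, k_prompt, seq_len = 16, 4, 10

# Trainable soft prompt: k_prompt virtual tokens, one embedding vector each.
soft_prompt = rng.normal(scale=0.02, size=(k_prompt, d_model))

# Frozen embeddings of the actual input tokens (stand-in for a real LM's lookup).
token_embeddings = rng.normal(size=(seq_len, d_model))

# The model consumes the concatenation; during training, gradients would flow
# only into soft_prompt while all model weights stay frozen.
model_input = np.concatenate([soft_prompt, token_embeddings], axis=0)

print(model_input.shape)  # (k_prompt + seq_len, d_model)
print(soft_prompt.size)   # number of tunable parameters: k_prompt * d_model
```

This is why the method is parameter-efficient: only k_prompt × d_model values are tuned per task, versus billions for full fine-tuning — though, as the figure below notes, convergence can be slower.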
==== Calibration and Scoring ==== * [[https://arxiv.org/pdf/2104.08315.pdf|Holtzman et al 2021 - Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right]] * [[https://arxiv.org/pdf/2309.17249.pdf|Zhou et al 2023 - Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering]] ==== Data-Augmentation Prompting ==== * [[https://arxiv.org/pdf/2202.12499.pdf|Wang et al 2022 - PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks]] ==== Chain of Thought Prompting ==== See also [[Reasoning#Reasoning Chains|Reasoning - Reasoning Chains]]. * **Overviews** * [[https://arxiv.org/pdf/2309.15402.pdf|Chu et al 2023 - A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future]] * [[https://arxiv.org/pdf/2401.14295.pdf|Besta et al 2024 - Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts]] * **[[https://arxiv.org/pdf/2201.11903.pdf|Wei et al 2022 - Chain of Thought Prompting Elicits Reasoning in Large Language Models]]** Introduced chain of thought prompting * [[https://arxiv.org/pdf/2205.11916.pdf|Kojima et al 2022 - Large Language Models are Zero-Shot Reasoners]] Introduced the prompt "Let's think step by step." 
* [[https://arxiv.org/pdf/2203.11171.pdf|Wang et al 2022 - Self-Consistency Improves Chain of Thought Reasoning in Language Models]] Samples multiple chain-of-thought reasoning paths and takes a majority vote over the final answers * [[https://arxiv.org/pdf/2203.08383.pdf|Wang et al 2022 - Iteratively Prompt Pre-trained Language Models for Chain of Thought]] * [[https://arxiv.org/pdf/2203.14465.pdf|Zelikman et al 2022 - STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning]] * [[https://arxiv.org/pdf/2207.10342.pdf|Dohan et al 2022 - Language Model Cascades]] * [[https://arxiv.org/pdf/2209.07686.pdf|Madaan & Yazdanbakhsh et al 2022 - Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango]] * [[https://arxiv.org/pdf/2210.01240.pdf|Saparov & He 2022 - Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought]] * [[https://arxiv.org/pdf/2210.03629.pdf|Yao et al 2022 - ReAct: Synergizing Reasoning and Acting in Language Models]] - The basis of LangChain * **[[https://arxiv.org/pdf/2211.12588|Chen et al 2022 - Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks]]** * [[https://arxiv.org/pdf/2305.04091.pdf|Wang et al 2023 - Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models]] * [[https://arxiv.org/pdf/2305.14992|Hao et al 2023 - Reasoning with Language Model is Planning with World Model]] * **Tree of Thought and Tree Search** * [[https://arxiv.org/pdf/2305.10601.pdf|Yao et al 2023 - Tree of Thoughts: Deliberate Problem Solving with Large Language Models]] * [[https://arxiv.org/pdf/2305.08291.pdf|Long 2023 - Large Language Model Guided Tree-of-Thought]] * [[https://arxiv.org/pdf/2309.17179.pdf|Feng et al 2023 - Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training]] * [[https://arxiv.org/pdf/2404.05966.pdf|Chi et al 2024 - THOUGHTSCULPT: Reasoning with Intermediate Revision and Search]] * 
[[https://arxiv.org/pdf/2306.14050.pdf|Li et al 2023 - Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step]] * [[https://arxiv.org/abs/2308.05342|Wang & Zhao 2023 - Metacognitive Prompting Improves Understanding in Large Language Models]] * **[[https://arxiv.org/pdf/2310.01714.pdf|Yasunaga et al 2023 - Large Language Models as Analogical Reasoners]]** Adds to the prompt "# Instruction: ## Recall relevant exemplars: ## Solve the initial problem:", which helps more than "Let's think step by step." * [[https://arxiv.org/pdf/2402.10200.pdf|Wang & Zhou et al 2024 - Chain-of-Thought Reasoning Without Prompting]] * [[https://arxiv.org/pdf/2403.02178|Chen et al 2024 - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models]] Masks the CoT to get better results * [[https://arxiv.org/pdf/2502.15589|Zhang et al 2025 - LightThinker: Thinking Step-by-Step Compression]] * [[https://arxiv.org/pdf/2505.24217|Leng et al 2025 - Semi-structured LLM Reasoners Can Be Rigorously Audited]] William Cohen paper * **Analysis of Chain of Thought** * [[https://arxiv.org/pdf/2310.07923|Merrill & Sabharwal 2024 - The Expressive Power of Transformers with Chain of Thought]] * [[https://arxiv.org/pdf/2502.21212|Huang et al 2025 - Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought]] See the related-work section for more ==== Cross-lingual Prompting ==== * [[https://aclanthology.org/2021.emnlp-main.672.pdf|Zhao & Schütze 2021 - Discrete and Soft Prompting for Multilingual Models]] ==== Miscellaneous Prompting Papers ==== * [[https://arxiv.org/pdf/2103.08493.pdf|Scao & Rush 2021 - How Many Data Points is a Prompt Worth?]] Prompts are very helpful in small data regimes, and are worth hundreds of data points. * [[https://arxiv.org/pdf/2112.08348.pdf|Khashabi et al 2021 - Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts]]. 
See also [[https://arxiv.org/pdf/2109.01247.pdf|Webson & Pavlick 2021]] * [[https://arxiv.org/pdf/2109.01247.pdf|Webson & Pavlick 2021 - Do Prompt-Based Models Really Understand the Meaning of Their Prompts?]] ==== Chained or Tool-based Prompting ==== For an overview see [[https://github.com/thunlp/ToolLearningPapers|Tool Learning Papers]] * **Overviews** * [[https://arxiv.org/pdf/2304.08354.pdf|Qin et al 2023 - Tool Learning with Foundation Models]] * [[https://modelcontextprotocol.io/docs/getting-started/intro|Model Context Protocol]] A standard introduced by Anthropic in 2024 * [[https://arxiv.org/pdf/2210.03629.pdf|Yao et al 2022 - ReAct: Synergizing Reasoning and Acting in Language Models]]. This kind of thing is implemented in [[https://github.com/hwchase17/langchain|LangChain]] * [[https://arxiv.org/abs/2302.04761|Schick et al 2023 - Toolformer: Language Models Can Teach Themselves to Use Tools]] * [[https://arxiv.org/pdf/2307.16789.pdf|Qin et al 2023 - ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs]] * Uses [[https://rapidapi.com/|RapidAPI]] * [[https://arxiv.org/pdf/2402.01869|2024 - InferCept: Efficient Intercept Support for Augmented Large Language Model Inference]] * [[https://arxiv.org/pdf/2409.00920|Liu et al 2024 - ToolACE: Winning the Points of LLM Function Calling]] ==== Prompt Compression ==== * [[https://arxiv.org/pdf/2304.08467|Mu et al 2024 - Learning to Compress Prompts with Gist Tokens]] ==== Retrieval-Based Methods (Retrieval-Augmented) ==== See [[Retrieval-Augmented Methods]]. ==== Data Contamination Issues ==== See also [[ml: Membership Inference]]. * **Overviews** * [[https://arxiv.org/pdf/2404.00699.pdf|Ravaut et al 2024 - How Much are LLMs Contaminated? 
A Comprehensive Survey and the LLMSanitize Library]] * [[https://arxiv.org/pdf/2305.10160.pdf|Jacovi et al 2023 - Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks]] * [[https://arxiv.org/pdf/2312.16337|Li & Flanigan 2023 - Task Contamination: Language Models May Not Be Few-Shot Anymore]] * [[https://arxiv.org/pdf/2404.18543|Drinkall et al 2024 - Time Machine GPT]] * GSM1k: [[https://arxiv.org/pdf/2405.00332|Zhang et al 2024 - A Careful Examination of Large Language Model Performance on Grade School Arithmetic]] Re-evaluates GSM8K with a new dataset ==== Dependence on Number of Examples ==== * [[https://arxiv.org/pdf/2103.08493|Scao & Rush 2021 - How Many Data Points is a Prompt Worth?]] * [[https://arxiv.org/pdf/2404.11018|Agarwal et al 2024 - Many-Shot In-Context Learning]] ==== Comparison to Fine-Tuning ==== * [[https://arxiv.org/pdf/2305.16938|Mosbach et al 2023 - Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation]] * [[https://arxiv.org/pdf/2401.08406.pdf|Balaguer et al 2024 - RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture]] ==== Analysis of In-Context Learning ==== * [[https://arxiv.org/pdf/2109.01247.pdf|Webson & Pavlick 2021 - Do Prompt-Based Models Really Understand the Meaning of Their Prompts?]] * [[https://arxiv.org/pdf/2202.12837.pdf|Min et al 2022 - Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?]] * [[https://arxiv.org/pdf/2208.01066.pdf|Garg et al 2022 - What Can Transformers Learn In-Context? A Case Study of Simple Function Classes]] * [[https://arxiv.org/pdf/2211.15661.pdf|Akyürek et al 2022 - What learning algorithm is in-context learning? 
Investigations with linear models]] * [[https://arxiv.org/pdf/2310.15916.pdf|Hendel et al 2023 - In-Context Learning Creates Task Vectors]] * [[https://arxiv.org/pdf/2505.05145|Hu et al 2025 - Understanding In-context Learning of Addition via Activation Subspaces]] Great paper. Fig 1 is awesome. * [[https://arxiv.org/pdf/2504.00132|Bakalova et al 2025 - Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B]] ===== Datasets ===== * Datasets with Prompts for Evaluating Language Models * **PromptSource**: [[https://github.com/bigscience-workshop/promptsource|github]] [[https://arxiv.org/pdf/2202.01279.pdf|Bach et al 2022 - PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts]] 2,000 prompts for 170 datasets * **BIG-Bench**: [[https://github.com/google/BIG-bench|github]] [[https://arxiv.org/pdf/2206.04615.pdf|Srivastava et al 2022 - Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models]] Growing list of user-submitted tasks. Contains languages other than English * **Super-NaturalInstructions**: [[https://arxiv.org/pdf/2204.07705.pdf|Wang et al 2022 - SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks]] 1,600+ tasks spanning 76 task types and 55 languages * **BIG-Bench-Hard** * **LM-Evaluation Harness**: [[https://github.com/EleutherAI/lm-evaluation-harness|github]] ===== Software ===== * [[https://github.com/hwchase17/langchain|LangChain]] Framework for building applications with prompting (chaining prompts, etc). 
This paper was the basis for it: [[https://arxiv.org/pdf/2210.03629.pdf|Yao et al 2022 - ReAct: Synergizing Reasoning and Acting in Language Models]] ===== Talks and Lectures ===== * [[https://underline.io/events/122/sessions?eventSessionId=4313|Invited Talk @ NAACL 2021: Humans Learn From Task Descriptions and So Should Our Models - Hinrich Schütze]] ===== People ===== * [[https://scholar.google.com/citations?user=k8CKy5UAAAAJ&hl=en|Timo Schick]] ===== Related Pages ===== * [[Instruction-Tuning]] * [[ml:Few-Shot Learning]] * [[ml:Fine-Tuning]] * [[Language Model]] * [[ml:Meta-Learning]] * [[Pretraining]] * [[Prompt Engineering]] * [[Retrieval-Augmented Methods]] * [[Task Descriptions|Natural Language Task Descriptions]] * [[ml:Zero-Shot Learning]]