nlp:language_model

  * **Overviews**
    * [[https://arxiv.org/pdf/2307.03109|Chang et al 2023 - A Survey on Evaluation of Large Language Models]]
    * For common evaluation datasets for LLMs, see recent LLM system description papers such as the [[https://arxiv.org/pdf/2407.21783|Llama 3 paper]] (table 2) or [[https://www.anthropic.com/news/claude-sonnet-4-5|Claude Sonnet 4.5]] (evaluation table).
  * lm-evaluation-harness: [[https://github.com/EleutherAI/lm-evaluation-harness|LM Evaluation Harness (EleutherAI)]] (Released May 2021)
  * [[https://arxiv.org/pdf/2401.00595|Mizrahi et al 2024 - State of What Art? A Call for Multi-Prompt LLM Evaluation]]
  * **Effects of Length and Irrelevant Context**
    * [[https://arxiv.org/pdf/2402.14848|Levy et al 2024 - Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models]]

===== Tool-Use in LLMs =====
See also [[prompting#Chained or Tool-based Prompting]].
  * **Overviews and Background**
    * [[https://modelcontextprotocol.io/docs/getting-started/intro|Model Context Protocol]]
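The core pattern that protocols like MCP standardize can be illustrated with a minimal loop: the model emits a structured tool call, the client executes it against a registry of tools, and the result is sent back as a new message. The sketch below is illustrative only — the message format, tool names, and `run_tool_call` helper are invented for this example and are not the actual MCP API.

```python
import json

# Hypothetical tool registry: tool name -> callable taking a dict of arguments.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def run_tool_call(message: str):
    """Parse a model message of the form 'TOOL_CALL {json}' and dispatch it.

    Returns None for ordinary text messages (no tool requested).
    """
    prefix = "TOOL_CALL "
    if not message.startswith(prefix):
        return None
    call = json.loads(message[len(prefix):])
    result = TOOLS[call["name"]](call["arguments"])
    # In a real client, this result would be appended to the conversation
    # and the model would be invoked again to continue generating.
    return {"tool": call["name"], "result": result}

print(run_tool_call('TOOL_CALL {"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

What MCP adds on top of this basic loop is a standard way for clients to discover which tools a server exposes and a uniform schema for the call and result messages.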

===== Retrieval-Augmented Generation (RAG) =====
See [[Retrieval-Augmented Methods]].
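As a quick orientation before following that link: RAG retrieves documents relevant to the query and prepends them to the prompt, so the model can answer from the retrieved context rather than from parametric memory alone. A toy sketch, using word overlap as the relevance score (real systems use dense embeddings and a vector index); the function names and prompt template are made up for illustration:

```python
# Score each document by word overlap with the query and keep the top k.
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

# Stuff the retrieved context into the prompt handed to the LLM.
def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The capital of France is Paris.",
    "Transformers use self-attention over token sequences.",
]
print(build_prompt("What is the capital of France?", docs))
```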
  
===== Limitations of Current LLMs =====
  * Extracting Training Data
    * [[https://arxiv.org/pdf/2012.07805.pdf|Carlini et al 2020 - Extracting Training Data from Large Language Models]] [[https://github.com/ftramer/LM_Memorization|github]]
    * [[https://arxiv.org/pdf/2601.02671|Ahmed et al 2026 - Extracting Books from Production Language Models]]
  * Membership Inference for Training Data
    * (Decide if some sample data is in the training data or not)
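A common scoring idea behind both extraction and membership inference (used, among other metrics, by Carlini et al 2020) is to flag text whose loss under the model is low relative to its intrinsic entropy, estimated cheaply via zlib compression. The sketch below is a simplified illustration of that idea: the model is a stub, and the function names and threshold are invented for this example — in practice you would plug in a real LM's total negative log-likelihood and calibrate the threshold on known non-members.

```python
import zlib

def zlib_entropy(text: str) -> float:
    """Compressed size in bits: a cheap proxy for how 'surprising' the text is."""
    return 8.0 * len(zlib.compress(text.encode("utf-8")))

def membership_score(text: str, model_nll) -> float:
    """Model loss divided by zlib entropy; lower means more likely memorized."""
    return model_nll(text) / zlib_entropy(text)

def flag_members(samples, model_nll, threshold: float):
    """Return the samples scored below the threshold (candidate training data)."""
    return [s for s in samples if membership_score(s, model_nll) < threshold]

# Stub model: pretend the model assigns much lower per-character loss to a
# string it memorized during training. Threshold chosen for this toy example.
stub_nll = lambda s: (0.5 if s == "seen during training" else 2.0) * len(s)
print(flag_members(["seen during training", "novel text"], stub_nll, threshold=0.09))
```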
nlp/language_model.1759903060.txt.gz · Last modified: 2025/10/08 05:57 by jmflanig
