nlp:post-training

Differences

This shows you the differences between two versions of the page.

nlp:post-training [2025/03/07 09:35] jmflanig
nlp:post-training [2025/10/07 06:24] (current) jmflanig
Line 1:
 ====== Post-Training ======
-Post-training refers to the things done to a LLM after pre-training to improve it'performance, such as supervised fine-tuning, RLHF, instruction tuning, etc.  This is a critical step before releasing the LLM.  Typically, this refers to things done by the company or group before releasing the LLM.  (For an example of this usage, see the [[https://arxiv.org/pdf/2303.08774|GPT-4 technical report]].)
+Post-training refers to the things done to a LLM after pre-training to improve its performance, such as supervised fine-tuning, RLHF, instruction tuning, etc.  This is a critical step before releasing the LLM.  Typically, this refers to things done by the company or group before releasing the LLM, not the things done afterwards to customize to a specific application.  (For an example of this usage, see the [[https://arxiv.org/pdf/2303.08774|GPT-4 technical report]].)
  
 +===== Overviews =====
 +  * [[https://arxiv.org/pdf/2502.21321|Kumar et al 2025 - LLM Post-Training: A Deep Dive into Reasoning Large Language Models]]
 +  * [[https://arxiv.org/pdf/2503.06072|Tie 2025 - A Survey on Post-training of Large Language Models]]
 +
 +===== Papers =====
 +
 +==== Context-Extension ====
 +  * [[https://arxiv.org/pdf/2410.02660|Gao et al 2024 - How to Train Long-Context Language Models (Effectively)]] [[https://aclanthology.org/2025.acl-long.366.pdf|ACL version]]
 +
 +===== Sub-Areas =====
   * [[Alignment]]
   * [[Instruction-Tuning]]
   * [[human-in-the-loop#RLHF]]
  
nlp/post-training.1741340106.txt.gz · Last modified: 2025/03/07 09:35 by jmflanig
