====== Post-Training ======

Post-training refers to the steps applied to an LLM after pre-training to improve its performance, such as supervised fine-tuning, RLHF, and instruction tuning. It is a critical stage before releasing an LLM. Typically, the term covers work done by the company or group before release, not later customization for a specific application. (For an example of this usage, see the [[https://arxiv.org/pdf/2303.08774|GPT-4 technical report]].)

===== Overviews =====

  * [[https://arxiv.org/pdf/2502.21321|Kumar et al 2025 - LLM Post-Training: A Deep Dive into Reasoning Large Language Models]]
  * [[https://arxiv.org/pdf/2503.06072|Tie 2025 - A Survey on Post-training of Large Language Models]]

===== Papers =====

==== Context-Extension ====

  * [[https://arxiv.org/pdf/2410.02660|Gao et al 2024 - How to Train Long-Context Language Models (Effectively)]] ([[https://aclanthology.org/2025.acl-long.366.pdf|ACL version]])

===== Sub-Areas =====

  * [[Alignment]]
  * [[Instruction-Tuning]]
  * [[human-in-the-loop#RLHF]]
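===== Example: SFT Loss Masking =====

A minimal, hypothetical sketch of one mechanic shared by supervised fine-tuning and instruction tuning: the cross-entropy loss is averaged only over response tokens, while prompt tokens are masked out so the model is not trained to reproduce the instruction. The function name, token log-probabilities, and mask values below are illustrative assumptions, not taken from any paper linked on this page.

```python
import math

def sft_loss(token_logprobs, loss_mask):
    """Average negative log-likelihood over unmasked (response) tokens.

    token_logprobs: per-token log-probabilities the model assigned.
    loss_mask: 0 for prompt tokens (ignored), 1 for response tokens.
    """
    losses = [-lp for lp, m in zip(token_logprobs, loss_mask) if m]
    return sum(losses) / len(losses)

# Toy sequence: two prompt tokens (masked) followed by two response tokens.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
mask = [0, 0, 1, 1]

# Only the last two tokens contribute: (-ln 0.5 - ln 0.25) / 2
print(round(sft_loss(logprobs, mask), 4))  # → 1.0397
```

The same masking idea carries over to real training code, where the mask is typically implemented by setting prompt-token labels to an ignore index before the cross-entropy call.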