--- nlp:alignment [2025/03/20 04:14] – [Blog Posts, etc] jmflanig
+++ nlp:alignment [2025/06/03 00:26] (current) – [People] jmflanig
@@ Line 3: / Line 3: @@
 ===== Overviews =====
   * [[https://arxiv.org/pdf/2309.15025.pdf|Shen et al 2023 - Large Language Model Alignment: A Survey]]
+  * [[https://arxiv.org/pdf/2310.19852|Ji et al 2023 - AI Alignment: A Comprehensive Survey]]
+  * [[https://arxiv.org/pdf/2404.09932|Anwar et al 2024 - Foundational Challenges in Assuring Alignment and Safety of Large Language Models]]
 ===== Blog Posts, etc =====
@@ Line 17: / Line 19: @@
 ===== People =====
+  * [[https://scholar.google.com/citations?user=kV9XRxYAAAAJ&hl=en|Sam Bowman]]
   * [[https://scholar.google.com/citations?user=czyretsAAAAJ&hl=en|Dan Hendrycks]]
@@ Line 23: / Line 26: @@
   * [[Instruction-Tuning]] Instruction-tuning is often similar to alignment, but alignment is broader.  Instruction-tuning methods often fall under alignment.
   * [[Language Model]]
+  * [[LLM Safety]]
+  * [[ml:Mechanistic Interpretability]]