Model Editing and Unlearning

Model editing modifies a trained model, such as a large language model, so that specific facts it has learned are changed (“edited”).

Machine unlearning adjusts a trained model to “remove” one or more datapoints (or classes of datapoints, such as all datapoints about bioweapons) that were used to train it, so that it behaves like a model that was trained without those datapoints.
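Retraining from scratch without the removed datapoints meets the unlearning criterion above exactly, but it is expensive. The sketch below shows a cheaper exact variant in the spirit of the sharded training in Bourtoule et al 2019. The data are split into disjoint shards, one sub-model is trained per shard, predictions are aggregated by majority vote, and forgetting a point retrains only the shard that held it. The nearest-centroid sub-models, the `id % num_shards` shard assignment, and every name here (`ShardedEnsemble`, `unlearn`, `train_centroids`) are illustrative assumptions, not the paper's implementation, which also slices shards and checkpoints training.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A datapoint is (unique id, feature vector, label).
Point = Tuple[int, Tuple[float, ...], str]
Model = Dict[str, Tuple[float, ...]]  # label -> centroid


def train_centroids(points: List[Point]) -> Model:
    """Train one shard's sub-model: the mean feature vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for _, x, y in points:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for j, v in enumerate(x):
            sums[y][j] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in vec) for y, vec in sums.items()}


def predict_one(model: Model, x: Tuple[float, ...]) -> str:
    # Nearest centroid under squared Euclidean distance.
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


class ShardedEnsemble:
    """Hypothetical SISA-style ensemble: one sub-model per disjoint shard."""

    def __init__(self, data: List[Point], num_shards: int):
        self.num_shards = num_shards
        # Assumption: shard depends only on the point's id, so the partition
        # of the remaining data is the same whether or not a point is removed.
        self.shards: Dict[int, List[Point]] = {s: [] for s in range(num_shards)}
        for p in data:
            self.shards[self._shard_of(p[0])].append(p)
        self.models: Dict[int, Model] = {
            s: train_centroids(pts) for s, pts in self.shards.items()
        }

    def _shard_of(self, point_id: int) -> int:
        return point_id % self.num_shards

    def predict(self, x: Tuple[float, ...]) -> str:
        # Majority vote over non-empty shard models.
        votes = Counter(predict_one(m, x) for m in self.models.values() if m)
        return votes.most_common(1)[0][0]

    def unlearn(self, point_id: int) -> None:
        # Drop the point and retrain only the shard that contained it.
        s = self._shard_of(point_id)
        self.shards[s] = [p for p in self.shards[s] if p[0] != point_id]
        self.models[s] = train_centroids(self.shards[s])


data: List[Point] = [
    (0, (0.0, 0.0), "a"), (1, (0.2, 0.1), "a"), (2, (0.1, 0.3), "a"),
    (3, (1.0, 1.0), "b"), (4, (0.9, 1.2), "b"), (5, (1.1, 0.8), "b"),
]
ens = ShardedEnsemble(data, num_shards=3)
ens.unlearn(4)
# A model trained from scratch without point 4, for comparison.
scratch = ShardedEnsemble([p for p in data if p[0] != 4], num_shards=3)
assert ens.models == scratch.models  # exact unlearning
print(ens.predict((0.95, 1.0)))  # prints "b"
```

Because shard membership depends only on a point's id, the unlearned ensemble is identical to one trained from scratch without the point, which the assertion checks. Forgetting a point costs one shard's retraining rather than retraining on the whole dataset. The trade-off is that each sub-model sees only part of the data, which can reduce accuracy.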

Model Editing

In NLP

Machine Unlearning

Overviews

Key Papers

Bourtoule et al 2019 - Machine Unlearning

In NLP or LLMs

Theory Papers

Guo et al 2020 - Certified Data Removal from Machine Learning Models
