Diffusion Models

Evaluating MU methods for Image Generation

Machine Unlearning · Diffusion Models · AI Safety · Privacy · Deep Learning

Implementing and Improving MU methods in Diffusion Models.

Machine Unlearning in Generative AI

Abstract

Machine Unlearning (MU) addresses the critical need to selectively remove specific data or concepts from trained models. This is essential for compliance with regulations like the "Right to be Forgotten", protecting copyright, and ensuring AI safety by removing harmful content.

My Contributions

I actively contributed to an open-source library dedicated to Machine Unlearning in Diffusion Models. My work focused on implementing and optimizing various unlearning algorithms, ensuring they are accessible and effective for the broader community.

Comparative Analysis of MU Methods

ESD (Erasing Concepts)

Method: Fine-tunes model weights to minimize the likelihood of generating specific concepts (e.g., "Van Gogh style" or "nudity") using a negative guidance objective.
Pros: Effective for broad concept erasure; permanent weight update.
Cons: Can degrade general image quality or lead to "catastrophic interference" if not carefully tuned.
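The negative guidance objective can be sketched as follows: the frozen original model supplies a target noise prediction that is steered *away* from the concept, and the fine-tuned model regresses onto it. This is a minimal NumPy sketch; the function names, the `guidance` scale, and the plain MSE are illustrative assumptions, not the library's actual API.

```python
import numpy as np

def esd_target(eps_uncond, eps_cond, guidance=1.0):
    # Negative-guidance target from the frozen model: push the prediction
    # away from the concept direction (eps_cond - eps_uncond).
    return eps_uncond - guidance * (eps_cond - eps_uncond)

def esd_loss(eps_student, eps_uncond, eps_cond, guidance=1.0):
    # MSE between the fine-tuned model's conditional noise prediction
    # and the negatively guided target (both from the same noisy latent).
    target = esd_target(eps_uncond, eps_cond, guidance)
    return float(np.mean((eps_student - target) ** 2))
```

In the actual method the two `eps_*` inputs come from the frozen copy of the diffusion U-Net evaluated with and without the concept prompt, and only the fine-tuned model receives gradients; the sketch above only captures the shape of the objective.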

Forget-Me-Not

Method: An attention-steering approach that modifies the cross-attention maps to ignore specific tokens during generation.
Pros: Extremely lightweight and fast; ideal for identity erasure (e.g., removing a specific person's likeness).
Cons: May be less robust against adversarial prompting compared to weight-based methods.
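The core quantity being suppressed is the cross-attention probability mass that image queries assign to the forget token. A minimal sketch, assuming single-head attention and a scalar loss that is driven toward zero during the lightweight fine-tune (the function name and shapes are hypothetical):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_resteering_loss(q, k, forget_token_idx):
    # q: (n_pixels, d) image queries; k: (n_tokens, d) text-prompt keys.
    # Returns the average attention probability the image places on the
    # forget token; minimising it steers generation to ignore that token.
    scale = q.shape[-1] ** -0.5
    attn = softmax(q @ k.T * scale, axis=-1)  # (n_pixels, n_tokens)
    return float(attn[:, forget_token_idx].mean())
```

Because the loss touches only the attention maps for a handful of tokens, the update is cheap, which matches the method's lightweight, identity-erasure use case.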

SalUn (Saliency Unlearning)

Method: Leverages saliency maps to identify and update only the most relevant weights responsible for a specific concept.
Pros: High precision; minimizes collateral damage to other concepts; effective for both styles and objects.
Cons: Computationally more intensive during the identification phase.
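The two-phase structure (identify salient weights, then update only those) can be sketched with a gradient-magnitude mask. This is an illustrative NumPy sketch; the `sparsity` threshold and function names are assumptions, and in practice the gradients come from the forgetting loss via autograd:

```python
import numpy as np

def saliency_mask(grads, sparsity=0.1):
    # Keep only the top `sparsity` fraction of weights ranked by the
    # magnitude of their gradient on the forgetting loss; all other
    # weights are frozen, limiting collateral damage to other concepts.
    flat = np.abs(grads).ravel()
    k = max(1, int(sparsity * flat.size))
    thresh = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return (np.abs(grads) >= thresh).astype(grads.dtype)

def masked_update(weights, grads, mask, lr=1e-2):
    # Gradient step applied only to the salient (mask == 1) weights.
    return weights - lr * mask * grads
```

The identification phase is the expensive part: the mask requires a full gradient pass over the forget set before any unlearning step, which is the cost the "Cons" above refers to.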

Continual Unlearning Frameworks

Method: Strategies designed to handle a stream of unlearning requests over time.
Pros: Solves the "Forgetting to Unlearn" problem, where new unlearning updates might undo previous ones.
Cons: Complex to implement; requires maintaining a balance between plasticity (learning new forgets) and stability (keeping old forgets and general knowledge).
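One common way to trade plasticity against stability is a quadratic anchor penalty that pulls the weights back toward checkpoints saved after earlier unlearning requests, in the spirit of EWC-style regularization. This is a generic sketch of that idea, not the update rule of any specific framework; `lam`, the anchor list, and the function name are all illustrative assumptions:

```python
import numpy as np

def continual_unlearn_step(weights, forget_grad, anchors, lam=1.0, lr=1e-2):
    # One update on a new forget request. `anchors` are weight snapshots
    # saved after earlier requests; the quadratic penalty keeps the model
    # near them (stability: old forgets stay forgotten) while the
    # forget_grad term still handles the new request (plasticity).
    penalty_grad = sum(weights - a for a in anchors)
    return weights - lr * (forget_grad + lam * penalty_grad)
```

Tuning `lam` is exactly the stability/plasticity balance mentioned above: too large and new concepts cannot be erased; too small and later updates drift far enough to undo earlier ones.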