LLM Unlearning

Table of Contents
NeurIPS
ICML
ICLR
KDD
ACL
EMNLP
AAAI
USENIX Security Symposium
COLING
CIKM
Expert Syst. Appl.
Neural Networks
IEEE Trans. Knowl. Data Eng.
Nat. Mac. Intell.
arXiv

NeurIPS

2024

Title Venue Year Link
Large Language Model Unlearning via Embedding-Corrupted Prompts. NeurIPS 2024 Link
Large Language Model Unlearning. NeurIPS 2024 Link
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 Link
Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference. NeurIPS 2024 Link
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models. NeurIPS 2024 Link
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space. NeurIPS 2024 Link
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models. NeurIPS 2024 Link

ICML

2025

Title Venue Year Link
Adaptive Localization of Knowledge Negation for Continual LLM Unlearning. ICML 2025 Link
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning. ICML 2025 Link
Fast Exact Unlearning for In-Context Learning Data for LLMs. ICML 2025 Link
GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs. ICML 2025 Link
Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning. ICML 2025 Link
Tool Unlearning for Tool-Augmented LLMs. ICML 2025 Link
Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond. ICML 2025 Link
Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning. ICML 2025 Link

2024

Title Venue Year Link
In-Context Unlearning: Language Models as Few-Shot Unlearners. ICML 2024 Link
To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models. ICML 2024 Link

ICLR

2025

Title Venue Year Link
A Closer Look at Machine Unlearning for Large Language Models. ICLR 2025 Link
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models. ICLR 2025 Link
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset. ICLR 2025 Link
Catastrophic Failure of LLM Unlearning via Quantization. ICLR 2025 Link
LLM Unlearning via Loss Adjustment with Only Forget Data. ICLR 2025 Link
MUSE: Machine Unlearning Six-Way Evaluation for Language Models. ICLR 2025 Link
On Large Language Model Continual Unlearning. ICLR 2025 Link
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond. ICLR 2025 Link
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods. ICLR 2025 Link
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs. ICLR 2025 Link
Unified Parameter-Efficient Unlearning for LLMs. ICLR 2025 Link
Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning. ICLR 2025 Link

KDD

2025

Title Venue Year Link
LLM-Eraser: Optimizing Large Language Model Unlearning through Selective Pruning. KDD 2025 Link

ACL

2025

Title Venue Year Link
A General Framework to Enhance Fine-tuning-based LLM Unlearning. ACL 2025 Link
Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning. ACL 2025 Link
Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis. ACL 2025 Link
Decoupling Memories, Muting Neurons: Towards Practical Machine Unlearning for Large Language Models. ACL 2025 Link
Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine Unlearning. ACL 2025 Link
From Evasion to Concealment: Stealthy Knowledge Unlearning for LLMs. ACL 2025 Link
MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models. ACL 2025 Link
Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models. ACL 2025 Link
Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport. ACL 2025 Link
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space. ACL 2025 Link
ReLearn: Unlearning via Learning for Large Language Models. ACL 2025 Link
Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning. ACL 2025 Link
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs? ACL 2025 Link
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning. ACL 2025 Link
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training. ACL 2025 Link
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation. ACL 2025 Link
Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation. ACL 2025 Link
Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning. ACL 2025 Link

2024

Title Venue Year Link
Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning. ACL 2024 Link
Machine Unlearning of Pre-trained Large Language Models. ACL 2024 Link
Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models. ACL 2024 Link
Towards Safer Large Language Models through Machine Unlearning. ACL 2024 Link
Unlearning Traces the Influential Training Data of Language Models. ACL 2024 Link

2023

Title Venue Year Link
Knowledge Unlearning for Mitigating Privacy Risks in Language Models. ACL 2023 Link
Unlearning Bias in Language Models by Partitioning Gradients. ACL 2023 Link

EMNLP

2025

Title Venue Year Link
A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization. EMNLP 2025 Link
Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models. EMNLP 2025 Link
Mitigating Biases in Language Models via Bias Unlearning. EMNLP 2025 Link
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models. EMNLP 2025 Link
Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing. EMNLP 2025 Link
SEPS: A Separability Measure for Robust Unlearning in LLMs. EMNLP 2025 Link
SUA: Stealthy Multimodal Large Language Model Unlearning Attack. EMNLP 2025 Link

2024

Title Venue Year Link
Can Machine Unlearning Reduce Social Bias in Language Models? EMNLP 2024 Link
Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models. EMNLP 2024 Link
Dissecting Fine-Tuning Unlearning in Large Language Models. EMNLP 2024 Link
EFUF: Efficient Fine-Grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models. EMNLP 2024 Link
Fine-grained Pluggable Gradient Ascent for Knowledge Unlearning in Language Models. EMNLP 2024 Link
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning. EMNLP 2024 Link
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models. EMNLP 2024 Link
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations. EMNLP 2024 Link
ULMR: Unlearning Large Language Models via Negative Response and Model Parameter Average. EMNLP 2024 Link

2023

Title Venue Year Link
Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models. EMNLP 2023 Link
Unlearn What You Want to Forget: Efficient Unlearning for LLMs. EMNLP 2023 Link

AAAI

2025

Title Venue Year Link
Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models. AAAI 2025 Link
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage. AAAI 2025 Link
On Effects of Steering Latent Representation for Large Language Model Unlearning. AAAI 2025 Link
Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models. AAAI 2025 Link
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models. AAAI 2025 Link

USENIX Security Symposium

2025

Title Venue Year Link
Refusal Is Not an Option: Unlearning Safety Alignment of Large Language Models. USENIX Security Symposium 2025 Link

COLING

2025

Title Venue Year Link
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models. COLING 2025 Link
Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis. COLING 2025 Link

CIKM

2025

Title Venue Year Link
Pseudo-Inverse Prefix Tuning for Effective Unlearning in LLMs. CIKM 2025 Link

Expert Syst. Appl.

2025

Title Venue Year Link
Law LLM unlearning via interfere prompt, review output and update parameter: new challenges, method and baseline. Expert Syst. Appl. 2025 Link

Neural Networks

2025

Title Venue Year Link
DP2Unlearning: An efficient and guaranteed unlearning framework for LLMs. Neural Networks 2025 Link

IEEE Trans. Knowl. Data Eng.

2025

Title Venue Year Link
Exact and Efficient Unlearning for Large Language Model-Based Recommendation. IEEE Trans. Knowl. Data Eng. 2025 Link

Nat. Mac. Intell.

2025

Title Venue Year Link
Rethinking machine unlearning for large language models. Nat. Mac. Intell. 2025 Link

arXiv

2026

Title Venue Year Link
AGT^AO: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality arXiv 2026 Link
Auditing Language Model Unlearning via Information Decomposition arXiv 2026 Link
BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning arXiv 2026 Link
Beyond Forgetting: Machine Unlearning Elicits Controllable Side Behaviors and Capabilities arXiv 2026 Link
CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment arXiv 2026 Link
DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher arXiv 2026 Link
FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning arXiv 2026 Link
From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning arXiv 2026 Link
From Logits to Latents: Contrastive Representation Shaping for LLM Unlearning arXiv 2026 Link
Gauss-Newton Unlearning for the LLM Era arXiv 2026 Link
KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models arXiv 2026 Link
Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning arXiv 2026 Link
Per-parameter Task Arithmetic for Unlearning in Large Language Models arXiv 2026 Link
Quantization-Robust LLM Unlearning via Low-Rank Adaptation arXiv 2026 Link
Reinforcement Unlearning via Group Relative Policy Optimization arXiv 2026 Link
STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models arXiv 2026 Link
Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning arXiv 2026 Link

2025

Title Venue Year Link
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models arXiv 2025 Link
A General Framework to Enhance Fine-tuning-based LLM Unlearning arXiv 2025 Link
A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty arXiv 2025 Link
A Survey on Unlearning in Large Language Models arXiv 2025 Link
A mean teacher algorithm for unlearning of language models arXiv 2025 Link
Agents Are All You Need for LLM Unlearning arXiv 2025 Link
Align-then-Unlearn: Embedding Alignment for LLM Unlearning arXiv 2025 Link
BLUR: A Benchmark for LLM Unlearning Robust to Forget-Retain Overlap arXiv 2025 Link
BLUR: A Bi-Level Optimization Approach for LLM Unlearning arXiv 2025 Link
Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization arXiv 2025 Link
Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis arXiv 2025 Link
Bridging the Gap Between Preference Alignment and Machine Unlearning arXiv 2025 Link
CLUE: Conflict-guided Localization for LLM Unlearning Framework arXiv 2025 Link
Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning arXiv 2025 Link
Concept Unlearning in Large Language Models via Self-Constructed Knowledge Triplets arXiv 2025 Link
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models arXiv 2025 Link
Cyber for AI at SemEval-2025 Task 4: Forgotten but Not Lost: The Balancing Act of Selective Unlearning in Large Language Models arXiv 2025 Link
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning arXiv 2025 Link
Direct Token Optimization: A Self-contained Approach to Large Language Model Unlearning arXiv 2025 Link
Distillation Robustifies Unlearning arXiv 2025 Link
Distribution Preference Optimization: A Fine-grained Perspective for LLM Unlearning arXiv 2025 Link
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning arXiv 2025 Link
Dual-Space Smoothness for Robust and Balanced LLM Unlearning arXiv 2025 Link
Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning? arXiv 2025 Link
Existing Large Language Model Unlearning Evaluations Are Inconclusive arXiv 2025 Link
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning arXiv 2025 Link
Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning arXiv 2025 Link
Forgetting-MarI: LLM Unlearning via Marginal Information Regularization arXiv 2025 Link
From Learning to Unlearning: Biomedical Security Protection in Multimodal Large Language Models arXiv 2025 Link
GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs arXiv 2025 Link
GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection arXiv 2025 Link
GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models arXiv 2025 Link
Geometric-disentangelment Unlearning arXiv 2025 Link
Holistic Audit Dataset Generation for LLM Unlearning via Knowledge Graph Traversal and Redundancy Removal arXiv 2025 Link
Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning arXiv 2025 Link
Improving LLM Unlearning Robustness via Random Perturbations arXiv 2025 Link
Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning arXiv 2025 Link
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? arXiv 2025 Link
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy arXiv 2025 Link
LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks arXiv 2025 Link
LLM Unlearning Should Be Form-Independent arXiv 2025 Link
LLM Unlearning Under the Microscope: A Full-Stack View on Methods and Metrics arXiv 2025 Link
LLM Unlearning Without an Expert Curated Dataset arXiv 2025 Link
LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data arXiv 2025 Link
LLM Unlearning using Gradient Ratio-Based Influence Estimation and Noise Injection arXiv 2025 Link
LLM Unlearning via Neural Activation Redirection arXiv 2025 Link
LLM Unlearning with LLM Beliefs arXiv 2025 Link
LUME: LLM Unlearning with Multitask Evaluations arXiv 2025 Link
LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples arXiv 2025 Link
Label Smoothing Improves Gradient Ascent in LLM Unlearning arXiv 2025 Link
Large Language Model Unlearning for Source Code arXiv 2025 Link
Leak@k: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding arXiv 2025 Link
Not All Data Are Unlearned Equally arXiv 2025 Link
Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning arXiv 2025 Link
Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models arXiv 2025 Link
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics arXiv 2025 Link
RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality arXiv 2025 Link
RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Model Unlearning arXiv 2025 Link
Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning arXiv 2025 Link
Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check arXiv 2025 Link
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond arXiv 2025 Link
Reveal and Release: Iterative LLM Unlearning with Self-generated Data arXiv 2025 Link
Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing arXiv 2025 Link
Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization arXiv 2025 Link
SUA: Stealthy Multimodal Large Language Model Unlearning Attack arXiv 2025 Link
Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions arXiv 2025 Link
SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models arXiv 2025 Link
SoK: Machine Unlearning for Large Language Models arXiv 2025 Link
Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models arXiv 2025 Link
Standard vs. Modular Sampling: Best Practices for Reliable LLM Unlearning arXiv 2025 Link
Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning arXiv 2025 Link
Towards Evaluation for Real-World LLM Unlearning arXiv 2025 Link
Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond arXiv 2025 Link
Towards Mitigating Excessive Forgetting in LLM Unlearning via Entanglement-Guidance with Proxy Constraint arXiv 2025 Link
UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets arXiv 2025 Link
UniErase: Towards Balanced and Precise Unlearning in Language Models arXiv 2025 Link
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs arXiv 2025 Link
WaterDrum: Watermarking for Data-centric Unlearning Metric arXiv 2025 Link
When Forgetting Builds Reliability: LLM Unlearning for Reliable Hardware Code Generation arXiv 2025 Link
Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning arXiv 2025 Link
Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting arXiv 2025 Link

2024

Title Venue Year Link
A Closer Look at Machine Unlearning for Large Language Models arXiv 2024 Link
Avoiding Copyright Infringement via Large Language Model Unlearning arXiv 2024 Link
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset arXiv 2024 Link
Catastrophic Failure of LLM Unlearning via Quantization arXiv 2024 Link
Classifier-free guidance in LLMs Safety arXiv 2024 Link
Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods arXiv 2024 Link
How Data Inter-connectivity Shapes LLMs Unlearning: A Structural Unlearning Perspective arXiv 2024 Link
Investigating the Feasibility of Mitigating Potential Copyright Infringement via Large Language Model Unlearning arXiv 2024 Link
LLM Unlearning via Loss Adjustment with Only Forget Data arXiv 2024 Link
Large Language Model Unlearning via Embedding-Corrupted Prompts arXiv 2024 Link
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts arXiv 2024 Link
Methods to Assess the UK Government's Current Role as a Data Provider for AI arXiv 2024 Link
Multi-Objective Large Language Model Unlearning arXiv 2024 Link
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning arXiv 2024 Link
On Effects of Steering Latent Representation for Large Language Model Unlearning arXiv 2024 Link
On Large Language Model Continual Unlearning arXiv 2024 Link
Position: LLM Unlearning Benchmarks are Weak Measures of Progress arXiv 2024 Link
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench arXiv 2024 Link
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models arXiv 2024 Link
Rethinking Machine Unlearning for Large Language Models arXiv 2024 Link
Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference arXiv 2024 Link
Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective arXiv 2024 Link
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs? arXiv 2024 Link
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning arXiv 2024 Link
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning arXiv 2024 Link
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods arXiv 2024 Link
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models arXiv 2024 Link
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs arXiv 2024 Link
Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation arXiv 2024 Link
Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning arXiv 2024 Link
Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis arXiv 2024 Link
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models arXiv 2024 Link

2023

Title Venue Year Link
Large Language Model Unlearning arXiv 2023 Link