
Towards Safer Large Language Models through Machine Unlearning
Feb 15, 2024 · To address this gap, we introduce Selective Knowledge negation Unlearning (SKU), a novel unlearning framework for LLMs, designed to eliminate harmful knowledge while …
Towards Safer Large Language Models through Machine Unlearning
6 days ago · The rapid advancement of Large Language Models (LLMs) has demonstrated their vast potential across various domains, attributed to their extensive pretraining knowledge and …
Rethinking machine unlearning for large language models
Feb 17, 2025 · We explore machine unlearning in the domain of large language models (LLMs), referred to as LLM unlearning.
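The snippets below revolve around one recurring formulation: unlearning as gradient ascent on a "forget" set, regularized by a "retain" set to preserve utility. None of the cited papers is reduced to exactly this; it is a generic sketch of the common objective, with the function name and `lam` weight chosen here for illustration:

```python
def unlearning_loss(forget_nll, retain_nll, lam=1.0):
    """Generic LLM-unlearning objective (a sketch, not any specific
    paper's method). forget_nll / retain_nll are the model's negative
    log-likelihoods on the forget and retain batches.

    Minimizing this loss *maximizes* the loss on the forget set
    (gradient ascent, hence the negative sign) while keeping the
    retain-set loss low, so general utility is preserved.
    """
    return -forget_nll + lam * retain_nll


# Example: equal forget/retain losses cancel at lam=1.0.
balanced = unlearning_loss(1.0, 1.0)
```

In practice `lam` controls the trade-off that several of these abstracts mention: too small and the model loses utility, too large and harmful knowledge is barely removed.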
Towards Safer Large Language Models through Machine Unlearning
Feb 15, 2024 · This work introduces a scalable, automated approach to generate high-quality forget sets using language models themselves, and suggests that synthetic datasets offer a …
In this work, we explore the trade-off between maintaining model utility and unlearning harmful knowledge in Large Language Models (LLMs). To tackle this challenge, we introduce SKU, an …
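SKU isolates harmful knowledge in a first stage and then negates it from the model in a second stage. One way to picture that negation step, in the spirit of task-vector arithmetic, is subtracting the "harmful" parameter delta from the base weights; the following is a toy sketch over flat parameter lists (the function name, `scale` knob, and list representation are assumptions for illustration, not the paper's implementation):

```python
def negate_knowledge(theta_base, theta_harmful, scale=1.0):
    """Toy knowledge-negation step: compute the 'harmful' task vector
    (theta_harmful - theta_base) and subtract a scaled copy of it from
    the base weights. Real methods operate on full model tensors."""
    return [b - scale * (h - b) for b, h in zip(theta_base, theta_harmful)]


# Parameters untouched by the harmful fine-tune (delta = 0) are unchanged.
updated = negate_knowledge([1.0, 2.0], [1.5, 2.0])
```

The `scale` factor is where the utility/forgetting trade-off shows up again: subtracting too aggressively degrades unrelated capabilities.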
Large Language Models (LLMs) (Brown et al., 2020; Chowdhery et al., 2023; Touvron et al., 2023; Qin et al., 2023) have demonstrated their exceptional ability across various AI applications
We envision LLM unlearning becoming a pivotal element in the life-cycle management of LLMs, potentially standing as an essential foundation for developing generative artificial intelligence...
[2405.15152] Machine Unlearning in Large Language Models
May 24, 2024 · This paper presents a dual-pronged approach to enhance the ethical and safe behavior of large language models (LLMs) by addressing the issues of harmful responses and …
SafeEraser: Enhancing Safety in Multimodal Large Language Models ...
Feb 18, 2025 · To address this issue, we propose SAFEERASER, a safety unlearning benchmark for MLLMs, consisting of 3,000 images and 28.8K VQA pairs. We comprehensively evaluate unlearning methods from two perspectives: forget …
ACL Anthology
@inproceedings{liu-etal-2024-towards-safer, title = "Towards Safer Large Language Models through Machine Unlearning", author = "Liu, Zheyuan and Dou, Guangyao and Tan, Zhaoxuan …
SafeEraser: Enhancing Safety in Multimodal Large Language Models ...
2 days ago · To quantitatively measure over-forgetting mitigated by PD Loss, we propose a new metric called **Safe Answer Refusal Rate (SARR)**.
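SARR, as described in the snippet, quantifies over-forgetting: the share of *safe* questions that an unlearned model now refuses to answer. The snippet does not give the formula, so the refusal-detection heuristic and marker list below are assumptions for illustration only:

```python
# Hypothetical refusal markers; real evaluations use more careful detection.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")


def safe_answer_refusal_rate(answers):
    """Fraction of answers to safe questions that look like refusals.
    Lower is better: a high rate suggests the unlearning procedure
    over-forgot and the model now rejects benign queries."""
    refusals = sum(
        any(marker in answer.lower() for marker in REFUSAL_MARKERS)
        for answer in answers
    )
    return refusals / len(answers)


# One helpful answer, one refusal out of two safe questions:
rate = safe_answer_refusal_rate(
    ["Sure, here is the recipe.", "I'm sorry, I can't help with that."]
)  # → 0.5
```

A rate near 0 means unlearning left benign behavior intact; a rate near 1 indicates severe over-forgetting.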
Unforgotten Safety: Preserving Safety Alignment of Large Language ...
2 days ago · Abstract The safety alignment of large language models (LLMs) is becoming increasingly important with their democratization. In this paper, we study the safety …