BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

Yi Zeng | Weiyu Sun | Tran Huynh | Dawn Song | Bo Li | Ruoxi Jia

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP