Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring

Honglin Mu | Han He | Yuxin Zhou | Yunlong Feng | Yang Xu | Libo Qin | Xiaoming Shi | Zeming Liu | Xudong Han | Qi Shi | Qingfu Zhu | Wanxiang Che

Paper Details:

Month: April
Year: 2025
Location: Albuquerque, New Mexico
Venue: NAACL
