Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense

Yang Ouyang | Hengrui Gu | Shuhang Lin | Wenyue Hua | Jie Peng | Bhavya Kailkhura | Meijun Gao | Tianlong Chen | Kaixiong Zhou

Paper Details:

Month: April
Year: 2025
Location: Albuquerque, New Mexico
Venue: NAACL
