Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region

Chak Tou Leong | Qingyu Yin | Jian Wang | Wenjie Li

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL