Representation Bending for Large Language Model Safety

Ashkan Yousefpour | Taeheon Kim | Ryan Sungmo Kwon | Seungbeen Lee | Wonje Jeung | Seungju Han | Alvin Wan | Harrison Ngan | Youngjae Yu | Jonghyun Choi

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL