Representation Bending for Large Language Model Safety
Ashkan Yousefpour | Taeheon Kim | Ryan Sungmo Kwon | Seungbeen Lee | Wonje Jeung | Seungju Han | Alvin Wan | Harrison Ngan | Youngjae Yu | Jonghyun Choi
Paper Details:
Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL
Citations: No Citations Yet
URLs:
https://huggingface.co/cais/zephyr_7b_r2d2
https://huggingface.co/GraySwanAI/Llama-3-8B-
https://huggingface.co/GraySwanAI/Mistral-7B-
https://arxiv
https://huggingface.co/maywell/PiVoT-0
https://github.com/
https://github.com/allenai/
Field Of Study