Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
Agam Goyal | Vedant Rathi | William Yeh | Yian Wang | Yuen Chen | Hari Sundaram
Paper Details:
Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
Citations: No Citations Yet
URL:
https://github.com/CrowdDynamicsLab/SAE-
https://www.alignmentforum
https://openai.com/policies/terms-of-use/
https://github.com/unitaryai/detoxify
https://www.transformer-circuits.pub/