Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

Rima Hazra | Sayan Layek | Somnath Banerjee | Soujanya Poria |

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP |