Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling

Deng Qiyuan | Xuefeng Bai | Kehai Chen | Yaowei Wang | Liqiang Nie | Min Zhang

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL
