Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs

Weixiang Zhao | Yulin Hu | Yang Deng | Jiahe Guo | Xingyu Sui | Xinyang Han | An Zhang | Yanyan Zhao | Bing Qin | Tat-Seng Chua | Ting Liu

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL