AGD: Adversarial Game Defense Against Jailbreak Attacks in Large Language Models

Shilong Pan | Zhiliang Tian | Zhen Huang | Wanlong Yu | Zhihua Wen | Xinwang Liu | Kai Lu | Minlie Huang | Dongsheng Li

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL