Shortcut Learning in Safety: The Impact of Keyword Bias in Safeguards

Panuthep Tasawong | Napat Laosaengpha | Wuttikorn Ponwitayarat | Sitiporn Lim | Potsawee Manakul | Samuel Cahyawijaya | Can Udomcharoenchaikit | Peerat Limkonchotiwat | Ekapol Chuangsuwanich | Sarana Nutanong

Paper Details:

Month: August
Year: 2025
Location: Vienna, Austria
Venue: LLMSEC | WS
SIG: SIGSEC
