Robust Safety Classifier Against Jailbreaking Attacks: Adversarial Prompt Shield

Jinhwa Kim | Ali Derakhshan | Ian Harris

Paper Details:

Month: June
Year: 2024
Location: Mexico City, Mexico
Venue: WOAH | WS
