Robust Safety Classifier Against Jailbreaking Attacks: Adversarial Prompt Shield
Jinhwa Kim | Ali Derakhshan | Ian Harris
Paper Details:
Month: June
Year: 2024
Location: Mexico City, Mexico
Venue: WOAH | WS
Citations: No Citations Yet
URLs:
https://github.com/jinhwak11/
https://huggingface.co/models