from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
Yu Yan | Sheng Sun | Zenghao Duan | Teli Liu | Min Liu | Zhiyi Yin | LeiJingyu LeiJingyu | Qi Li
Paper Details:
Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL
Citations: No Citations Yet
URLs:
https://mp.weixin.qq.com/s/XGBxRVzxSjqoKgOW7aRX9w
https://huggingface.co/Qwen/Qwen2-7B-Instruct
https://huggingface.co/Qwen/Qwen2-72B-Instruct
https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
https://huggingface.co/THUDM/chatglm3-6b
https://huggingface.co/THUDM/glm-4-9b-chat
https://huggingface.co/internlm/internlm2_5-7b-chat
https://huggingface.co/Qwen/Qwen1.5-110B-Chat
https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
https://huggingface.co/meta-llama/Meta-Llama-3-8B
https://huggingface.co/meta-llama/Llama-3.1-8B-
https://huggingface.co/01-ai/Yi-1.5-34B-Chat
https://openai.com/api
https://huggingface.co/jackhhao/jailbreak-classifier
https://chatgpt.com
https://www.volcengine.com
https://gemini.google.com
https://claude.ai
https://github.com/aounon/certified-llm-safety