Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis

Yuping Lin | Pengfei He | Han Xu | Yue Xing | Makoto Yamada | Hui Liu | Jiliang Tang |

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP |