NLPExplorer
artofsafety - 2023
Total Papers: 7
Total Papers across all years: 7
Total Citations: 0
Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang
Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang
Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu
Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI
Alicia Parrish
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting
Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma
Uncovering Bias in AI-Generated Images
Kimberley Baxter
Conference Topic Distribution
[Chart: topic distribution by category (Linguistic, Task, Approach, Language, Dataset)]
Conference Citation Distribution
Conference papers have no citations yet.