NLPExplorer
  • Papers
  • Venues
  • Authors
  • Authors Timeline
  • Field of Study
  • URLs
  • ACL N-gram Stats
  • TweeNLP
  • API
  • Team

artofsafety - 2023

Total Papers:- 7
Total Papers accross all years:- 7
Total Citations :- 0
1
Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang |


Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang |


Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu |


Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI
Alicia Parrish |


Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting |


Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma |


Uncovering Bias in AI-Generated Images
Kimberley Baxter |


Conference Topic Distribution

Linguistic Task Approach Language Dataset

Conference Citation Distribution

Conference Papers have no Citations yet

Topics