NLPExplorer
Eval4NLP - 2025
Total Papers: 15
Total Papers across all years: 78
Total Citations: 0
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation
Naila Shafirni Hidayat | Muhammad Dehan Al Kautsar | Alfan Farizki Wicaksono | Fajri Koto
Beyond Tokens and Into Minds: Future Directions for Human-Centered Evaluation in Machine Translation Post-Editing
Molly Apsel | Sunil Kothari | Manish Mehta | Vasudevan Sundarababu
TitleTrap: Probing Presentation Bias in LLM-Based Scientific Reviewing
Shurui Du
Fair Play in the Newsroom: Actor-Based Filtering Gender Discrimination in Text Corpora
Stefanie Urchs | Veronika Thurner | Matthias Aßenmacher | Christian Heumann | Stephanie Thiemichen
InFiNITE (∞): Indian Financial Narrative Inference Tasks & Evaluations
Sohom Ghosh | Arnab Maji | Sudip Kumar Naskar
SynClaimEval: A Framework for Evaluating the Utility of Synthetic Data in Long-Context Claim Verification
Mohamed Elaraby | Jyoti Prakash Maheswari
Reliable Inline Code Documentation with LLMs: Fine-Grained Evaluation of Comment Quality and Coverage
Rohan Patil | Gaurav Tirodkar | Shubham Gatfane
Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health
Sumon Kanti Dey | Manvi S | Zeel Mehta | Meet Shah | Unnati Agrawal | Suhani Jalota | Azra Ismail
Measuring Visual Understanding in Telecom domain: Performance Metrics for Image-to-UML conversion using VLMs
H. G. Ranjani | Rutuja Prabhudesai
Non-Determinism of “Deterministic” LLM System Settings in Hosted Environments
Berk Atıl | Sarp Aykent | Alexa Chittams | Lisheng Fu | Rebecca J. Passonneau | Evan Radcliffe | Guru Rajan Rajagopal | Adam Sloan | Tomasz Tudrej | Ferhan Ture | Zhe Wu | Lixinyu Xu | Breck Baldwin
Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems
Mousumi Akter | Tahiya Chowdhury | Steffen Eger | Christoph Leiter | Juri Opitz | Erion Çano
Evaluation of Generated Poetry
David Mareček | Kateřina Motalík Hodková | Tomáš Musil | Rudolf Rosa
“The dentist is an involved parent, the bartender is not”: Revealing Implicit Biases in QA with Implicit BBQ
Aarushi Wagh | Saniya Srivastava
Between the Drafts: An Evaluation Framework for Identifying Quality Improvement and Stylistic Differences in Scientific Texts
Danqing Chen | Ingo Weber | Felix Dietrich
Test Set Quality in Multilingual LLM Evaluation
Chalamalasetti Kranti | Gabriel Bernier-Colborne | Yvan Gauthier | Sowmya Vajjala
Conference Topic Distribution (categories: Linguistic, Task, Approach, Language, Dataset)
Conference Citation Distribution: conference papers have no citations yet