NLPExplorer

Eval4NLP - 2025

Total Papers:- 15

Total Papers across all years:- 78

Total Citations :- 0

Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation

Naila Shafirni Hidayat | Muhammad Dehan Al Kautsar | Alfan Farizki Wicaksono | Fajri Koto

Beyond Tokens and Into Minds: Future Directions for Human-Centered Evaluation in Machine Translation Post-Editing

Molly Apsel | Sunil Kothari | Manish Mehta | Vasudevan Sundarababu

TitleTrap: Probing Presentation Bias in LLM-Based Scientific Reviewing

Shurui Du

Fair Play in the Newsroom: Actor-Based Filtering Gender Discrimination in Text Corpora

InFiNITE (∞): Indian Financial Narrative Inference Tasks & Evaluations

Sohom Ghosh | Arnab Maji | Sudip Kumar Naskar

SynClaimEval: A Framework for Evaluating the Utility of Synthetic Data in Long-Context Claim Verification

Mohamed Elaraby | Jyoti Prakash Maheswari

Reliable Inline Code Documentation with LLMs: Fine-Grained Evaluation of Comment Quality and Coverage

Rohan Patil | Gaurav Tirodkar | Shubham Gatfane

Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health

Measuring Visual Understanding in Telecom domain: Performance Metrics for Image-to-UML conversion using VLMs

H. G. Ranjani | Rutuja Prabhudesai

Non-Determinism of “Deterministic” LLM System Settings in Hosted Environments

Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems

Evaluation of Generated Poetry

David Mareček | Kateřina Motalík Hodková | Tomáš Musil | Rudolf Rosa

“The dentist is an involved parent, the bartender is not”: Revealing Implicit Biases in QA with Implicit BBQ

Aarushi Wagh | Saniya Srivastava

Between the Drafts: An Evaluation Framework for Identifying Quality Improvement and Stylistic Differences in Scientific Texts

Danqing Chen | Ingo Weber | Felix Dietrich

Test Set Quality in Multilingual LLM Evaluation

Chalamalasetti Kranti | Gabriel Bernier-Colborne | Yvan Gauthier | Sowmya Vajjala