NLPExplorer
GEM - 2025
Total Papers: 68
Total Papers across all years: 164
Total Citations: 0
PersonaTwin: A Multi-Tier Prompt Conditioning Framework for Generating and Evaluating Personalized Digital Twins
Sihan Chen | John P. Lalor | Yi Yang | Ahmed Abbasi

Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics
Zena Al Khalili | Nick Howell | Dietrich Klakow

Cleanse: Uncertainty Estimation Approach Using Clustering-based Semantic Consistency in LLMs
Minsuh Joo | Hyunsoo Cho

The 2025 ReproNLP Shared Task on Reproducibility of Evaluations in NLP: Overview and Results
Anya Belz | Craig Thomson | Javier González Corbelle | Malo Ruelle

Metric assessment protocol in the context of answer fluctuation on MCQ tasks
Ekaterina Goliakova | Xavier Renard | Marie-Jeanne Lesot | Thibault Laugel | Christophe Marsala | Marcin Detyniecki

Clustering Zero-Shot Uncertainty Estimations to Assess LLM Response Accuracy for Yes/No Q&A
Christopher T. Franck | Amy Vennos | W. Graham Mueller | Daniel Dakota

ReproHum #0729-04: Human Evaluation Reproduction Report for “MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes”
Simeon Junker

Are Bias Evaluation Methods Biased ?
Lina Berrayana | Sean Rooney | Luis Garcés-Erice | Ioana Giurgiu

ARGENT: Automatic Reference-free Evaluation for Open-Ended Text Generation without Source Inputs
Xinyue Zhang | Agathe Zecevic | Sebastian Zeki | Angus Roberts

ReproHum #0067-01: A Reproduction of the Evaluation of Cross-Lingual Summarization
Supryadi | Chuang Liu | Deyi Xiong

Measure only what is measurable: towards conversation requirements for evaluating task-oriented dialogue systems
Emiel Van Miltenburg | Anouck Braggaar | Emmelyn Croes | Florian Kunneman | Christine Liebrecht | Gabriella Martijn

Selective Shot Learning for Code Explanation
Paheli Bhattacharya | Rishabh Gupta

(Dis)improved?! How Simplified Language Affects Large Language Model Performance across Languages
Miriam Anschütz | Anastasiya Damaratskaya | Chaeeun Joy Lee | Arthur Schmalz | Edoardo Mosca | Georg Groh

HuGME: A benchmark system for evaluating Hungarian generative LLMs
Noémi Ligeti-Nagy | Gabor Madarasz | Flora Foldesi | Mariann Lengyel | Matyas Osvath | Bence Sarossy | Kristof Varga | Győző Zijian Yang | Enikő Héja | Tamás Váradi | Gábor Prószéky

ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation
Kristýna Onderková | Mateusz Lango | Patrícia Schmidtová | Ondrej Dusek
Conference Topic Distribution (chart; categories: Linguistic, Task, Approach, Language, Dataset)
Conference Citation Distribution
Conference papers have no citations yet.