NLPExplorer
GEM - 2025
Total Papers: 68
Total Papers across all years: 164
Total Citations: 0
ReproHum #0729-04: Partial reproduction of the human evaluation of the MemSum and NeuSum summarisation systems
Simon Mille, Michela Lorandi
Fine-Grained Constraint Generation-Verification for Improved Instruction-Following
Zhixiang Liang, Zhenyu Hou, Xiao Wang
Does Biomedical Training Lead to Better Medical Performance?
Amin Dada, Osman Alperen Koraş, Marie Bauer, Jean-Philippe Corbeil, Amanda Butler Contreras, Constantin Marc Seibold, Kaleb E Smith, Julian Friedrich, Jens Kleesiek
Natural Language Counterfactual Explanations in Financial Text Classification: A Comparison of Generators and Evaluation Metrics
Karol Dobiczek, Patrick Altmeyer, Cynthia C. S. Liem
The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation
Samee Arif, Sualeha Farid, Abdul Hameed Azeemi, Awais Athar, Agha Ali Raza
Fine-Tune on the Format: First Improving Multiple-Choice Evaluation for Intermediate LLM Checkpoints
Alec Bunn, Sarah Wiegreffe, Ben Bogin
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models
Sherzod Hakimov, Lara Pfennigschmidt, David Schlangen
Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP
Rudali Huidrom, Anya Belz
Evaluating LLMs with Multiple Problems at once
Zhengxiang Wang, Jordan Kodner, Owen Rambow
MCQFormatBench: Robustness Tests for Multiple-Choice Questions
Hiroo Takizawa, Saku Sugawara, Akiko Aizawa
Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?
Evangelia Gogoulou, Shorouq Zahra, Liane Guillou, Luise Dürlich, Joakim Nivre
Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources
Joachim De Baer, A. Seza Doğruöz, Thomas Demeester, Chris Develder
Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs
Jasmin Wachter, Michael Radloff, Maja Smolej, Katharina Kinder-Kurlanda
ReproHum: #0744-02: Investigating the Reproducibility of Semantic Preservation Human Evaluations
Mohammad Arvan, Natalie Parde
ReproHum #0744-02: A Reproduction of the Human Evaluation of Meaning Preservation in “Factorising Meaning and Form for Intent-Preserving Paraphrasing”
Julius Steen, Katja Markert
Conference Topic Distribution (chart; categories: Linguistic, Task, Approach, Language, Dataset)
Conference Citation Distribution: conference papers have no citations yet.