NLPExplorer

GEM - 2025

Total Papers: 68

Total Papers across all years: 164

Total Citations: 0


ReproHum #0729-04: Partial reproduction of the human evaluation of the MemSum and NeuSum summarisation systems

Simon Mille | Michela Lorandi

Fine-Grained Constraint Generation-Verification for Improved Instruction-Following

Zhixiang Liang | Zhenyu Hou | Xiao Wang

Does Biomedical Training Lead to Better Medical Performance?

Natural Language Counterfactual Explanations in Financial Text Classification: A Comparison of Generators and Evaluation Metrics

Karol Dobiczek | Patrick Altmeyer | Cynthia C. S. Liem

The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation

Fine-Tune on the Format: First Improving Multiple-Choice Evaluation for Intermediate LLM Checkpoints

Alec Bunn | Sarah Wiegreffe | Ben Bogin

Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models

Sherzod Hakimov | Lara Pfennigschmidt | David Schlangen

Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP

Rudali Huidrom | Anya Belz

Evaluating LLMs with Multiple Problems at once

Zhengxiang Wang | Jordan Kodner | Owen Rambow

MCQFormatBench: Robustness Tests for Multiple-Choice Questions

Hiroo Takizawa | Saku Sugawara | Akiko Aizawa

Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?

Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources

Joachim De Baer | A. Seza Doğruöz | Thomas Demeester | Chris Develder

Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs

Jasmin Wachter | Michael Radloff | Maja Smolej | Katharina Kinder-Kurlanda

ReproHum: #0744-02: Investigating the Reproducibility of Semantic Preservation Human Evaluations

Mohammad Arvan | Natalie Parde