NLPExplorer

GEM - 2025

Total Papers: 68

Total Papers across all years: 164

Total Citations: 0


Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons

Isik Baran Sandan | Tu Anh Dinh | Jan Niehues

Learning and Evaluating Factual Clarification Question Generation Without Examples

Matthew Toles | Yukun Huang | Zhou Yu

sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

Coreference as an indicator of context scope in multimodal narrative

Nikolai Ilinykh | Shalom Lappin | Asad B. Sayeed | Sharid Loáiciga

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans

Towards Comprehensive Evaluation of Open-Source Language Models: A Multi-Dimensional, User-Driven Approach

Qingchen Yu

IRSum: One Model to Rule Summarization and Retrieval

Sotaro Takeshita | Simone Paolo Ponzetto | Kai Eckert

Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?

Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring

Kezia Oketch | John P. Lalor | Yi Yang | Ahmed Abbasi

PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory

PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics

Qixiang Fang | Daniel Oberski | Dong Nguyen

Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents

Ashley Lewis

Event-based evaluation of abstractive news summarization

Huiling You | Samia Touileb | Lilja Øvrelid | Erik Velldal

CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization

ELAB: Extensive LLM Alignment Benchmark in Persian Language