LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Anna Bavaresco | Raffaella Bernardi | Leonardo Bertolazzi | Desmond Elliott | Raquel Fernández | Albert Gatt | Esam Ghaleb | Mario Giulianelli | Michael Hanna | Alexander Koller | Andre Martins | Philipp Mondorf | Vera Neplenbroek | Sandro Pezzelle | Barbara Plank | David Schlangen | Alessandro Suglia | Aditya K Surikuchi | Ece Takmaz | Alberto Testoni

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL
