SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

Tu Anh Dinh | Carlos Mullov | Leonard Bärmann | Zhaolin Li | Danni Liu | Simon Reiß | Jueun Lee | Nathan Lerzer | Jianfeng Gao | Fabian Peller-Konrad | Tobias Röddiger | Alexander Waibel | Tamim Asfour | Michael Beigl | Rainer Stiefelhagen | Carsten Dachsbacher | Klemens Böhm | Jan Niehues

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP