GenBench - 2024
Total Papers: 14
Total Papers across all years: 33
Total Citations: 0
Towards a new Benchmark for Emotion Detection in NLP: A Unifying Framework of Recent Corpora
Anna Koufakou | Elijah Nieves | John Peller
Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
Dieuwke Hupkes | Verna Dankers | Khuyagbaatar Batsuren | Amirhossein Kazemnejad | Christos Christodoulopoulos | Mario Giulianelli | Ryan Cotterell
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models
Wentian Wang | Sarthak Jain | Paul Kantor | Jacob Feldman | Lazaros Gallos | Hao Wang
Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards
Varvara Arzt | Allan Hanbury
OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities
Anton Razzhigaev | Maxim Kurkin | Elizaveta Goncharova | Irina Abdullaeva | Anastasia Lysenko | Alexander Panchenko | Andrey Kuznetsov | Denis Dimitrov
CHIE: Generative MRC Evaluation for in-context QA with Correctness, Helpfulness, Irrelevancy, and Extraneousness Aspects
Wannaphong Phatthiyaphaibun | Surapon Nonesung | Peerat Limkonchotiwat | Can Udomcharoenchaikit | Jitkapat Sawatphol | Ekapol Chuangsuwanich | Sarana Nutanong
The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns
Bastian Bunzeck | Sina Zarrieß
Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification
Kush Dubey
Automated test generation to evaluate tool-augmented LLMs as conversational AI agents
Samuel Arcadinho | David Oliveira Aparicio | Mariana S. C. Almeida
MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models
Dojun Park | Jiwoo Lee | Seohyun Park | Hyeyun Jeong | Youngeun Koo | Soonha Hwang | Seonwoo Park | Sungeun Lee
From Language to Pixels: Task Recognition and Task Learning in LLMs
Janek Falkenstein | Carolin M. Schuster | Alexander H. Berger | Georg Groh
Investigating the Generalizability of Pretrained Language Models across Multiple Dimensions: A Case Study of NLI and MRC
Ritam Dutt | Sagnik Ray Choudhury | Varun Venkat Rao | Carolyn Rose | V.G.Vinod Vydiswaran
MLissard: Multilingual Long and Simple Sequential Reasoning Benchmarks
Mirelle Candida Bueno | Roberto Lotufo | Rodrigo Frassetto Nogueira
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don’t mimic the full human distribution
Hayley Ross | Kathryn Davidson | Najoung Kim
Conference Topic Distribution (chart; categories: Linguistic, Task, Approach, Language, Dataset)
Conference Citation Distribution
Conference Papers have no Citations yet