NLPExplorer

GenBench - 2024

Total Papers: 14

Total Papers across all years: 33

Total Citations: 0

Towards a new Benchmark for Emotion Detection in NLP: A Unifying Framework of Recent Corpora

Anna Koufakou | Elijah Nieves | John Peller |

Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP

Dieuwke Hupkes | Verna Dankers | Khuyagbaatar Batsuren | Amirhossein Kazemnejad | Christos Christodoulopoulos | Mario Giulianelli | Ryan Cotterell |

MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models

Wentian Wang | Sarthak Jain | Paul Kantor | Jacob Feldman | Lazaros Gallos | Hao Wang |

Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards

Varvara Arzt | Allan Hanbury |

OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities

Anton Razzhigaev | Maxim Kurkin | Elizaveta Goncharova | Irina Abdullaeva | Anastasia Lysenko | Alexander Panchenko | Andrey Kuznetsov | Denis Dimitrov |

CHIE: Generative MRC Evaluation for in-context QA with Correctness, Helpfulness, Irrelevancy, and Extraneousness Aspects

Wannaphong Phatthiyaphaibun | Surapon Nonesung | Peerat Limkonchotiwat | Can Udomcharoenchaikit | Jitkapat Sawatphol | Ekapol Chuangsuwanich | Sarana Nutanong |

The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns

Bastian Bunzeck | Sina Zarrieß |

Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification

Kush Dubey |

Automated test generation to evaluate tool-augmented LLMs as conversational AI agents

Samuel Arcadinho | David Oliveira Aparicio | Mariana S. C. Almeida |

MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models

Dojun Park | Jiwoo Lee | Seohyun Park | Hyeyun Jeong | Youngeun Koo | Soonha Hwang | Seonwoo Park | Sungeun Lee |

From Language to Pixels: Task Recognition and Task Learning in LLMs

Janek Falkenstein | Carolin M. Schuster | Alexander H. Berger | Georg Groh |

Investigating the Generalizability of Pretrained Language Models across Multiple Dimensions: A Case Study of NLI and MRC

Ritam Dutt | Sagnik Ray Choudhury | Varun Venkat Rao | Carolyn Rose | V.G.Vinod Vydiswaran |

MLissard: Multilingual Long and Simple Sequential Reasoning Benchmarks

Mirelle Candida Bueno | Roberto Lotufo | Rodrigo Frassetto Nogueira |

Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don’t mimic the full human distribution

Hayley Ross | Kathryn Davidson | Najoung Kim |

Conference Citation Distribution

Conference Papers have no Citations yet
