Chatbot Arena Estimate: towards a generalized performance benchmark for LLM capabilities
Lucas Spangher | Tianle Li | William F. Arnold | Nick Masiewicki | Xerxes Dotiwalla | Rama Kumar Pasumarthi | Peter Grabowski | Eugene Ie | Daniel Gruhl
Paper Details:
Month: April
Year: 2025
Location: Albuquerque, New Mexico
Venue: NAACL
Citations: No Citations Yet
URLs:
https://github.com/sylinrl/TruthfulQA
https://huggingface.co/datasets/edinburgh-dawg/mmlu-redux
https://huggingface.co/datasets/tdiggelm/climate_fever
https://huggingface.co/datasets/allenai/ai2_arc
https://huggingface.co/datasets/boolq
https://huggingface.co/datasets/rajpurkar/squad
https://github.com/google/BIG-bench/tree/main/bigbench
https://huggingface.co/datasets/EdinburghNLP/xsum
https://huggingface.co/datasets/cais/mmlu
Field Of Study