The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models
Yanjun Chen | Dawei Zhu | Yirong Sun | Xinghao Chen | Wei Zhang | Xiaoyu Shen
Paper Details:
Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP
Citations:
No Citations Yet
URLs:
https://huggingface.co/t5-small
https://huggingface.co/t5-base
https://huggingface.co/t5-large