The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models

Yanjun Chen | Dawei Zhu | Yirong Sun | Xinghao Chen | Wei Zhang | Xiaoyu Shen

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP