NLPExplorer

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

Hao Peng | Yunjia Qi | Xiaozhi Wang | Zijun Yao | Bin Xu | Lei Hou | Juanzi Li

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL

Citations

No Citations Yet

Field Of Study