Mutual-Taught for Co-adapting Policy and Reward Models

Tianyuan Shi | Canbin Huang | Fanqi Wan | Longguang Zhong | Ziyi Yang | Weizhou Shen | Xiaojun Quan | Ming Yan

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL