Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning

Lu Chen | Rui Zheng | Binghai Wang | Senjie Jin | Caishuang Huang | Junjie Ye | Zhihao Zhang | Yuhao Zhou | Zhiheng Xi | Tao Gui | Qi Zhang | Xuanjing Huang |

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP |