Reinforcement Learning for Large Language Models via Group Preference Reward Shaping

Huaisheng Zhu | Siyuan Xu | Hangfan Zhang | Teng Xiao | Zhimeng Guo | Shijie Zhou | Shuyue Hu | Vasant G. Honavar

Paper Details:

Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP