Reinforcement Learning for Large Language Models via Group Preference Reward Shaping
Huaisheng Zhu | Siyuan Xu | Hangfan Zhang | Teng Xiao | Zhimeng Guo | Shijie Zhou | Shuyue Hu | Vasant G. Honavar
Paper Details:
Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
Citations: No Citations Yet
URL:
https://arxiv
https://huggingface.co/datasets/Maxwell-Jia/
https://huggingface.co/datasets/AI-MO/
Field Of Study