Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning
Jiahui Li | Hanlin Zhang | Fengda Zhang | Tai-Wei Chang | Kun Kuang | Long Chen | Jun Zhou
Paper Details:
Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP
Citations: No Citations Yet
URL:
https://github.com/tedmoskovitz/ConstrainedRL4LMs
https://github.com/allenai/FineGrainedRLHF
https://github.com/PKU-Alignment/safe-rlhf
Field Of Study