Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian | Chris Cremer | Matthias Gallé | Marzieh Fadaee | Julia Kreutzer | Olivier Pietquin | Ahmet Üstün | Sara Hooker
Paper Details:
Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL
Citations: No Citations Yet
URL:
https://github.com/
https://github.com/openai/
https://huggingface.co/datasets/Dahoas/
https://github
Field Of Study