Expectation Preference Optimization: Reliable Preference Estimation for Improving the Reasoning Capability of Large Language Models
Zelin Li | Dawei Song
Paper Details:
Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
Citations: No Citations Yet
URLs:
https://pytorch.org/
https://huggingface.co/
https://github.com/huggingface/trl
https://github.com/vllm-project/vllm
https://github.com/Vespertinus9/EPO