APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport

Zhuo Li | Yuege Feng | Dandan Guo | Jinpeng Hu | Anningzhe Gao | Xiang Wan

Paper Details:

Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP