Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Yen-Ting Lin | Di Jin | Tengyu Xu | Tianhao Wu | Sainbayar Sukhbaatar | Chen Zhu | Yun He | Yun-Nung Chen | Jason E Weston | Yuandong Tian | Arash Rahnama | Sinong Wang | Hao Ma | Han Fang |

Paper Details:

Month: November
Year: 2025
Location: Suzhou, China
Venue: MathNLP | WS |