Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Tianhao Wu | Weizhe Yuan | Olga Golovneva | Jing Xu | Yuandong Tian | Jiantao Jiao | Jason E Weston | Sainbayar Sukhbaatar

Paper Details:

Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
