Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Yannis Flet-Berliac | Nathan Grinsztajn | Florian Strub | Eugene Choi | Bill Wu | Chris Cremer | Arash Ahmadian | Yash Chandak | Mohammad Gheshlaghi Azar | Olivier Pietquin | Matthieu Geist

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP
