Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

Mehdi Ali | Manuel Brack | Max Lübbering | Elias Wendt | Abbas Goher Khan | Richard Rutmann | Alex Jude | Maurice Kraus | Alexander Arno Weber | Felix Stollenwerk | David Kaczér | Florian Mai | Lucie Flek | Rafet Sifa | Nicolas Flores-Herr | Joachim Koehler | Patrick Schramowski | Michael Fromm | Kristian Kersting

Paper Details:

Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP