Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
Mehdi Ali | Manuel Brack | Max Lübbering | Elias Wendt | Abbas Goher Khan | Richard Rutmann | Alex Jude | Maurice Kraus | Alexander Arno Weber | Felix Stollenwerk | David Kaczér | Florian Mai | Lucie Flek | Rafet Sifa | Nicolas Flores-Herr | Joachim Koehler | Patrick Schramowski | Michael Fromm | Kristian Kersting
Paper Details:
Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
Citations: No Citations Yet
URLs:
https://huggingface.co/JQL-AI
https://huggingface.co/datasets/JQL-AI/
https://huggingface.co/datasets/HuggingFaceFW/fineweb-2
https://github.com/JQL-AI/
https://eurohpc-ju.europa.eu/index_en
https://www.bsc.es/
https://argilla.io/
https://github.com/huggingface/nanotron
https://github.com/huggingface/datatrove
https://github.com/huggingface/lighteval