Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls

Feiyang Kang | Newsha Ardalani | Michael Kuchnik | Youssef Emad | Mostafa Elhoushi | Shubhabrata Sengupta | Shang-Wen Li | Ramya Raghavendra | Ruoxi Jia | Carole-Jean Wu

Paper Details:

Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
