From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages

Artur Kiulian | Anton Polishko | Mykola Khandoga | Yevhen Kostiuk | Guillermo Gabrielli | Łukasz Gagała | Fadi Zaraket | Qusai Abu Obaida | Hrishikesh Garud | Wendy Wing Yee Mak | Dmytro Chaplynskyi | Selma Amor | Grigol Peradze

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria (online)
Venue: UNLP | WS