Matina: A Large-Scale 73B Token Persian Text Corpus
Sara Bourbour Hosseinbeigi | Fatemeh Taherinezhad | Heshaam Faili | Hamed Baghbani | Fatemeh Nadi | Mostafa Amiri
Paper Details:
Month: April
Year: 2025
Location: Albuquerque, New Mexico
Venue: NAACL
Citations: No citations yet
URLs:
https://github.com/FTaheriN/Matina-Text-Preprocessing
https://pypi.org/project/langdetect/
https://github.com/google/cld3
https://github.com/Text-Mining/Persian-Wikipedia-
https://github.com/miras-tech/MirasText
https://github.com/ganjoor
https://github.com/christos-c/bible-corpus
https://virgool.io/
https://en.wikishia.net/
https://selenium-python.readthedocs.io/
https://beautiful-soup-4.readthedocs.io/en/
https://github.com/madmaze/pytesseract
https://github.com/Belval/pdf2image
https://github.com/pymupdf/PyMuPDF