IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
Mohammed Khan | Priyam Mehta | Ananth Sankar | Umashankar Kumaravelan | Sumanth Doddapaneni | Suriyaprasaad B | Varun G | Sparsh Jain | Anoop Kunchukuttan | Pratyush Kumar | Raj Dabre | Mitesh Khapra
Paper Details:
Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL
Citations: No Citations Yet
URLs:
https://sharegpt.com/
https://github.com/google/cld3/
https://github.com/shuyo/language-detection
https://github.com/AI4Bharat/webcorpus
https://www.opensubtitles.org/
https://nptel.ac.in/translation
https://catalog.ngc.nvidia.com/orgs/nvidia/
https://github.com/ChenghaoMou/text-dedup
https://huggingface.co/datasets/
https://opensource.org/licenses/MIT
https://github.com/
https://github.com/anoopkunchukuttan/
https://igod.gov.in/
https://archive.org/developers/
https://cloud.google.com/vision/docs/
https://github.com/wiseman/py-webrtcvad
https://nptel.ac.in/
https://openai.com/policies/terms-of-use