Investigating Web Corpus Filtering Methods for Language Model Development in Japanese
Rintaro Enomoto | Arseny Tolmachev | Takuro Niitsuma | Shuhei Kurita | Daisuke Kawahara
Paper Details:
Month: June
Year: 2024
Location: Mexico City, Mexico
Venue: NAACL
Citations: No Citations Yet
URL:
https://commoncrawl.org/
https://github.com/miso-belica/jusText
https://huggingface.co/datasets/
https://huggingface.co/cyberagent/calm2-7b
https://huggingface.co/rinna/
https://huggingface.co/line-corporation/
https://github.com/HojiChar/HojiChar
https://fasttext.cc/
https://github.com/llm-jp/llm-jp-corpus/
https://scikit-learn.org/stable/
https://taku910.github.io/mecab/
https://www.deepl.com/translator
https://huggingface.co/cl-tohoku/
http://svn.sourceforge.jp/svnroot/slothlib/
https://chat.openai.com/
Field Of Study