Speculating LLMs’ Chinese Training Data Pollution from Their Tokens
Qingjie Zhang | Di Wang | Haoting Qian | Liu Yan | Tianwei Zhang | Ke Xu | Qi Li | Minlie Huang | Hewu Li | Han Qiu
Paper Details:
Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
Citations: No Citations Yet
URLs:
https://gist.github.com/ctlllll
https://github.com/openai/tiktoken/issues/297
https://pollutedtokens.github.io
https://serpapi.com/
https://commoncrawl.org/
https://www.technologyreview.com/2024/05/17/
https://aclrollingreview.org/responsibleNLPresearch/
https://community.openai.com/t/whats-
https://incidentdatabase.ai/cite/729/
https://en.wikipedia.org/wiki/File:Yui_
https://arxiv
Field Of Study