Investigating How Pre-training Data Leakage Affects Models’ Reproduction and Detection Capabilities
Masahiro Kaneko | Timothy Baldwin
Paper Details:
Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
Citations: No Citations Yet
URL:
https://chat.openai.com/
https://claude.ai/chats
https://www.nytimes.com/2023/12/27/business/
https://www.theatlantic.com/
https://huggingface.co/datasets/
https://github.com/togethercomputer/
https://huggingface.co/datasets/EleutherAI/
https://github.com/mosaicml/llm-foundry
https://huggingface.co/datasets/tiiuae/
https://huggingface.co/datasets/allenai/dolma
https://www.washingtonpost.com/technology/
https://spacy.io/usage/linguistic-features
https://books.google.com/
https://news.google.com/
https://scholar.google.com/
https://huggingface.co/datasets
https://openai.com/gpt-4
https://help.openai.com/en/articles/5722486-how-your-