NLPExplorer
Papers
Venues
Authors
Authors Timeline
Field of Study
URLs
ACL N-gram Stats
TweeNLP
API
Team
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini
|
Rodney Kinney
|
Akshita Bhagia
|
Dustin Schwenk
|
David Atkinson
|
Russell Authur
|
Ben Bogin
|
Khyathi Chandu
|
Jennifer Dumas
|
Yanai Elazar
|
Valentin Hofmann
|
Ananya Jha
|
Sachin Kumar
|
Li Lucy
|
Xinxi Lyu
|
Nathan Lambert
|
Ian Magnusson
|
Jacob Morrison
|
Niklas Muennighoff
|
Aakanksha Naik
|
Crystal Nam
|
Matthew Peters
|
Abhilasha Ravichander
|
Kyle Richardson
|
Zejiang Shen
|
Emma Strubell
|
Nishant Subramani
|
Oyvind Tafjord
|
Evan Walsh
|
Luke Zettlemoyer
|
Noah Smith
|
Hannaneh Hajishirzi
|
Iz Beltagy
|
Dirk Groeneveld
|
Jesse Dodge
|
Kyle Lo
|
Paper Details:
Month: August
Year: 2024
Location: Bangkok, Thailand
Venue:
ACL |
Citations
URL
No Citations Yet
https://www
https://www.regulations
https://github.com/commoncrawl/
https://ssrn.com/abstract=4634513
https://creativecommons.org/
https://www.regulations.gov/
https://github.com/microsoft/
https://huggingface.co/spaces/
https://github
https://stability.ai/news/
https://github.com/
Field Of Study