Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources
Adrien Barbaresi
Paper Details:
Month: April
Year: 2014
Location: Gothenburg, Sweden
Venue: WAC | WS | SIG: SIGWAC
Citations
Efficient construction of metadata-enhanced web corpora
Adrien Barbaresi
URL
http://commoncrawl.org/
http://www.etools.ch/
http://www.dmoz.org/
https://github.com/adbar/flux-toolchain
Field Of Study
Task: Language Identification, Machine Translation
Language: Chinese, English, French
Dataset: Encyclopedia, Web Crawl
Similar Papers
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
Katharina Kann | Ryan Cotterell | Hinrich Schütze
A treebank-based study on the influence of Italian word order on parsing performance
Anita Alicante | Cristina Bosco | Anna Corazza | Alberto Lavelli
Integrating Graph-Based and Transition-Based Dependency Parsers
Joakim Nivre | Ryan McDonald
Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models
Arne Mauser | Saša Hasan | Hermann Ney
Improving Arabic-Chinese Statistical Machine Translation using English as Pivot Language
Nizar Habash | Jun Hu