NLPExplorer
The Web as a Parallel Corpus
Philip Resnik | Noah A. Smith
Paper Details:
Year: 2003
Venue: CL
Citations
Multi-level Bootstrapping For Extracting Parallel Sentences From a Quasi-Comparable Corpus
Pascale Fung | Percy Cheung
Retrieving Bilingual Verb-Noun Collocations by Integrating Cross-Language Category Hierarchies
Fumiyo Fukumoto | Yoshimi Suzuki | Kazuyuki Yamashita
An Empirical Study on Web Mining of Parallel Data
Gumwon Hong | Chi-Ho Li | Ming Zhou | Hae-Chang Rim
Large Scale Parallel Document Mining for Machine Translation
Jakob Uszkoreit | Jay Ponte | Ashok Popat | Moshe Dubiner
A Novel Method for Bilingual Web Page Acquisition from Search Engine Web Records
Yanhui Feng | Yu Hong | Zhenxiang Yan | Jianmin Yao | Qiaoming Zhu
Semi Supervised Preposition-Sense Disambiguation using Multilingual Data
Hila Gonen | Yoav Goldberg
Improving Statistical Machine Translation Performance by Training Data Selection and Optimization
Yajuan Lü | Jin Huang | Qun Liu
Language and Translation Model Adaptation using Comparable Corpora
Matthew Snover | Bonnie Dorr | Richard Schwartz
Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases
Yuval Marton | Chris Callison-Burch | Philip Resnik
Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.
Ashish Venugopal | Jakob Uszkoreit | David Talbot | Franz Och | Juri Ganitkevitch
Paraphrasing 4 Microblog Normalization
Wang Ling | Chris Dyer | Alan W Black | Isabel Trancoso
Unsupervised Induction of Cross-Lingual Semantic Relations
Mike Lewis | Mark Steedman
On the Use of Comparable Corpora to Improve SMT performance
Sadaf Abdul-Rauf | Holger Schwenk
Mining Key Phrase Translations from Web Corpora
Fei Huang | Ying Zhang | Stephan Vogel
Harvesting the Bitexts of the Laws of Hong Kong From the Web
Chunyu Kit | Xiaoyue Liu | KingKui Sin | Jonathan J. Webster
Mining Chinese-English Parallel Corpora from the Web
Bo Li | Juan Liu
Finding parallel texts on the web using cross-language information retrieval
Achim Ruopp | Fei Xia
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Dragos Stefan Munteanu | Daniel Marcu
Orthographic Errors in Web Pages: Toward Cleaner Web Corpora
Christoph Ringlstetter | Klaus U. Schulz | Stoyan Mihov
Babylon Parallel Text Builder: Gathering Parallel Texts for Low-Density Languages
Michael Mohler | Rada Mihalcea
TLAXCALA: a multilingual corpus of independent news
Antonio Toral
Dual Subtitles as Parallel Corpora
Shikun Zhang | Wang Ling | Chris Dyer
Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites
Miquel Esplà-Gomis | Filip Klubička | Nikola Ljubešić | Sergio Ortiz-Rojas | Vassilis Papavassiliou | Prokopis Prokopidis
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
Liang Tian | Derek F. Wong | Lidia S. Chao | Paulo Quaresma | Francisco Oliveira | Yi Lu | Shuo Li | Yiming Wang | Longyue Wang
Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor’s Love Affair
Nikola Ljubešić | Miquel Esplà-Gomis | Antonio Toral | Sergio Ortiz Rojas | Filip Klubička
Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl
Ximena Gutierrez-Vasques | Gerardo Sierra | Isaac Hernandez Pompa
Manual vs Automatic Bitext Extraction
Aibek Makazhanov | Bagdat Myrzakhmetov | Zhenisbek Assylbekov
Improved Statistical Machine Translation Using Paraphrases
Chris Callison-Burch | Philipp Koehn | Miles Osborne
Selecting relevant text subsets from web-data for building topic specific language models
Abhinav Sethy | Panayiotis Georgiou | Shrikanth Narayanan
An Integrated Approach to Measuring Semantic Similarity between Words Using Information Available on the Web
Danushka Bollegala | Yutaka Matsuo | Mitsuru Ishizuka
A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation
Masao Utiyama | Hitoshi Isahara
A Fast Method for Parallel Document Identification
Jessica Enright | Grzegorz Kondrak
A Simple Sentence-Level Extraction Algorithm for Comparable Data
Christoph Tillmann | Jian-ming Xu
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
Jason R. Smith | Chris Quirk | Kristina Toutanova
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling
Ferhan Ture | Jimmy Lin
Multilingual Open Relation Extraction Using Cross-lingual Projection
Manaal Faruqui | Shankar Kumar
Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation
Akiva Miura | Graham Neubig | Michael Paul | Satoshi Nakamura
Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora
Sree Harsha Ramesh | Krishna Prasad Sankaranarayanan
Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora
Chris Callison-Burch | David Talbot | Miles Osborne
An Automatic Filter for Non-Parallel Texts
Chris Pike | I. Dan Melamed
The Linguist’s Search Engine: An Overview
Philip Resnik | Aaron Elkiss
Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora
Dragos Stefan Munteanu | Daniel Marcu
A DOM Tree Alignment Model for Mining Parallel Data from the Web
Lei Shi | Cheng Niu | Ming Zhou | Jianfeng Gao
Novel Association Measures Using Web Search with Double Checking
Hsin-Hsi Chen | Ming-Shun Lin | Yu-Chuan Wei
Is It Correct? – Towards Web-Based Evaluation of Automatic Natural Language Phrase Generation
Calkin S. Montero | Kenji Araki
Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora
Trevor Cohn | Mirella Lapata
Multilingual Harvesting of Cross-Cultural Stereotypes
Tony Veale | Yanfen Hao | Guofu Li
Mining Bilingual Data from the Web with Adaptively Learnt Patterns
Long Jiang | Shiquan Yang | Ming Zhou | Xiaohua Liu | Qingsheng Zhu
A Beam-Search Extraction Algorithm for Comparable Data
Christoph Tillmann
Crowdsourcing Translation: Professional Quality from Non-Professionals
Omar F. Zaidan | Chris Callison-Burch
Microblogs as Parallel Corpora
Wang Ling | Guang Xiang | Chris Dyer | Alan Black | Isabel Trancoso
Dirt Cheap Web-Scale Parallel Text from the Common Crawl
Jason R. Smith | Herve Saint-Amand | Magdalena Plamada | Philipp Koehn | Chris Callison-Burch | Adam Lopez
Bilingual Data Cleaning for SMT using Graph-based Random Walk
Lei Cui | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Are Two Heads Better than One? Crowdsourced Translation via a Two-Step Collaboration of Non-Professional Translators and Editors
Rui Yan | Mingkun Gao | Ellie Pavlick | Chris Callison-Burch
Automatic Detection of Multilingual Dictionaries on the Web
Gintarė Grigonytė | Timothy Baldwin
An Unsupervised Method for Automatic Translation Memory Cleaning
Masoud Jalili Sabet | Matteo Negri | Marco Turchi | Eduard Barbu
Filtering and Mining Parallel Data in a Joint Multilingual Space
Holger Schwenk
Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and E
Pascale Fung | Percy Cheung
Improving Word Alignment Models using Structured Monolingual Corpora
Wei Wang | Ming Zhou
Automatically Learning Qualia Structures from the Web
Philipp Cimiano | Johanna Wenderoth
Frontiers in Linguistic Annotation for Lower-Density Languages
Mike Maxwell | Baden Hughes
A Fast and Accurate Method for Detecting English-Japanese Parallel Texts
Ken’ichi Fukushima | Kenjiro Taura | Takashi Chikayama
Text data acquisition for domain-specific language models
Abhinav Sethy | Panayiotis G. Georgiou | Shrikanth Narayanan
Extracting Parallel Fragments from Comparable Corpora for Data-to-text Generation
Anja Belz | Eric Kow
Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora
Emmanuel Morin | Emmanuel Prochasson
Two Ways to Use a Noisy Parallel News Corpus for Improving Statistical Machine Translation
Souhir Gahbiche-Braham | Hélène Bonneau-Maynard | François Yvon
Extracting Parallel Phrases from Comparable Data
Sanjika Hewavitharana | Stephan Vogel
Active Learning with Multiple Annotations for Comparable Data Classification Task
Vamshi Ambati | Sanjika Hewavitharana | Stephan Vogel | Jaime Carbonell
Identifying Parallel Documents from a Large Bilingual Collection of Texts: Application to Parallel Article Extraction in Wikipedia.
Alexandre Patry | Philippe Langlais
Unsupervised Alignment of Comparable Data and Text Resources
Anja Belz | Eric Kow
Building a Web-Based Parallel Corpus and Filtering Out Machine-Translated Text
Alexandra Antonova | Alexey Misyurev
A Minimally Supervised Approach for Detecting and Ranking Document Translation Pairs
Kriste Krstovski | David A. Smith
CEU-UPV English-Spanish system for WMT11
Francisco Zamora-Martínez | Maria Jose Castro-Bleda
Evaluating (and Improving) Sentence Alignment under Noisy Conditions
Omar Zaidan | Vishal Chowdhary
A modular open-source focused crawler for mining monolingual and bilingual corpora from the web
Vassilis Papavassiliou | Prokopis Prokopidis | Gregor Thurmair
Finding More Bilingual Webpages with High Credibility via Link Analysis
Chengzhi Zhang | Xuchen Yao | Chunyu Kit
Improving the precision of automatically constructed human-oriented translation dictionaries
Alexandra Antonova | Alexey Misyurev
Domain Adaptation for Medical Text Translation using Web Resources
Yi Lu | Longyue Wang | Derek F. Wong | Lidia S. Chao | Yiming Wang
Findings of the WMT 2016 Bilingual Document Alignment Shared Task
Christian Buck | Philipp Koehn
DOCAL - Vicomtech’s Participation in the WMT16 Shared Task on Bilingual Document Alignment
Andoni Azpeitia | Thierry Etchegoyhen
Bitextor’s participation in WMT’16: shared task on document alignment
Miquel Esplà-Gomis | Mikel Forcada | Sergio Ortiz-Rojas | Jorge Ferrández-Tordera
Bilingual Document Alignment with Latent Semantic Indexing
Ulrich Germann
Word Clustering Approach to Bilingual Document Alignment (WMT 2016 Shared Task)
Vadim Shchukin | Dmitry Khristich | Irina Galinskaya
A Portable Method for Parallel and Comparable Document Alignment
Thierry Etchegoyhen | Andoni Azpeitia
Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research
Philippe Langlais
BUCC 2017 Shared Task: a First Attempt Toward a Deep Learning Framework for Identifying Parallel Sentences in Comparable Corpora
Francis Grégoire | Philippe Langlais
MultiNews: A Web collection of an Aligned Multimodal and Multilingual Corpus
Haithem Afli | Pintu Lohar | Andy Way
If Sentences Could See: Investigating Visual Information for Semantic Textual Similarity
Goran Glavaš | Ivan Vulić | Simone Paolo Ponzetto
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
Mandy Guo | Qinlan Shen | Yinfei Yang | Heming Ge | Daniel Cer | Gustavo Hernandez Abrego | Keith Stevens | Noah Constant | Yun-Hsuan Sung | Brian Strope | Ray Kurzweil
Alibaba Submission to the WMT18 Parallel Corpus Filtering Task
Jun Lu | Xiaoyu Lv | Yangbin Shi | Boxing Chen
URLs
http://www.ldc.upenn.edu
http://www.av.com
http://mysite.com/english/home
http://mysite.com/big5/home
http://umiacs.umd.edu/resnik/strand/
http://babelfish.altavista.com
http://www.rulequest.com/
http://www.foo.ca/english-index.html
http://www.foo.ca/french-index.html
http://www.freedict.com
http://www.archive.org/web/researcher/
http://www.cs.columbia.edu/acl/home.html
http://web.archive.org/web/19970607032410/http://www.cs.columbia.edu/acl/home.html
http://www.google.com/programming-contest/
http://www.ted.cmis.csiro.au/TRECWeb/access
http://www.ercim.org/publication/ws-
http://umiacs.umd.edu/resnik/pubs/
http://nlp.cs.jhu.edu/nasmith/cmsc-
Field of Study
Task
Language Identification
Morphological Analysis
Tagging
Word Sense Disambiguation
Information Extraction
Information Retrieval
Knowledge Acquisition
Machine Translation
Language
Multilingual
Chinese
English
Spanish
French
Arabic
Semitic
Similar Papers
Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
Mona Diab | Mahmoud Ghoneim | Abdelati Hawwari | Fahad AlGhamdi | Nada AlMarwani | Mohamed Al-Badrashiny
Enriching Word Vectors with Subword Information
Piotr Bojanowski | Edouard Grave | Armand Joulin | Tomas Mikolov
Knowledge-Rich Morphological Priors for Bayesian Language Models
Victor Chahuneau | Noah A. Smith | Chris Dyer
Semi-supervised Structured Prediction with Neural CRF Autoencoder
Xiao Zhang | Yong Jiang | Hao Peng | Kewei Tu | Dan Goldwasser
A Survey of Arabic Named Entity Recognition and Classification
Khaled Shaalan