Tunable Distortion Limits and Corpus Cleaning for SMT

Sara Stymne, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

We describe the Uppsala University system for WMT13, for English-to-German translation. We use the Docent decoder, a local search decoder that translates at the document level. We add tunable distortion limits, that is, soft constraints on the maximum distortion allowed, to Docent. We also investigate cleaning of the noisy Common Crawl corpus. We show that alignment-based filtering can be used for cleaning with good results. Finally, we investigate the effects of corpus selection for recasing.
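As an illustration of what a soft constraint on distortion could look like, the sketch below counts how many reordering jumps in a phrase sequence exceed a limit and turns that count into a weighted penalty, instead of forbidding such jumps outright. This is a minimal sketch under stated assumptions, not the feature definition from the paper or the Docent API: the abstract does not specify the formulation, and the names `distortion_distances`, `soft_limit_penalty`, `limit`, and `weight` are hypothetical.

```python
from typing import List, Tuple

# A phrase is the inclusive (start, end) span of source positions it covers.
# Phrases are listed in the order they appear on the target side.
Phrase = Tuple[int, int]


def distortion_distances(phrases: List[Phrase]) -> List[int]:
    """Jump distance between consecutive phrases (the usual phrase-based
    SMT definition: |start_i - end_{i-1} - 1|)."""
    distances = []
    prev_end = -1  # position just before the first source word
    for start, end in phrases:
        distances.append(abs(start - prev_end - 1))
        prev_end = end
    return distances


def soft_limit_penalty(phrases: List[Phrase], limit: int, weight: float) -> float:
    """Hypothetical soft constraint: each jump longer than `limit` costs
    `weight`. A hard limit would reject the hypothesis instead."""
    violations = sum(1 for d in distortion_distances(phrases) if d > limit)
    return -weight * violations


if __name__ == "__main__":
    reordered = [(0, 1), (5, 6), (2, 4)]
    print(distortion_distances(reordered))
    print(soft_limit_penalty(reordered, limit=4, weight=0.5))
```

Under this assumed formulation, `weight` could be tuned together with the other model weights, which is one way to read "tunable" here: the optimizer decides how strongly long jumps are discouraged rather than a fixed cutoff ruling them out.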
Original language: English
Title of host publication: Proceedings of the Eighth Workshop on Statistical Machine Translation
Number of pages: 7
Place of Publication: Stroudsburg, PA
Publisher: The Association for Computational Linguistics
Publication date: 1 Aug 2013
Pages: 225-231
ISBN (Print): 978-1-937284-57-2
Publication status: Published - 1 Aug 2013
Externally published: Yes
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Statistical Machine Translation - Sofia, Bulgaria
Duration: 8 Aug 2013 to 9 Aug 2013
Conference number: 8

Fields of Science

  • 6121 Languages
