Abstract
This paper describes baseline systems for Finnish-English and English-Finnish machine translation using standard phrase-based and factored models including morphological features. We experiment with compound splitting and morphological segmentation and study the effect of adding noisy out-of-domain data to the parallel and the monolingual training data. Our results stress the importance of training data and demonstrate the effectiveness of morphological pre-processing of Finnish.
Original language | English |
---|---|
Title of host publication | Proceedings of the Tenth Workshop on Statistical Machine Translation |
Number of pages | 7 |
Place of Publication | New York |
Publisher | The Association for Computational Linguistics |
Publication date | 1 Sep 2015 |
Pages | 177-183 |
Publication status | Published - 1 Sep 2015 |
Externally published | Yes |
MoE publication type | A4 Article in conference proceedings |
Event | Workshop on Statistical Machine Translation, Lisboa, Portugal. Duration: 17 Sep 2015 → 18 Sep 2015. Conference number: 10 |
Fields of Science
- 6121 Languages