The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

Abstract

This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend that offers transparent data pre-processing pipelines and automatic alignment procedures that facilitate the compilation of extensive parallel data sets from a variety of sources. Moreover, we develop a web-based interface that constitutes an intuitive frontend for end-users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. Furthermore, we also provide an interface for neural machine translation that can run as a service on virtual machines, which also incorporates a connection to the data repository software.
Original language: English
Title of host publication: 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference
Editors: Mareike Hartmann, Barbara Plank
Number of pages: 6
Place of Publication: Linköping
Publisher: Linköping University Electronic Press
Publication date: 2019
Pages: 389–394
ISBN (Electronic): 978-91-7929-995-8
Publication status: Published - 2019
MoE publication type: A4 Article in conference proceedings
Event: Nordic Conference on Computational Linguistics - Turku, Finland
Duration: 30 Sept 2019 – 2 Oct 2019
Conference number: 22
https://nodalida2019.org/

Publication series

Name: Linköping Electronic Conference Proceedings
Publisher: Linköping University Electronic Press
Number: 167
ISSN (Print): 1650-3686
ISSN (Electronic): 1650-3740

Name: NEALT Proceedings Series
Number: 42

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages
