Abstract
This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend with transparent data pre-processing pipelines and automatic alignment procedures, which facilitate the compilation of extensive parallel data sets from a variety of sources. We also develop a web-based interface that serves as an intuitive frontend for end users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. Finally, we provide an interface for neural machine translation that can run as a service on virtual machines and that connects to the data repository software.
Original language | English |
---|---|
Title of host publication | 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) : Proceedings of the Conference |
Editors | Mareike Hartmann, Barbara Plank |
Number of pages | 6 |
Place of Publication | Linköping |
Publisher | Linköping University Electronic Press |
Publication date | 2019 |
Pages | 389–394 |
ISBN (Electronic) | 978-91-7929-995-8 |
Publication status | Published - 2019 |
MoE publication type | A4 Article in conference proceedings |
Event | Nordic Conference on Computational Linguistics, Turku, Finland. Duration: 30 Sept 2019 → 2 Oct 2019. Conference number: 22. https://nodalida2019.org/ |
Publication series
Name | Linköping Electronic Conference Proceedings |
---|---|
Publisher | Linköping University Electronic Press |
Number | 167 |
ISSN (Print) | 1650-3686 |
ISSN (Electronic) | 1650-3740 |

Name | NEALT Proceedings Series |
---|---|
Number | 42 |
Fields of Science
- 113 Computer and information sciences
- 6121 Languages