Abstract
This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend with transparent data pre-processing pipelines and automatic alignment procedures that facilitate the compilation of extensive parallel data sets from a variety of sources. We also develop a web-based interface that serves as an intuitive frontend for end-users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. Finally, we provide an interface for neural machine translation that can run as a service on virtual machines and that connects to the data repository software.
| Original language | English |
|---|---|
| Title of host publication | 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) : Proceedings of the Conference |
| Editors | Mareike Hartmann, Barbara Plank |
| Number of pages | 6 |
| Place of Publication | Linköping |
| Publisher | Linköping University Electronic Press |
| Publication date | 2019 |
| Pages | 389–394 |
| ISBN (Electronic) | 978-91-7929-995-8 |
| Publication status | Published - 2019 |
| MoE publication type | A4 Article in conference proceedings |
| Event | Nordic Conference on Computational Linguistics, Turku, Finland. Duration: 30 Sept 2019 → 2 Oct 2019. Conference number: 22. https://nodalida2019.org/ |
Publication series
| Name | Linköping Electronic Conference Proceedings |
|---|---|
| Publisher | Linköping University Electronic Press |
| Number | 167 |
| ISSN (Print) | 1650-3686 |
| ISSN (Electronic) | 1650-3740 |
| Name | NEALT Proceedings Series |
| Number | 42 |
Fields of Science
- 113 Computer and information sciences
- 6121 Languages