The OPUS Resource Repository

An Open Package for Creating Parallel Corpora and Machine Translation Services

Research output: Chapter in book/report/conference proceeding; Conference contribution; Scientific; Peer-reviewed

Abstract

This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend that offers transparent data pre-processing pipelines and automatic alignment procedures that facilitate the compilation of extensive parallel data sets from a variety of sources. Moreover, we develop a web-based interface that constitutes an intuitive frontend for end-users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. Furthermore, we also provide an interface for neural machine translation that can run as a service on virtual machines, which also incorporates a connection to the data repository software.
Original language: English
Title of host publication: 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference
Editors: Mareike Hartmann, Barbara Plank
Number of pages: 6
Place of publication: Linköping
Publisher: Linköping University Electronic Press
Publication date: 2019
Pages: 389–394
ISBN (electronic): 978-91-7929-995-8
Status: Published - 2019
MoE publication type: A4 Article in conference proceedings
Event: Nodalida - Nordic Conference on Computational Linguistics - Turku, Finland
Duration: 30 Sep 2019 to 2 Oct 2019
Conference number: 22
https://nodalida2019.org/

Publication series

Name: Linköping Electronic Conference Proceedings
Publisher: Linköping University Electronic Press
Number: 167
ISSN (print): 1650-3686
ISSN (electronic): 1650-3740
Name: NEALT Proceedings Series
Number: 42

Fields of science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this

Aulamo, M., & Tiedemann, J. (2019). The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services. In M. Hartmann, & B. Plank (Eds.), 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference (pp. 389–394). (Linköping Electronic Conference Proceedings; No. 167), (NEALT Proceedings Series; No. 42). Linköping: Linköping University Electronic Press.
Aulamo, Mikko ; Tiedemann, Jörg. / The OPUS Resource Repository : An Open Package for Creating Parallel Corpora and Machine Translation Services. 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference. editor / Mareike Hartmann ; Barbara Plank. Linköping : Linköping University Electronic Press, 2019. pp. 389–394 (Linköping Electronic Conference Proceedings; 167). (NEALT Proceedings Series; 42).
@inproceedings{e73384c4b7314426a17bd44e8066bbf3,
title = "The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services",
abstract = "This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend that offers transparent data pre-processing pipelines and automatic alignment procedures that facilitate the compilation of extensive parallel data sets from a variety of sources. Moreover, we develop a web-based interface that constitutes an intuitive frontend for end-users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. Furthermore, we also provide an interface for neural machine translation that can run as a service on virtual machines, which also incorporates a connection to the data repository software.",
keywords = "113 Computer and information sciences, 6121 Languages",
author = "Mikko Aulamo and J{\"o}rg Tiedemann",
year = "2019",
language = "English",
series = "Link{\"o}ping Electronic Conference Proceedings",
publisher = "Link{\"o}ping University Electronic Press",
number = "167",
pages = "389--394",
editor = "Mareike Hartmann and Barbara Plank",
booktitle = "22nd Nordic Conference on Computational Linguistics (NoDaLiDa)",
address = "Sweden",

}

Aulamo, M & Tiedemann, J 2019, The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services. in M Hartmann & B Plank (eds), 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference. Linköping Electronic Conference Proceedings, no. 167, NEALT Proceedings Series, no. 42, Linköping University Electronic Press, Linköping, pp. 389–394, Nodalida - Nordic Conference on Computational Linguistics, Turku, Finland, 30/09/2019.

The OPUS Resource Repository : An Open Package for Creating Parallel Corpora and Machine Translation Services. / Aulamo, Mikko; Tiedemann, Jörg.

22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference. ed. / Mareike Hartmann; Barbara Plank. Linköping : Linköping University Electronic Press, 2019. pp. 389–394 (Linköping Electronic Conference Proceedings; No. 167), (NEALT Proceedings Series; No. 42).

TY - GEN

T1 - The OPUS Resource Repository

T2 - An Open Package for Creating Parallel Corpora and Machine Translation Services

AU - Aulamo, Mikko

AU - Tiedemann, Jörg

PY - 2019

Y1 - 2019

AB - This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend that offers transparent data pre-processing pipelines and automatic alignment procedures that facilitate the compilation of extensive parallel data sets from a variety of sources. Moreover, we develop a web-based interface that constitutes an intuitive frontend for end-users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. Furthermore, we also provide an interface for neural machine translation that can run as a service on virtual machines, which also incorporates a connection to the data repository software.

KW - 113 Computer and information sciences

KW - 6121 Languages

M3 - Conference contribution

T3 - Linköping Electronic Conference Proceedings

SP - 389

EP - 394

BT - 22nd Nordic Conference on Computational Linguistics (NoDaLiDa)

A2 - Hartmann, Mareike

A2 - Plank, Barbara

PB - Linköping University Electronic Press

CY - Linköping

ER -

Aulamo M, Tiedemann J. The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services. In Hartmann M, Plank B, editors, 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference. Linköping: Linköping University Electronic Press. 2019. p. 389–394. (Linköping Electronic Conference Proceedings; 167). (NEALT Proceedings Series; 42).