Abstract
In this paper, we investigate paraphrase generation in the colloquial domain. We use state-of-the-art neural machine translation models trained on the Opusparcus corpus to generate paraphrases in six languages: German, English, Finnish, French, Russian, and Swedish. We perform experiments to understand how data selection and filtering for diverse paraphrase pairs affect the generated paraphrases. We compare two different model architectures, an RNN and a Transformer model, and find that the Transformer does not generally outperform the RNN. We also conduct human evaluation on five of the six languages and compare the results to the automatic evaluation metrics BLEU and the recently proposed BERTScore. The results advance our understanding of the trade-offs between the quality and novelty of generated paraphrases, affected by the data selection method. In addition, our comparison of the evaluation methods shows that while BLEU correlates well with human judgments at the corpus level, BERTScore outperforms BLEU in both corpus and sentence-level evaluation.
Original language | English |
---|---|
Title | Proceedings of the 12th Language Resources and Evaluation Conference |
Editors | Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis |
Number of pages | 9 |
Place of publication | Paris |
Publisher | European Language Resources Association (ELRA) |
Publication date | 1 May 2020 |
Pages | 1814-1822 |
ISBN (electronic) | 979-10-95546-34-4 |
Status | Published - 1 May 2020 |
MoE publication type | A4 Article in conference proceedings |
Event | Language Resources and Evaluation Conference - [LREC 2020 was cancelled] Duration: 11 May 2020 → 16 May 2020 Conference number: 12 https://lrec2020.lrec-conf.org/ |
Fields of Science
- 6121 Languages
- 113 Computer and information sciences
Projects
- 1 Finished
- fiskmö: Creation of a parallel corpus of translated documents and machine translation for Finnish and Swedish
Tiedemann, J., Ginter, F., Papula, N., Aulamo, M., Nieminen, T., Kanerva, J. & Eskola, K.
01/05/2018 → 31/03/2021
Project: Research project