Automatic text simplification of Russian texts using control tokens

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

Abstract

This paper investigates how automatic text simplification can be controlled with special tokens that modify the length, paraphrasing degree, syntactic complexity, and CEFR (Common European Framework of Reference) grade level of the output texts, i.e. the level of language proficiency a non-native speaker would need to understand them. The project focuses on Russian texts and aims to continue and broaden existing research on controlled Russian text simplification. To this end, it explores available datasets for monolingual Russian machine translation (paraphrasing and simplification), experiments with various model architectures, and adds control tokens that have not previously been used on Russian texts.
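As a rough illustration of the control-token idea summarized in the abstract, the sketch below prepends one token per controlled attribute to a source sentence before it would be passed to a sequence-to-sequence model. The token names (`<LEN_…>`, `<PARA_…>`, `<DEPTH_…>`, `<CEFR_…>`), the 0.05 bucketing step, and the `Controls` class are assumptions made for this example; the abstract does not specify the token format or how the values are computed.

```python
from dataclasses import dataclass


@dataclass
class Controls:
    """Hypothetical control values; field names and formats are assumptions."""
    length_ratio: float      # target length / source length, in characters
    paraphrase_ratio: float  # desired similarity to the source (1.0 = copy)
    depth_ratio: float       # target / source syntactic depth
    cefr_level: str          # CEFR grade of the output, e.g. "A2" or "B1"


def length_ratio(source: str, target: str, step: float = 0.05) -> float:
    """Character-length ratio of a training pair, snapped to a grid of `step`."""
    ratio = len(target) / max(len(source), 1)
    return round(round(ratio / step) * step, 2)


def add_control_tokens(source: str, controls: Controls) -> str:
    """Prefix the source text with one control token per attribute."""
    prefix = (
        f"<LEN_{controls.length_ratio:.2f}> "
        f"<PARA_{controls.paraphrase_ratio:.2f}> "
        f"<DEPTH_{controls.depth_ratio:.2f}> "
        f"<CEFR_{controls.cefr_level}> "
    )
    return prefix + source


if __name__ == "__main__":
    controls = Controls(length_ratio=0.8, paraphrase_ratio=0.6,
                        depth_ratio=0.75, cefr_level="B1")
    print(add_control_tokens("Исходное предложение для упрощения.", controls))
```

In a setup like this, the values would be computed from each source/target pair during training and chosen by the user at inference time to steer the output.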
Original language: English
Title of host publication: Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)
Editors: Jakub Piskorski, Michał Marcińczuk, Preslav Nakov, et al.
Number of pages: 8
Place of Publication: Stroudsburg
Publisher: Association for Computational Linguistics (ACL)
Publication date: May 2023
Pages: 70-77
ISBN (Electronic): 978-1-959429-57-9
Publication status: Published - May 2023
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Slavic Natural Language Processing - Dubrovnik, Croatia
Duration: 6 May 2023 - 6 May 2023
Conference number: 9
http://bsnlp.cs.helsinki.fi/

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
