Impact of Legal Status of Data on Development of Data-Intensive Products: Example of Language Technologies

Aleksei Kelli, Arvi Tavast, Krister Linden, Ramunas Bristonas, Penny Labropoulou, Kadri Vider, Irene Kull, Gaabriel Tavits, Age Värv, Vadim Mantrov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


The purpose of this article is to explain the extent to which the legal regime applicable to language data affects the development and use of language technology (LT). The main focus of the paper is on EU law. The article also maps possible text and data mining (TDM) issues. The authors focus on TDM for research purposes outlined in the Digital Copyright Directive 2019/790.

The authors follow a process approach to LT development, which starts from raw data collection and leads to LT products such as a refrigerator with a speech interface. Particular attention is given to language models.

The raw data used in LT often include copyright-protected works, objects of related rights (e.g., performances) and personal data in the form of a person’s voice or other information stored in non-annotated and annotated databases.

The authors’ main argument is that the legal regime of language data does not usually affect the use of language models, since copyrighted works are not likely to remain in models. In the process of developing a language technology application, language models are the first intermediate result that can be free from the legal restrictions affecting language data. The use of a person’s voice as identifiable personal data in a language model can, however, create legal challenges. In some cases, developers of language technology must take care in how they handle the processing of personal data contained in models.
Original language: English
Title of host publication: Legal Science: Functions, Significance and Future in Legal Systems II: Collection of Research Papers in Conjunction with the 7th International Scientific Conference of the Faculty of Law of the University of Latvia (16–18 October 2019, Riga)
Editors: A Damberga
Number of pages: 18
Place of Publication: Riga
Publisher: University of Latvia Press
Publication date: 2020
ISBN (Electronic): 9789934185304
Publication status: Published - 2020
MoE publication type: A4 Article in conference proceedings
Event: 7th International Scientific Conference of the Faculty of Law of the University of Latvia - Riga, Latvia
Duration: 16 Oct 2019 to 18 Oct 2019
Conference number: 7

Bibliographical note

7th International Conference of the Faculty of Law of the University of Latvia on Legal Science: Functions, Significance and Future in Legal Systems, Riga, Latvia, 16–18 October 2019

Fields of Science

  • 6121 Languages
