Abstract
The purpose of this article is to explain the extent to which the legal regime applicable to language data affects the development and use of language technology (LT). The main focus of the paper is on EU law. The article also maps possible text and data mining (TDM) issues, focusing on TDM for research purposes as provided for in the Digital Copyright Directive 2019/790. The authors follow a process approach to LT development, which starts from raw data collection and leads to LT products such as a refrigerator with a speech interface. Particular attention is given to language models. The raw data used in LT often include copyright-protected works, objects of related rights (e.g., performances) and personal data, such as a person's voice or other information, stored in non-annotated and annotated databases. The authors' main argument is that the legal regime of language data does not usually affect the use of language models, since copyrighted works are unlikely to remain in the models. In the process of developing an LT application, language models are therefore the first intermediate result that can be free from the legal restrictions affecting language data. However, the use of a person's voice as identifiable personal data in a language model can create legal challenges, and in some cases developers of LT must take care in how they address the processing of personal data contained in models.
Original language | English |
---|---|
Title of host publication | Legal Science: Functions, Significance and Future in Legal Systems II : Collection of Research Papers in Conjunction with the 7th International Scientific Conference of the Faculty of Law of the University of Latvia (16–18 October 2019, Riga) |
Editors | A Damberga |
Number of pages | 18 |
Place of Publication | Riga |
Publisher | University of Latvia Press |
Publication date | 2020 |
Pages | 383–400 |
ISBN (Electronic) | 9789934185304 |
Publication status | Published - 2020 |
MoE publication type | A4 Article in conference proceedings |
Event | 7th International Scientific Conference of the Faculty of Law of the University of Latvia, Riga, Latvia; Duration: 16 Oct 2019 → 18 Oct 2019; Conference number: 7 |
Bibliographical note
7th International Conference of the Faculty of Law of the University of Latvia on Legal Science: Functions, Significance and Future in Legal Systems, Riga, Latvia, 16–18 October 2019
Fields of Science
- 6121 Languages
Equipment
- CLARIN - Common Language Resource and Technology Infrastructure in Finland
Krister Linden (Manager)
Department of Digital Humanities