Eteneminen omalla vastuulla: Lähdekriittinen laskennallinen näkökulma sähköisiin kansanrunoaineistoihin

Kati Kallio, Maciej Janicki, Eetu Mäkelä, Jukka Saarinen, Mari Sarv, Liina Saarlo

Research output: Contribution to journal, Article, Scientific, Peer-reviewed

Abstract

For historical reasons relating to the building of the Finnish and Estonian nations, Finnic oral poetry has been recorded, archived, curated and digitised in exceptional amounts. A similar poetic system was in use in Estonian, Votic, Ingrian, Karelian, Lydic, and Finnish. Altogether, there are currently 283,206 Finnic texts available in digital form in the Estonian and Finnish corpora (ERAB, SKVR, JR).

In this article, we analyse the basic quantitative characteristics of these corpora. We first give an overview of the history of curating, organising and digitising Finnic oral poetry and explain how we have managed these datasets in the FILTER project. We then look at the basic quantitative characteristics of the dataset, especially those relating to recording history. Finally, we explain some data and metadata issues we have identified while merging the datasets into one database and exploring it. Some of these issues also need to be taken into account when conducting qualitative research.

The historical archival data of Finnic oral poetry is uneven and biased in various ways. Computational views – and expert close readings of these – reveal some new perspectives on the characteristics and problematics of the data. Yet, if not taken into account properly, these very same issues also easily distort computations, visualisations and interpretations. Thus, it is necessary that, even when creating computational and quantitative perspectives, the researchers also know their data, read the texts, and are cautious with the metadata, remembering to consult previous manual research, original manuscripts and wider archival collections when needed.
Translated title of the contribution: Proceed with Care: A Critical Computational Perspective on Digital Folklore Corpora
Original language: Finnish
Journal: Elore
Volume: 30
Issue number: 1
Pages (from-to): 59–90
Number of pages: 32
ISSN: 1456-3010
DOI
Status: Published - 2023
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 6160 Other humanities
  • 113 Computer and information sciences
