Abstract
For historical reasons relating to the building of the Finnish and Estonian nations, Finnic oral poetry has been recorded, archived, curated and digitised in exceptional amounts. A similar poetic system was in use in Estonian, Votic, Ingrian, Karelian, Lydic, and Finnish. Altogether, there are currently 283,206 Finnic texts available in digital form in the Estonian and Finnish corpora (ERAB, SKVR, JR).
In this article, we analyse the basic quantitative characteristics of these corpora. We first give an overview of the history of curating, organising and digitising Finnic oral poetry and explain how we have managed these datasets in the FILTER project. We then look at the basic quantitative characteristics of the dataset, especially those relating to recording history. Finally, we explain some data and metadata issues that we identified while merging the datasets into one database and exploring it. Some of these issues also need to be taken into account in qualitative research.
The historical archival data of Finnic oral poetry is uneven and biased in various ways. Computational views, and expert close readings of them, reveal some new perspectives on the characteristics and problems of the data. Yet, if not properly taken into account, these very same issues can easily distort computations, visualisations and interpretations. Thus, even when creating computational and quantitative perspectives, researchers need to know their data, read the texts, and be cautious with the metadata, consulting previous manual research, original manuscripts and wider archival collections when needed.
Translated title of the contribution | Proceed with Care: A Critical Computational Perspective on Digital Folklore Corpora |
---|---|
Original language | Finnish |
Journal | Elore |
Volume | 30 |
Issue number | 1 |
Pages (from-to) | 59–90 |
Number of pages | 32 |
ISSN | 1456-3010 |
DOI | |
Status | Published - 2023 |
MoE publication type | A1 Journal article-refereed |
Fields of Science
- 6160 Other humanities
- 113 Computer and information sciences
- REFOP: Regional cultures of Finnic oral poetry: comparative perspective
  01/09/2021 → 31/08/2026
  Project: Research project
- FILTER: Formulaic intertextuality, thematic networks and poetic variation across regional cultures of Finnic oral poetry (Academy of Finland research project no. 333138)
  Kallio, K., Mäkelä, E., Janicki, M. M., Saarinen, J., Sarv, M. & Kanner, A.
  01/09/2020 → 31/08/2024
  Project: Research project