Abstract
In this chapter, we discuss some common pitfalls related to historical data and its use in linguistic analysis. We argue that the “philologist’s dilemma”, as originally proposed by Rissanen (1989), should be reconceptualized to meet the needs of the fast-evolving field of corpus linguistics, where scholars make increasing use of big-data resources and sophisticated statistical modelling. Using examples of errors and uncertainties in areas such as corpus metadata, sampling, balance, and OCR accuracy, we argue that corpus linguists should pay increasingly close attention to the sampling and annotation principles employed in the compilation of historical corpora as well as to the quality of the linguistic data. We propose that the principle of “knowing one’s corpus” in terms of its compilation principles has become all the more important in the age of big-data corpora, where it is not feasible for individual researchers, or corpus compilers, to validate their data manually.
Original language | English |
---|---|
Title of host publication | Challenges in Corpus Linguistics : Rethinking corpus compilation and analysis |
Editors | Mark Kaunisto, Marco Schilk |
Number of pages | 26 |
Place of Publication | Amsterdam |
Publisher | John Benjamins |
Publication date | 19 Sept 2024 |
Pages | 9-34 |
ISBN (Print) | 978-90-272-1588-8 |
ISBN (Electronic) | 978-90-272-4653-0 |
Publication status | Published - 19 Sept 2024 |
MoE publication type | A3 Book chapter |
Publication series
Name | Studies in Corpus Linguistics |
---|---|
Publisher | John Benjamins |
Volume | 118 |
ISSN (Print) | 1388-0373 |
Fields of Science
- 6121 Languages
- historical corpus linguistics
- metadata
- part-of-speech annotation
- big data
- corpus compilation
- sampling
- Social roots of language change: Investigating change with enriched corpus data
  Vartiainen, T. (Project manager)
  Suomen Akatemia Projektilaskutus
  01/09/2024 → 31/08/2028
  Project: Research Council of Finland: Academy Research Fellow
- RiCEP: Rise of commercial society and eighteenth-century publishing
  Tolonen, M. (Principal Investigator) & Säily, T. (Co-Principal Investigator)
  Academy of Finland, Finland, Suomen Akatemia Projektilaskutus
  01/09/2020 → 31/08/2024
  Project: Research project