Quantifying variation and estimating the effects of sample size on the frequencies of linguistic variables

Heikki Mannila, Terttu Nevalainen, Helena Raumolin-Brunberg

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review


    Estimating the frequency of linguistic variables is a fundamental task in the analysis of linguistic data. However, because the amount of material available from different people or text categories often varies, the simplest methods of calculating frequencies are not always appropriate. In this article, we discuss different approaches, including bootstrap methods and a Bayesian approach, and compare their results with those of simple measures in common use, such as pooling and averaging. We also study the effect of sample size on the accuracy of the estimates.
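
As an illustration of why pooling and averaging can disagree when sample sizes vary, here is a minimal Python sketch. It is not taken from the chapter: the per-writer counts are hypothetical, the function names are invented for this example, and the interval is a generic percentile bootstrap over writers, which may differ from the bootstrap procedures the authors actually use.

```python
import random


def pooled_frequency(counts):
    """Pool all writers: total variant occurrences / total variable instances."""
    hits = sum(h for h, n in counts)
    total = sum(n for h, n in counts)
    return hits / total


def averaged_frequency(counts):
    """Average the per-writer relative frequencies, one vote per writer."""
    return sum(h / n for h, n in counts) / len(counts)


def bootstrap_interval(counts, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample writers with replacement (an assumption
    of this sketch) and take empirical quantiles of the statistic."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic([rng.choice(counts) for _ in counts])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: (occurrences of the variant, instances of the variable)
# for four writers, one of whom contributes far more material than the rest.
counts = [(90, 100), (1, 10), (2, 10), (1, 5)]

pooled = pooled_frequency(counts)      # dominated by the first writer
averaged = averaged_frequency(counts)  # each writer weighted equally
interval = bootstrap_interval(counts, pooled_frequency)
```

With these made-up counts, the pooled estimate is driven almost entirely by the one prolific writer, while the averaged estimate treats all four writers alike, so the two measures diverge sharply. With only four writers the bootstrap interval is very wide, which is the kind of sample-size effect the abstract refers to.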
    Original language: English
    Title of host publication: Research Methods in Language Variation and Change
    Editors: Manfred Krug, Julia Schlüter
    Number of pages: 24
    Place of Publication: Cambridge
    Publisher: Cambridge University Press
    Publication date: 2013
    ISBN (Print): 9780521181860
    Publication status: Published - 2013
    MoE publication type: A3 Book chapter

    Fields of Science

    • 113 Computer and information sciences
    • 6121 Languages
    • bootstrap
    • Bayesian analysis
    • corpus linguistics
    • linguistic variable
