Quantifying variation and estimating the effects of sample size on the frequencies of linguistic variables

Heikki Mannila, Terttu Nevalainen, Helena Raumolin-Brunberg

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review

Abstract

Estimating the frequency of linguistic variables is a fundamental task in the analysis of linguistic data. However, because the amount of material available from different people or text categories often varies, the simplest methods of calculating frequencies are not always appropriate. In this article, we discuss different approaches, including bootstrap methods and a Bayesian approach, and compare the results they yield with those given by some of the simple measures in common use, such as pooling and averaging. We also study the effect of sample size on the accuracy of the estimates.
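
As an illustration of why pooling and averaging can disagree when speakers contribute unequal amounts of material, the following minimal sketch may help. It is not the authors' implementation: the function names, the per-speaker counts, and the choice to resample whole speakers in the bootstrap are all assumptions made for this example, and the Bayesian approach mentioned in the abstract is not shown.

```python
import random


def pooled_frequency(counts):
    """Pooling: total variant occurrences divided by total occurrences of the variable."""
    hits = sum(h for h, n in counts)
    total = sum(n for h, n in counts)
    return hits / total


def averaged_frequency(counts):
    """Averaging: mean of the per-speaker proportions, each speaker weighted equally."""
    return sum(h / n for h, n in counts) / len(counts)


def bootstrap_interval(counts, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat`.

    Assumption for this sketch: speakers are resampled with replacement,
    and each resample has as many speakers as the original data.
    """
    rng = random.Random(seed)
    estimates = sorted(
        stat([rng.choice(counts) for _ in counts]) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: (variant occurrences, total occurrences) for four speakers.
# One speaker supplies most of the material, so pooling is dominated by that speaker.
counts = [(90, 100), (1, 5), (2, 4), (0, 3)]

print("pooled:  ", pooled_frequency(counts))
print("averaged:", averaged_frequency(counts))
print("bootstrap interval (averaged):", bootstrap_interval(counts, averaged_frequency))
```

With these invented counts the pooled estimate is 93/112, roughly 0.83, while the averaged estimate is 0.4, because pooling lets the speaker with 100 observations dominate. The bootstrap interval gives a rough sense of how much the averaged estimate could vary with a different set of speakers.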
Original language: English
Title of host publication: Research Methods in Language Variation and Change
Editors: Manfred Krug, Julia Schlüter
Number of pages: 24
Place of Publication: Cambridge
Publisher: Cambridge University Press
Publication date: 2013
Pages: 337-360
ISBN (Print): 9780521181860
Publication status: Published - 2013
MoE publication type: A3 Book chapter

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages
  • bootstrap
  • Bayesian analysis
  • corpus linguistics
  • linguistic variable
