Corpus linguistics as digital scholarship: Big data, rich data and uncharted data

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review

Abstract

This introductory chapter begins by considering how the fields of corpus linguistics, digital linguistics and digital humanities overlap, intertwine and feed off each other when it comes to making use of the increasing variety of resources available for linguistic research today. We then move on to discuss the benefits and challenges of three partly overlapping approaches to the use of digital data sources: (1) increasing data size to create “big data”, (2) supplying multi-faceted co(n)textual information and analyses to produce “rich data”, and (3) adapting existing data sets to new uses by drawing on hitherto “uncharted data”. All of them also call for new digital tools and methodologies that, in Tim Hitchcock’s words, “allow us to think small; at the same time as we are generating tools to imagine big.” We conclude the chapter by briefly describing how the contributions in this volume make use of their various data sources to answer new research questions about language use and to revisit old questions in new ways.
Original language: English
Title of host publication: From data to evidence in English language research
Editors: Carla Suhr, Terttu Nevalainen, Irma Taavitsainen
Number of pages: 26
Place of Publication: Leiden
Publisher: Brill
Publication date: Jan 2019
Pages: 1-26
ISBN (Print): 978-90-04-39065-2
DOIs: https://doi.org/10.1163/9789004390652_002
Publication status: Published - Jan 2019
MoE publication type: A3 Book chapter

Publication series

Name: Language and Computers - Studies in Digital Linguistics
Publisher: Brill
Number: 83
ISSN (Print): 0921-5034

Fields of Science

  • 6121 Languages

Cite this

Nevalainen, T. T. A., Suhr, C. M., & Taavitsainen, I. A. J. (2019). Corpus linguistics as digital scholarship: Big data, rich data and uncharted data. In C. Suhr, T. Nevalainen, & I. Taavitsainen (Eds.), From data to evidence in English language research (pp. 1-26). (Language and Computers - Studies in Digital Linguistics; No. 83). Leiden: Brill. https://doi.org/10.1163/9789004390652_002
Nevalainen, Taimi Terttu Annikki; Suhr, Carla Maria; Taavitsainen, Irma Aini Johanna. / Corpus linguistics as digital scholarship: Big data, rich data and uncharted data. From data to evidence in English language research. editor / Carla Suhr; Terttu Nevalainen; Irma Taavitsainen. Leiden: Brill, 2019. pp. 1-26 (Language and Computers - Studies in Digital Linguistics; 83).
@inbook{63cda2cffd7a49149e2a098a42a5477f,
title = "Corpus linguistics as digital scholarship: Big data, rich data and uncharted data",
keywords = "6121 Languages",
author = "Nevalainen, {Taimi Terttu Annikki} and Suhr, {Carla Maria} and Taavitsainen, {Irma Aini Johanna}",
year = "2019",
month = "1",
doi = "10.1163/9789004390652_002",
language = "English",
isbn = "978-90-04-39065-2",
series = "Language and Computers - Studies in Digital Linguistics",
publisher = "Brill",
number = "83",
pages = "1--26",
editor = "Suhr, {Carla} and Nevalainen, {Terttu} and Taavitsainen, {Irma}",
booktitle = "From data to evidence in English language research",
address = "Netherlands",
}

Nevalainen, TTA, Suhr, CM & Taavitsainen, IAJ 2019, Corpus linguistics as digital scholarship: Big data, rich data and uncharted data. in C Suhr, T Nevalainen & I Taavitsainen (eds), From data to evidence in English language research. Language and Computers - Studies in Digital Linguistics, no. 83, Brill, Leiden, pp. 1-26. https://doi.org/10.1163/9789004390652_002

Corpus linguistics as digital scholarship: Big data, rich data and uncharted data. / Nevalainen, Taimi Terttu Annikki; Suhr, Carla Maria; Taavitsainen, Irma Aini Johanna. From data to evidence in English language research. ed. / Carla Suhr; Terttu Nevalainen; Irma Taavitsainen. Leiden: Brill, 2019. p. 1-26 (Language and Computers - Studies in Digital Linguistics; No. 83).

TY - CHAP
T1 - Corpus linguistics as digital scholarship
T2 - Big data, rich data and uncharted data
AU - Nevalainen, Taimi Terttu Annikki
AU - Suhr, Carla Maria
AU - Taavitsainen, Irma Aini Johanna
PY - 2019/1
Y1 - 2019/1
KW - 6121 Languages
U2 - 10.1163/9789004390652_002
DO - 10.1163/9789004390652_002
M3 - Chapter
SN - 978-90-04-39065-2
T3 - Language and Computers - Studies in Digital Linguistics
SP - 1
EP - 26
BT - From data to evidence in English language research
A2 - Suhr, Carla
A2 - Nevalainen, Terttu
A2 - Taavitsainen, Irma
PB - Brill
CY - Leiden
ER -

Nevalainen TTA, Suhr CM, Taavitsainen IAJ. Corpus linguistics as digital scholarship: Big data, rich data and uncharted data. In Suhr C, Nevalainen T, Taavitsainen I, editors, From data to evidence in English language research. Leiden: Brill. 2019. p. 1-26. (Language and Computers - Studies in Digital Linguistics; 83). https://doi.org/10.1163/9789004390652_002