The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies

Christoph Rzymski, Tiago Tresoldi, Simon J. Greenhill, Mei-Shin Wu, Nathanael E. Schweikhard, Maria Koptjevskaja-Tamm, Volker Gast, Timotheus A. Bodt, Abbie Hantgan, Gereon A. Kaiping, Sophie Chang, Yunfan Lai, Natalia Morozova, Heini Arjava, Nataliia Hübler, Ezequiel Koile, Steve Pepper, Mariann Proos, Briana Van Epps, Ingrid Blanco, Carolin Hundt, Sergei Monakhov, Kristina Pianykh, Sallona Ramesh, Russell D. Gray, Robert Forkel, Johann-Mattis List

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and show-cases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.
Original language: English
Journal: Scientific data
Volume: 7
Issue number: 1
ISSN: 2052-4463
DOIs: https://doi.org/10.1038/s41597-019-0341-x
Publication status: Published - 2020
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 6121 Languages

Cite this

Rzymski, C., Tresoldi, T., Greenhill, S. J., Wu, M-S., Schweikhard, N. E., Koptjevskaja-Tamm, M., ... List, J-M. (2020). The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies. Scientific data, 7(1). https://doi.org/10.1038/s41597-019-0341-x
@article{ee099ad50ad74b6f92acd0a2c7529bd6,
title = "The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies",
abstract = "Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and show-cases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.",
keywords = "6121 Languages",
author = "Christoph Rzymski and Tiago Tresoldi and Greenhill, {Simon J.} and Mei-Shin Wu and Schweikhard, {Nathanael E.} and Maria Koptjevskaja-Tamm and Volker Gast and Bodt, {Timotheus A.} and Abbie Hantgan and Kaiping, {Gereon A.} and Sophie Chang and Yunfan Lai and Natalia Morozova and Heini Arjava and Nataliia H{\"u}bler and Ezequiel Koile and Steve Pepper and Mariann Proos and {Van Epps}, Briana and Ingrid Blanco and Carolin Hundt and Sergei Monakhov and Kristina Pianykh and Sallona Ramesh and Gray, {Russell D.} and Robert Forkel and Johann-Mattis List",
year = "2020",
doi = "10.1038/s41597-019-0341-x",
language = "English",
volume = "7",
journal = "Scientific data",
issn = "2052-4463",
publisher = "Nature Publishing Group",
number = "1",

}
