From alignment of etymological data to phylogenetic inference via population genetics

Research output: Chapter in book/report/conference proceedings; Conference contribution; Scientific; Peer-reviewed


This paper presents a method for linking models for aligning linguistic etymological data with models for phylogenetic inference from population genetics. We begin with a large database of genetically related words (sets of cognates) from the languages of a language family. We process the cognate sets to obtain a complete alignment of the data, and use the alignments as input to a model developed for phylogenetic reconstruction in population genetics. This is achieved via a novel, natural projection of the linguistic data onto genetic primitives. As a result, we induce phylogenies based on aligned linguistic data. We place the method in the context of those reported in the literature and illustrate its operation on data from the Uralic language family, where it yields family trees that are very close to the “true” (expected) phylogenies.
Title of host publication: The 54th Annual Meeting of the Association for Computational Linguistics : Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning
Number of pages: 11
Place of publication: Stroudsburg, PA
Publisher: The Association for Computational Linguistics
ISBN (print): 978-1-945626-07-4
Status: Published - 2016
MoE publication type: A4 Article in conference proceedings
Event: Cognitive Aspects of Computational Language Learning - Berlin, Germany
Duration: 11 Aug 2016 - 11 Aug 2016
Conference number: 7


  • 113 Computer and information sciences
