Measuring Language Closeness by Modeling Regularity

Javad Nouri, Roman Yangarber

Research output: Chapter in book/report/conference proceeding; Conference article; Scientific; Peer-reviewed

Abstract

This paper addresses the problems of measuring similarity between languages—where the term language covers any of the senses denoted by language, dialect or linguistic variety, as defined by any theory. We argue that to devise an effective way to measure the similarity between languages one should build a probabilistic model that tries to capture as much regular correspondence between the languages as possible. This approach yields two benefits. First, given a set of language data, for any two models, this gives a way of objectively determining which model is better, i.e., which model is more likely to be accurate and informative. Second, given a model, for any two languages we can determine, in a principled way, how close they are. The better models will be better at judging similarity. We present experiments on data from three language families to support these ideas. In particular, our results demonstrate the arbitrary nature of terms such as language vs. dialect, when applied to related languages.
Original language: English
Title of host publication: Proceedings of the EMNLP’2014 Workshop: Language Technology for Closely Related Languages and Language Variants
Number of pages: 10
Publisher: ACL
Publication date: Oct 2014
Pages: 56-65
ISBN (print): 978-1-937284-96-1
Publication status: Published - Oct 2014
MinEdu publication type: A4 Article in conference proceedings
Event: Language Technology for Closely Related Languages and Language Variants - Doha, Qatar
Duration: 29 Oct 2014 to 29 Oct 2014
Conference number: EMNLP 2014 (LT4CloseLang)

Fields of science

  • 113 Computer and information sciences
