Constructional generalization over Russian collocations

Research output: Contribution to journal, Article, Scientific, peer-reviewed

Description

The CoCoCo project aims to model multi-word expressions (MWEs) of diverse natures in a unified fashion. The algorithm predicts the most stable features in an n-gram—morphological, lexical, or constructional. In this article, we focus more on lexical compatibility of extracted collocations. At one extreme are lexically stable idioms, where no generalization is possible, e.g., lo and behold. Other collocations appear to be stable on a more abstract level of generalization. They are constructions where lexical items are replaceable but belong to the same semantic class, e.g., sleight of [hand/mouth/mind]. In this case, prediction of the entire semantic class is possible. To confirm this idea, we present a qualitative analysis of automatically extracted Russian MWEs. We then use distributional semantics methods to find semantic classes automatically and demonstrate that these correspond with manually annotated classes. This implies that the semantic classes can be used in the collocation detection algorithm.
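
The idea that the replaceable items in a construction such as sleight of [hand/mouth/mind] form one semantic class can be illustrated with a minimal sketch. It is not the CoCoCo algorithm or the authors' method, which the abstract does not specify. The vectors are invented toy numbers rather than corpus data, the filler "tax" is a hypothetical out-of-class word added for contrast, and the function names and the 0.8 threshold are assumptions made for this illustration only.

```python
from math import sqrt

# Toy 3-dimensional "distributional" vectors. The numbers are invented for
# illustration and do not come from the paper or from any corpus.
TOY_VECTORS = {
    "hand": [0.9, 0.1, 0.0],
    "mouth": [0.8, 0.2, 0.1],
    "mind": [0.7, 0.3, 0.1],
    "tax": [0.0, 0.1, 0.9],  # hypothetical filler outside the class
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def group_fillers(fillers, vectors, threshold=0.8):
    """Greedy grouping: a filler joins the first group that contains a
    member whose cosine similarity to it is at least `threshold`;
    otherwise it starts a new group."""
    groups = []
    for word in fillers:
        for group in groups:
            if any(cosine(vectors[word], vectors[m]) >= threshold for m in group):
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


# Slot fillers observed for the pattern "sleight of X" (toy input).
groups = group_fillers(["hand", "mouth", "mind", "tax"], TOY_VECTORS)
print(groups)
```

With these toy vectors the three fillers from the abstract's example fall into one group and the contrast word stays apart. A real experiment would use vectors trained on a corpus and would compare the resulting groups with manually annotated classes, as the abstract describes.
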
Original language: English
Journal: Mémoires de la Société néophilologique de Helsinki
Volume: Tome C
Issue number: Collocations Cross-Linguistically
Pages: 121-140
Number of pages: 20
ISSN: 0355-0192
Publication status: Published - 2016
MoE publication type: A1 Original article in a scientific journal, peer-reviewed

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences

Cite this

@article{31a9e6e944f9426082fffd8d4de99b72,
title = "Constructional generalization over Russian collocations",
abstract = "The CoCoCo project aims to model multi-word expressions (MWEs) of diverse natures in a unified fashion. The algorithm predicts the most stable features in an n-gram—morphological, lexical, or constructional. In this article, we focus more on lexical compatibility of extracted collocations. At one extreme are lexically stable idioms, where no generalization is possible, e.g., lo and behold. Other collocations appear to be stable on a more abstract level of generalization. They are constructions where lexical items are replaceable but belong to the same semantic class, e.g., sleight of [hand/mouth/mind]. In this case, prediction of the entire semantic class is possible. To confirm this idea, we present a qualitative analysis of automatically extracted Russian MWEs. We then use distributional semantics methods to find semantic classes automatically and demonstrate that these correspond with manually annotated classes. This implies that the semantic classes can be used in the collocation detection algorithm.",
keywords = "6121 Languages, 113 Computer and information sciences",
author = "Mikhail Kopotev and Lidia Pivovarova and Daria Kormacheva",
year = "2016",
language = "English",
volume = "Tome C",
pages = "121--140",
journal = "Mémoires de la Société néophilologique de Helsinki",
issn = "0355-0192",
publisher = "Soci{\'e}t{\'e} N{\'e}ophilologique de Helsinki",
number = "Collocations Cross-Linguistically",
}

Constructional generalization over Russian collocations. / Kopotev, Mikhail; Pivovarova, Lidia; Kormacheva, Daria.

In: Mémoires de la Société néophilologique de Helsinki, Vol. Tome C, No. Collocations Cross-Linguistically, 2016, p. 121-140.

TY - JOUR
T1 - Constructional generalization over Russian collocations
AU - Kopotev, Mikhail
AU - Pivovarova, Lidia
AU - Kormacheva, Daria
PY - 2016
Y1 - 2016
N2 - The CoCoCo project aims to model multi-word expressions (MWEs) of diverse natures in a unified fashion. The algorithm predicts the most stable features in an n-gram—morphological, lexical, or constructional. In this article, we focus more on lexical compatibility of extracted collocations. At one extreme are lexically stable idioms, where no generalization is possible, e.g., lo and behold. Other collocations appear to be stable on a more abstract level of generalization. They are constructions where lexical items are replaceable but belong to the same semantic class, e.g., sleight of [hand/mouth/mind]. In this case, prediction of the entire semantic class is possible. To confirm this idea, we present a qualitative analysis of automatically extracted Russian MWEs. We then use distributional semantics methods to find semantic classes automatically and demonstrate that these correspond with manually annotated classes. This implies that the semantic classes can be used in the collocation detection algorithm.

AB - The CoCoCo project aims to model multi-word expressions (MWEs) of diverse natures in a unified fashion. The algorithm predicts the most stable features in an n-gram—morphological, lexical, or constructional. In this article, we focus more on lexical compatibility of extracted collocations. At one extreme are lexically stable idioms, where no generalization is possible, e.g., lo and behold. Other collocations appear to be stable on a more abstract level of generalization. They are constructions where lexical items are replaceable but belong to the same semantic class, e.g., sleight of [hand/mouth/mind]. In this case, prediction of the entire semantic class is possible. To confirm this idea, we present a qualitative analysis of automatically extracted Russian MWEs. We then use distributional semantics methods to find semantic classes automatically and demonstrate that these correspond with manually annotated classes. This implies that the semantic classes can be used in the collocation detection algorithm.

KW - 6121 Languages
KW - 113 Computer and information sciences
M3 - Article
VL - Tome C
SP - 121
EP - 140
JO - Mémoires de la Société néophilologique de Helsinki
JF - Mémoires de la Société néophilologique de Helsinki
SN - 0355-0192
IS - Collocations Cross-Linguistically
ER -