Constructional generalization over Russian collocations

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

Abstrakti

The CoCoCo project aims to model multi-word expressions (MWEs) of diverse natures in a unified fashion. The algorithm predicts the most stable features in an n-gram—morphological, lexical, or constructional. In this article, we focus more on lexical compatibility of extracted collocations. At one extreme are lexically stable idioms, where no generalization is possible, e.g., lo and behold. Other collocations appear to be stable on a more abstract level of generalization. They are constructions where lexical items are replaceable but belong to the same semantic class, e.g., sleight of [hand/mouth/mind]. In this case, prediction of the entire semantic class is possible. To confirm this idea, we present a qualitative analysis of automatically extracted Russian MWEs. We then use distributional semantics methods to find semantic classes automatically and demonstrate that these correspond with manually annotated classes. This implies that the semantic classes can be used in the collocation detection algorithm.
Alkuperäiskielienglanti
LehtiMémoires de la Société néophilologique de Helsinki
VuosikertaTome C
NumeroCollocations Cross-Linguistically
Sivut121-140
Sivumäärä20
ISSN0355-0192
TilaJulkaistu - 2016
OKM-julkaisutyyppiA1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä, vertaisarvioitu

Tieteenalat

  • 6121 Kielitieteet
  • 113 Tietojenkäsittely- ja informaatiotieteet

Siteeraa tätä