Constructional generalization over Russian collocations

Research output: Contribution to journal › Article › Scientific › Peer-reviewed

Abstract

The CoCoCo project aims to model multi-word expressions (MWEs) of diverse kinds in a unified fashion. The algorithm predicts the most stable features in an n-gram, whether morphological, lexical, or constructional. In this article, we focus mainly on the lexical compatibility of extracted collocations. At one extreme are lexically stable idioms, where no generalization is possible, e.g., lo and behold. Other collocations are stable at a more abstract level of generalization: they are constructions in which lexical items are replaceable but belong to the same semantic class, e.g., sleight of [hand/mouth/mind]. In such cases, the entire semantic class can be predicted. To support this idea, we present a qualitative analysis of automatically extracted Russian MWEs. We then use distributional semantics methods to find semantic classes automatically and demonstrate that these correspond to manually annotated classes. This implies that the semantic classes can be used in the collocation detection algorithm.
Original language: English
Journal: Mémoires de la Société néophilologique de Helsinki
Volume: Tome C
Issue: Collocations Cross-Linguistically
Pages (from-to): 121-140
Number of pages: 20
ISSN: 0355-0192
Status: Published - 2016
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
