Constructional generalization over Russian collocations

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

The CoCoCo project aims to model multi-word expressions (MWEs) of diverse natures in a unified fashion. The algorithm predicts the most stable features in an n-gram: morphological, lexical, or constructional. In this article, we focus mainly on the lexical compatibility of extracted collocations. At one extreme are lexically stable idioms, where no generalization is possible, e.g., lo and behold. Other collocations are stable at a more abstract level of generalization: they are constructions whose lexical items are replaceable but belong to the same semantic class, e.g., sleight of [hand/mouth/mind]. In this case, prediction of the entire semantic class is possible. To confirm this idea, we present a qualitative analysis of automatically extracted Russian MWEs. We then use distributional semantics methods to find semantic classes automatically and demonstrate that these correspond to manually annotated classes. This implies that the semantic classes can be used in the collocation detection algorithm.
Original language: English
Journal: Mémoires de la Société néophilologique de Helsinki
Volume: Tome C
Issue number: Collocations Cross-Linguistically
Pages (from-to): 121-140
Number of pages: 20
ISSN: 0355-0192
Publication status: Published - 2016
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
