A Usable Finite-State Model for Adequate Syntactic Complexity

Project Details

Description

I hope to be the first to present a compact finite-state automaton (FSA) that assigns adequate, non-projective dependency analyses to natural language sentences. I have recently discovered FSAs that can be represented compactly, whereas the corresponding conventional FSAs would be too large to fit into a physical computer. Conventional automata work on a single input tape, but the new automata have additional hidden tapes that store the derivation of a dependency-syntactic analysis (Yli-Jyrä 2012). When these tapes are factored and separated, the immense size of the automaton's representation collapses. The result is a practical and highly efficient system that retains the excellent algebraic properties that characterise finite automata.
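
The size argument can be illustrated loosely with a toy calculation. The sketch below is not the construction of Yli-Jyrä (2012); it only assumes four hypothetical component automata with 3, 5, 7 and 11 states and compares the cost of storing them side by side with the cost of storing a single combined automaton whose states are tuples of component states.

```python
from itertools import product

# Hypothetical component automata, each described only by its state set.
# The sizes 3, 5, 7 and 11 are illustrative assumptions.
component_states = [range(3), range(5), range(7), range(11)]

# Factored representation: the components are stored side by side,
# so the total number of states is the sum of the component sizes.
factored_size = sum(len(states) for states in component_states)

# Combined representation: one state per tuple of component states,
# so the total number of states is the product of the component sizes.
product_states = list(product(*component_states))

print(factored_size, len(product_states))
```

With these assumed sizes the factored form needs 26 states and the combined form 1155, and the gap widens with every component that is added.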

The purpose of the proposed project is to turn this discovery into a new methodology by demonstrating a finite-state dependency grammar and a parser (analyser) that are linguistically adequate in practice. That is, the system covers the observed non-projective structures and generalises to unseen data via language-specific knowledge and linguistically universal statistical learning.

The strength of the new methodology lies in its combinatorial characteristics and its faithfulness to the observed complexity of natural language. Unlike state-of-the-art methods, it does not trivialise the structural ambiguity of natural language but uses its compact representation to factor and store the ambiguity. The stored ambiguity could then be resolved with highly accurate constraint grammars or language models by exploiting the closure properties of finite-state languages. Nor does the approach trivialise lexical features; it models full lexical frames instead. It also seems possible that the empirical studies will motivate a rather low descriptive complexity for the parser, which would advance Occamist elegance in linguistic methodology. Moreover, a fast, structurally accurate and language-independent finite-state parser would be a significant contribution to the debate on the adequacy of finite-state grammars and on the practical relevance of the generative, idealistic view of language.
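
Disambiguation through closure properties can be sketched with the standard product construction for intersecting two deterministic automata. Everything in the example is a hypothetical toy: the `DFA` class, the tag alphabet `N`/`V`, the two stored readings and the constraint "a reading must start with N" are assumptions made for illustration, not part of the project's grammar.

```python
class DFA:
    """A minimal deterministic automaton with a partial transition map."""

    def __init__(self, states, alphabet, delta, start, finals):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)      # (state, symbol) -> state
        self.start = start
        self.finals = set(finals)

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta.get((state, symbol))
            if state is None:         # missing transition: reject
                return False
        return state in self.finals


def intersect(a, b):
    """Product construction: accepts exactly the words both a and b accept."""
    alphabet = a.alphabet & b.alphabet
    start = (a.start, b.start)
    states, delta, stack = {start}, {}, [start]
    while stack:                      # explore reachable state pairs only
        p, q = stack.pop()
        for symbol in alphabet:
            p2 = a.delta.get((p, symbol))
            q2 = b.delta.get((q, symbol))
            if p2 is None or q2 is None:
                continue
            delta[((p, q), symbol)] = (p2, q2)
            if (p2, q2) not in states:
                states.add((p2, q2))
                stack.append((p2, q2))
    finals = {(p, q) for (p, q) in states if p in a.finals and q in b.finals}
    return DFA(states, alphabet, delta, start, finals)


# Stored ambiguity: two competing readings, "N V" and "V N".
ambiguity = DFA(
    states={0, 1, 2, 3}, alphabet={"N", "V"},
    delta={(0, "N"): 1, (1, "V"): 3, (0, "V"): 2, (2, "N"): 3},
    start=0, finals={3},
)

# Toy constraint: a reading must start with N.
constraint = DFA(
    states={0, 1}, alphabet={"N", "V"},
    delta={(0, "N"): 1, (1, "N"): 1, (1, "V"): 1},
    start=0, finals={1},
)

remaining = intersect(ambiguity, constraint)
```

Here `remaining.accepts(["N", "V"])` holds while `remaining.accepts(["V", "N"])` does not, and the result is again a finite automaton, so further constraints can be applied in the same way.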

Layman's description

The project will develop methodological ideas needed to realise the notion of finite-state syntax in practice.
Acronym: ADEQSYNTAX
Status: Finished
Effective start/end date: 01/09/2013 to 30/04/2019

Funding

  • Suomen tietokirjailijat: €3,000.00

Fields of Science

  • 113 Computer and information sciences
  • automata theory
  • formal language theory
  • coding theory
  • finite-state transducers
  • nonprojectivity
  • projectivity
  • 6121 Languages
  • Autosegmental Phonology
  • representations
  • grammars
  • functional syntax
  • dependency syntax
  • syntactic parsing
  • syntactic complexity
  • 111 Mathematics
  • bijections
  • sequential functions
  • graphs