Bounded-Depth High-Coverage Search Space for Noncrossing Parses

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

A recently proposed encoding for noncrossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing and it gives rise to a high-coverage bounded-depth approximation of the space of noncrossing digraphs. This subset is presented elegantly by a finite-state machine that recognizes an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million noncrossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes – the sentence length distribution and the related conditional distribution for deep nesting. This model points out that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically out in common sentence lengths but emerges slowly at infrequent lengths that prompt further inquiry.
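
As a rough illustration of why a depth bound makes such a set recognizable by a finite-state machine, the sketch below simulates an automaton whose state is the current nesting depth. It assumes a toy alphabet with a single bracket pair, "[" and "]", which is not the encoding used in the paper; the function name `accepts_bounded_depth` and the `bound` parameter are hypothetical.

```python
def accepts_bounded_depth(encoded: str, bound: int) -> bool:
    """Accept a bracket string iff it is balanced and nests at most `bound` deep.

    Assumption: a toy single-bracket alphabet, not the paper's actual code.
    The only memory used is `state`, an integer in 0..bound, so the function
    behaves like a finite-state machine with bound + 1 states.
    """
    state = 0  # current nesting depth
    for symbol in encoded:
        if symbol == "[":
            if state == bound:
                return False  # opening here would exceed the depth bound
            state += 1
        elif symbol == "]":
            if state == 0:
                return False  # closing bracket without a matching opener
            state -= 1
        # any other symbol is ignored in this toy alphabet
    return state == 0  # accept only if every bracket was closed


if __name__ == "__main__":
    print(accepts_bounded_depth("[[]][]", 2))
    print(accepts_bounded_depth("[[[]]]", 2))
```

The number of states grows only with the bound, not with the input length, so the accepted set is infinite even though the machine is finite.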
Original language: English
Title of host publication: Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing : FSMNLP 2017
Editors: Frank Drewes
Number of pages: 11
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 4 Sept 2017
Pages: 30-40
ISBN (print): 978-1-5108-4746-0
Status: Published - 4 Sept 2017
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Finite State Methods and Natural Language Processing (FSMNLP) - Umeå, Umeå, Sweden
Duration: 5 Sept 2017 to 7 Sept 2017
Conference number: 13

Additional information

Volume: Proceeding volume: 13

Fields of Science

  • 6121 Languages
  • dependency graphs
  • universal dependencies
  • embedding
  • finite-state methods
  • syntax
  • sentence length
  • 113 Computer and information sciences
  • transducers
  • encoding
  • context-free grammars
  • finite-state automata
  • 111 Mathematics
  • state complexity
  • 112 Statistics and probability
  • sentence length
  • sentence types
  • Finnish TreeBank 1

    Bartis, I. (Other), FIN-CLARIN consortium, Department of Modern Languages, University of Helsinki, 2015

    Dataset
