Abstract
A recently proposed balanced-bracket encoding (Yli-Jyrä and Gómez-Rodríguez 2017) provides a way to embed all noncrossing dependency graphs into the string space and to formulate their exact arc-factored inference problem (Kuhlmann and Johnsson 2015) as the best-string problem in a dynamically constructed and weighted unambiguous context-free grammar. The current work improves this encoding and makes it shallower by omitting redundant brackets. The streamlined encoding gives rise to a bounded-depth subset approximation that is represented by a small finite-state automaton. When bounded to 7 levels of balanced brackets, the automaton has 762 states and represents a strict superset of more than 99.9999% of the noncrossing trees available in Universal Dependencies 2.4 (Nivre et al. 2019). In addition, it strictly contains all 15-vertex noncrossing digraphs. When bounded to 4 levels and 90 states, the automaton still captures 99.2% of all noncrossing trees in the reference dataset. The approach is flexible and can be extended towards unrestricted graphs, and it suggests tight finite-state bounds for dependency parsing and for the main existing parsing methods.
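The core idea behind the abstract (noncrossing arcs nest like balanced brackets, and capping the nesting depth turns the bracket language into a finite-state one) can be illustrated with a small Python sketch. This is a deliberately simplified, hypothetical encoding, not the streamlined encoding defined in the article: the word symbol `w`, the bracket pairs `[ ]` (rightward arc) and `< >` (leftward arc), and the functions `encode`, `max_depth`, `build_dfa` and `accepts` are all assumptions made for illustration, and the state counts it produces are not comparable to the 762- and 90-state automata reported above.

```python
from itertools import product

# Hypothetical simplified encoding, NOT the one defined in the article:
# each word is the symbol "w"; a rightward arc (head < dependent) is the
# bracket pair "[" ... "]", a leftward arc (head > dependent) is "<" ... ">".
OPEN_TO_CLOSE = {"[": "]", "<": ">"}


def is_noncrossing(arcs):
    """True if no two arcs span (a, b), (c, d) with a < c < b < d."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    return not any(a < c < b < d for a, b in spans for c, d in spans)


def encode(n, arcs):
    """Encode a noncrossing graph over vertices 0..n-1 as a bracket string."""
    if any(h == d or not (0 <= h < n and 0 <= d < n) for h, d in arcs):
        raise ValueError("arcs must join two distinct vertices in range")
    if not is_noncrossing(arcs):
        raise ValueError("crossing arcs cannot be encoded as nested brackets")
    opens = [[] for _ in range(n)]
    closes = [[] for _ in range(n)]
    for i, (h, d) in enumerate(arcs):
        lo, hi = min(h, d), max(h, d)
        left = "[" if h < d else "<"
        # Open outer arcs (larger hi) first and close inner arcs (larger lo)
        # first; the arc index i breaks ties between arcs with the same span.
        opens[lo].append((-hi, i, left))
        closes[hi].append((-lo, -i, OPEN_TO_CLOSE[left]))
    out = []
    for v in range(n):
        out += [sym for *_, sym in sorted(closes[v])]  # arcs ending at v
        out.append("w")                                 # the word itself
        out += [sym for *_, sym in sorted(opens[v])]    # arcs starting at v
    return "".join(out)


def max_depth(s):
    """Maximum bracket nesting depth of an encoded string."""
    depth = best = 0
    for ch in s:
        if ch in OPEN_TO_CLOSE:
            depth += 1
            best = max(best, depth)
        elif ch in "]>":
            depth -= 1
    return best


def build_dfa(k):
    """DFA for balanced strings whose nesting depth never exceeds k.

    A state is the stack of currently open bracket types; bounding its
    length by k makes the state set finite (2**(k+1) - 1 states, plus an
    implicit reject sink for missing transitions).
    """
    states = [st for d in range(k + 1) for st in product("[<", repeat=d)]
    delta = {}
    for st in states:
        delta[(st, "w")] = st
        if len(st) < k:
            for b in OPEN_TO_CLOSE:
                delta[(st, b)] = st + (b,)
        if st:
            delta[(st, OPEN_TO_CLOSE[st[-1]])] = st[:-1]
    return states, delta


def accepts(delta, s):
    """Run the DFA from the empty stack; accept if it ends balanced."""
    st = ()
    for ch in s:
        st = delta.get((st, ch))
        if st is None:  # missing transition: the reject sink
            return False
    return st == ()
```

For example, the four-word tree with arcs 1→0, 1→3 and 3→2 encodes as `w<>w[w<>]w`, whose nesting depth is 2. It is accepted by `build_dfa(2)` (7 states) but rejected by `build_dfa(1)`, which is the sense in which a depth bound yields a finite-state subset approximation of the noncrossing graphs.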
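Translated title of the contribution | Kuinka risteämättömät Universal Dependencies puupankkien puut voidaan upottaa matalakompleksiseen säännölliseen kieleen (Finnish: How the noncrossing trees of Universal Dependencies treebanks can be embedded into a low-complexity regular language) |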
---|---|
Original language | English |
Journal | Journal of Language Modelling |
Volume | 7 |
Issue | 2 |
Pages (from-to) | 177-232 |
Number of pages | 56 |
ISSN | 2299-856X |
DOI | |
Status | Published - 2019 |
MoE publication type | A1 Journal article (refereed) |
Fields of science
- 113 Computer and information sciences
- 6121 Linguistics
Projects
- 1 Completed
- ADEQSYNTAX: Käytettävä äärellistilainen malli adekvaatille kieliopilliselle kompleksisuudelle (Finnish: a usable finite-state model for adequate grammatical complexity)
01/09/2013 → 30/04/2019
Project: Research project
Research datasets
- Universal Dependencies version 2.3
Rueter, J. (Creator), Tyers, F. M. (Contributor) & Zeman, D. (Contributor), Universal Dependencies Consortium, 15 Nov 2018
http://hdl.handle.net/11234/1-2895
Dataset
Activities
- 1 Oral presentation
- Embedding Properties of Graphs to the String Space (talk at the Hebrew University, Jerusalem)
Anssi Yli-Jyrä (Speaker)
7 Nov 2018. Activity: Types of talk or presentation › Oral presentation