Description
Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraph problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.
I will try to relate the technical results informally to:
(1) neural dependency parsing (seq2seq parsing and parsing as labeling);
(2) finite-state limits of natural language complexity;
(3) graph-based semantic parsing and the development of a graph-based Constraint Grammar;
(4) linear representation of translation alignment, with applications to interlinear texts.
However, the core of the presentation explains how vertex-ordered graphs can be encoded, because this is the problem where the paper advances the state of the art.
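To give a feel for what a linear encoding of a vertex-ordered digraph can look like, here is a minimal Python sketch. It is not the transition system or the encoding defined in the paper: it is a simplified bracket scheme, assumed here purely for illustration, that handles only the noncrossing special case. The token alphabet (`.` for a vertex, `[` for an arc opening, `>` and `<` for an arc closing with its direction) and the function names `encode`, `decode` and `crosses` are hypothetical choices.

```python
from typing import List, Set, Tuple

Arc = Tuple[int, int]  # (tail, head): a directed arc tail -> head


def crosses(a: Arc, b: Arc) -> bool:
    """True if the two arcs cross when drawn above the vertex line."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def encode(n: int, arcs: Set[Arc]) -> str:
    """Encode a noncrossing digraph on vertices 0..n-1 as a bracket string."""
    arc_list = sorted(arcs)
    if any(t == h or not (0 <= t < n and 0 <= h < n) for t, h in arc_list):
        raise ValueError("arcs must join two distinct vertices in range")
    if any(crosses(a, b) for i, a in enumerate(arc_list) for b in arc_list[i + 1:]):
        raise ValueError("this toy scheme only covers noncrossing digraphs")
    out: List[str] = []
    for v in range(n):
        out.append(".")  # one dot per vertex, in vertex order
        # Close the arcs whose right end is v, nearest left end first,
        # so that closings match the most recently opened brackets.
        closing = sorted(
            ((min(t, h), t < h) for t, h in arc_list if max(t, h) == v),
            reverse=True,
        )
        out.extend(">" if rightward else "<" for _, rightward in closing)
        # Open one bracket per arc whose left end is v.
        out.extend("[" for t, h in arc_list if min(t, h) == v)
    return "".join(out)


def decode(code: str) -> Tuple[int, Set[Arc]]:
    """Rebuild (vertex count, arc set) from a string produced by encode."""
    n = 0
    stack: List[int] = []  # left ends of arcs that are still open
    arcs: Set[Arc] = set()
    for ch in code:
        if ch == ".":
            n += 1
        elif ch == "[":
            stack.append(n - 1)
        elif ch in "<>":
            left, right = stack.pop(), n - 1
            # ">" marks an arc pointing rightward, "<" one pointing leftward.
            arcs.add((left, right) if ch == ">" else (right, left))
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced encoding: some arcs were never closed")
    return n, arcs


graph = {(0, 2), (2, 1)}      # arcs 0 -> 2 and 2 -> 1 over three vertices
code = encode(3, graph)       # ".[.[.<>"
assert decode(code) == (3, graph)
```

Because the brackets are balanced, the strings produced by this sketch form a context-free language, and bounding the nesting depth would make that language regular. This mirrors, in a much weaker form, the kind of structural restriction the abstract describes; crossing arcs are rejected here, whereas the paper addresses arbitrary ordered digraphs.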
Period | 24 Oct 2019 |
---|---|
Event title | Research Seminar in Language Technology (Academic Year 2019-2020) |
Event type | Seminar |
Location | Helsinki, Finland |
Degree of recognition | Local |
Related content
-
Projects
-
A Usable Finite-State Model for Adequate Syntactic Complexity
Project: Research project
-
Impacts
-
Graph Encoding Schemes for Syntactic and Semantic Labeling in Public Information Retrieval Infrastructures
Impact: Other impacts, Public services and functioning of society