Language Technology Seminar Talk: Transition-Based Coding and Formal Language Theory for Ordered Digraphs

Yli-Jyrä, A. (Speaker)

Activity: Talk or presentation types: Oral presentation


Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding, which are bijectively associated with each other via a new transition system. These results give an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encodings is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraph problems. With a tight bound, the regular restriction is shown to capture the Universal Dependencies v2.4 treebanks in linguistics. I will try to relate the technical results informally to
(1) neural dependency parsing (seq2seq parsing and parsing as labeling)
(2) finite-state limits of natural language complexity
(3) graph-based semantic parsing and the development of a graph-based Constraint Grammar
(4) linear representation of translation alignment, with applications to interlinear texts.

However, the core of the presentation explains how vertex-ordered graphs can be encoded, because this is the problem on which the paper advances the state of the art.
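The abstract does not spell out the paper's transition system. The following is therefore only a hypothetical illustration of why restricted encodings of ordered digraphs can form a context-free string language. It is a simple bracket encoding for noncrossing ordered digraphs of our own construction, not the paper's method. The symbols `v`, `(`, `)`, `[`, `]` and the functions `encode`, `decode` and `crosses` are illustrative assumptions. Each vertex emits `v`, then closing brackets for arcs that end there, then opening brackets for arcs that start there. Decoding needs only a single stack.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[int, int]  # (source, target); vertices are 0..n-1 in linear order
CLOSE = {"(": ")", "[": "]"}  # hypothetical symbols: '(' rightward arc, '[' leftward arc


def crosses(e: Arc, f: Arc) -> bool:
    """True if the spans of two arcs properly interleave (a < c < b < d)."""
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b


def encode(n: int, arcs: Set[Arc]) -> str:
    """Encode a noncrossing ordered digraph on n vertices as a bracket string.

    Sketch only: not the transition system of the paper.
    """
    spans: Dict[Arc, str] = {}
    for u, w in arcs:
        if not (0 <= u < n and 0 <= w < n):
            raise ValueError(f"arc {(u, w)} is outside 0..{n - 1}")
        if u == w:
            raise ValueError("self-loops are not supported in this sketch")
        key = (min(u, w), max(u, w))
        if key in spans:
            raise ValueError("at most one arc per vertex pair in this sketch")
        spans[key] = "(" if u < w else "["
    keys = list(spans)
    for i, e in enumerate(keys):
        for f in keys[i + 1:]:
            if crosses(e, f):
                raise ValueError(f"arcs {e} and {f} cross")
    out: List[str] = []
    for i in range(n):
        out.append("v")
        # Arcs ending at i: the innermost one (largest left endpoint) closes first.
        ending = sorted((lo for lo, hi in keys if hi == i), reverse=True)
        out.extend(CLOSE[spans[(lo, i)]] for lo in ending)
        # Arcs starting at i: the outermost one (largest right endpoint) opens first.
        starting = sorted((hi for lo, hi in keys if lo == i), reverse=True)
        out.extend(spans[(i, hi)] for hi in starting)
    return "".join(out)


def decode(code: str) -> Tuple[int, Set[Arc]]:
    """Invert encode() with one stack, as a pushdown automaton would.

    Note: this does not check that the input is in canonical form.
    """
    stack: List[Tuple[int, str]] = []
    arcs: Set[Arc] = set()
    vertex = -1
    for ch in code:
        if ch == "v":
            vertex += 1
        elif ch in CLOSE:  # opening bracket: remember where the arc starts
            stack.append((vertex, ch))
        elif ch in CLOSE.values():  # closing bracket: match the latest opener
            if not stack or CLOSE[stack[-1][1]] != ch:
                raise ValueError("mismatched brackets")
            start, opener = stack.pop()
            arcs.add((start, vertex) if opener == "(" else (vertex, start))
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unclosed brackets")
    return vertex + 1, arcs
```

For example, `encode(4, {(1, 0), (1, 3), (3, 2)})` gives `"v[v](v[v])"`, and `decode` maps that string back to the same four vertices and three arcs. A crossing pair such as `(0, 2)` and `(1, 3)` is rejected by this sketch. According to the abstract, the paper's transition system is designed to encode arbitrary ordered digraphs, crossing ones included.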
Period: 24 October 2019
Event title: Research Seminar in Language Technology (Academic Year 2019-2020)
Event type: Seminar
Location: Helsinki, Finland
Degree of recognition: Local