Talk to LyS Research Group: Embedding Graphs to the String Space to Support Succinct Representation of Graph Properties

Yli-Jyrä, A. (Puhuja)

Aktiviteetti: Puhe- tai esitystyypitSuullinen esitys

Kuvaus

Classical formal language theory concerns subsets of a finitely generated free monoid. In this presentation, I define the universe of graphs as an infinitely generated free monoid, whose generator has a string representation as an indexed language. I also show that a particularly bounded height-deterministic context-free subset represents a generator for a proper superset of the crossing graphs and that a more bounded regular subset represents a generator that is sufficient for representing observed dependency trees of natural language syntax as strings. Such embedding of graphs to height-deterministic languages suggests that efficiently decidable fragments of monadic second-order logic over graphs could be pursued and implemented via the closure properties of height-deterministic context-free languages.

These results are implied by my recent paper that forms the core of the presentation. The paper is called "Transition-Based Coding and Formal Language Theory for Ordered Digraphs" and it is online (ACL Anthology / FSMNLP 2019). Its abstract reads:

Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraphs problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.

There is also a related implementation of an encoder/decoder: https://github.com/amikael/coding
Aikajakso5 marraskuuta 2019
PidettyUniversity of A Coruña, Department of Computer Science, LyS research group, Espanja
Tunnustuksen arvoKansainvälinen