Transition-Based Coding and Formal Language Theory for Ordered Digraphs

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

Abstract

Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterise properties of noncrossing digraphs and to solve maximal subgraphs problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.
Translated title of the contributionJärjestettyjen verkkojen siirtymäpohjainen koodaus ja formaalien kielten teoria
Original languageEnglish
Title of host publicationThe 14th International Conference on Finite-State Methods and Natural Language Processing : Proceedings of the Conference
EditorsHeiko Vogler, Andreas Maletti
Number of pages14
Place of PublicationStroudsburg
PublisherThe Association for Computational Linguistics
Publication date23 Sept 2019
Pages118–131
ISBN (Electronic)978-1-950737-96-3
DOIs
Publication statusPublished - 23 Sept 2019
MoE publication typeA4 Article in conference proceedings
EventInternational Conference on Finite State Methods and Natural Language Processing - Dresden, Germany
Duration: 23 Sept 201925 Sept 2019
Conference number: 14

Publication series

NameProceedings of the International Conference on Finite-State Methods and Natural Language Processing
PublisherAssociation for Computational Linguistics

Bibliographical note

The ISBN of the host publication can be found on the web site of the conference (https://wwwtcs.inf.tu-dresden.de/fsmnlp2019/accepted_papers/).

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this