Finite-state description, developing mental awareness

Research output: Chapter in Book/Report/Conference proceeding; Chapter; Scientific; peer-review


In this article, we discuss finite-state description practices that must be instilled in the developer. Our observations are accompanied by references to concrete experience with describing different languages. We contend that the finite-state description of languages leads to development in the describer-developer. This presupposes regular interaction with developers of upstream and downstream technologies. As more languages are described, the developer learns what to choose as a starting point, ideally with the help of a researcher, research documentation or a native speaker well versed in the workings of the language. We maintain that finite-state work should serve more than one purpose or audience, and that, as linguists, we should raise the bar by applying research knowledge to description, so that our understanding of linguistic phenomena can be verified by others or proven false. We provide a methodology for repeatable experimentation and rule making. Each language provides something unique while sharing some recognizable features with other languages. We stress the need to avoid generating characters from epsilons and offer examples where it is possible to write rules that reduce characters to epsilons instead. We also stress the need to describe the predictable infinite set of all native phenomena, whereas the unknown and random qualities introduced through language contact cannot form a foundation for our descriptions. Finally, we call for a playful approach to phenomena in a language, because that might bring us closer to how a child would learn the language: through repetition, mistakes and self-correction.
Original language: English
Title of host publication: Rule-Based Language Technology
Editors: Arvi Hurskainen, Kimmo Koskenniemi, Tommi Pirinen
Number of pages: 11
Place of Publication: Tartu
Publisher: Northern European Association for Language Technology
Publication date: Apr 2023
Publication status: Published - Apr 2023
MoE publication type: A3 Book chapter

Publication series

Name: NEALT Monograph Series
Publisher: Northern European Association for Language Technology (NEALT)
ISSN (Electronic): 1736-6291

Fields of Science

  • 6121 Languages
  • finite-state morphology
  • regular morphology
  • Võro language
  • Lushootseed language
  • Moksha language
  • Erzya language
  • Komi-Zyrian language
  • Skolt Saami language
