Finite-state description, developing mental awareness

Research output: Article in book/report/conference proceedings, Book chapter or article, Scientific, Peer-reviewed

Abstract

In this article, we approach finite-state description practices that must be instilled in the developer. Thoughts are presented accompanied by reference to concrete experiences with different languages and their description. We contend that finite-state description of languages leads to development in the describer-developer. This presupposes regular interaction with developers of upstream and downstream technologies. And as more languages are described, the developer learns what to choose as a starting point, hopefully with the help of a researcher, research documentation or native speaker well versed in the workings of the language. We maintain that finite-state work should serve more than one purpose or audience, and that, as linguists, we should be raising the bar by applying the knowledge of research to description, so that our understanding of the linguistic phenomena can be attested by others or proven false. We are providing a methodology for repeatable experimentation and rule making. We see that each language provides something unique, while sharing some recognizable features with other languages. We stress the necessity to avoid generating characters from epsilons and offer examples where it is possible to write rules that reduce characters to epsilons instead. We also stress the need to describe the predictable infinite set of all native phenomena, whereas the unknown and random qualities introduced through language contact cannot form a foundation for our descriptions. Finally, we call for a playful approach to phenomena in a language, because that might bring us closer to how a child would learn the language – through repetition, mistakes and self-correction.
Original language: English
Title of host publication: Rule-Based Language Technology
Editors: Arvi Hurskainen, Kimmo Koskenniemi, Tommi Pirinen
Number of pages: 11
Place of publication: Tartu
Publisher: Northern European Association for Language Technology
Publication date: Apr 2023
Pages: 217-227
Status: Published - Apr 2023
MoE publication type: A3 Part of a book or other compilation

Publication series

Name: NEALT Monograph Series
Publisher: Northern European Association for Language Technology (NEALT)
Volume: 2[1]
ISSN (electronic): 1736-6291

Fields of science

  • 6121 Languages
