Analysing Finnish with word lists

The DDI approach to morphology revisited

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Professional

Abstract

Morphological lexicons for morphologically complex languages provide good text coverage at the cost of overgeneration, difficulty of modification, and sometimes performance issues. Use of simple, manageable lexicon forms – especially lists – for morphologically complex languages may appear unviable because the number of possible word-forms in a morphologically complex language can be prohibitively high. We created and experimented with a list-based lexicon for a morphologically complex language (Finnish), and compared its coverage with that of a mature morphological analyser on new text in two experimental settings. The observed smallish difference in coverage suggests the viability of using simple and easy-to-modify list-based lexicons as an initial part of morphological analysis, to increase developer control on the vast majority of input tokens.
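The list-first idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy word list, the tag strings, and the function names `analyse` and `coverage` are hypothetical, chosen only to show a plain word-form lookup that runs before any fallback analyser, and how token coverage could be measured.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical toy word list: each surface form maps to its analyses.
# The entries and tag strings are illustrative assumptions, not data from the paper.
WORD_LIST: Dict[str, List[str]] = {
    "talo": ["talo N SG NOM"],
    "talossa": ["talo N SG INE"],
    "on": ["olla V PRS SG3"],
}


def analyse(
    token: str,
    word_list: Dict[str, List[str]],
    fallback: Optional[Callable[[str], List[str]]] = None,
) -> Optional[List[str]]:
    """Look the token up in the list first; use the fallback only on a miss."""
    key = token.lower()
    if key in word_list:
        return word_list[key]
    if fallback is not None:
        # Stand-in for a full morphological analyser (assumed interface).
        return fallback(token)
    return None


def coverage(tokens: Iterable[str], word_list: Dict[str, List[str]]) -> float:
    """Share of tokens that the word list alone can analyse."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in word_list)
    return hits / len(tokens)
```

Under these assumptions, `coverage(["Talo", "on", "suuri", "talossa"], WORD_LIST)` counts three of the four tokens as covered by the list, and only the remaining token would be passed to a fallback analyser.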
Original language: English
Title of host publication: Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages
Number of pages: 10
Place of Publication: Stroudsburg
Publisher: Association for Computational Linguistics
Publication date: 2018
Pages: 171-180
Publication status: Published - 2018
MoE publication type: D3 Professional conference proceedings
Event: International Workshop on Computational Linguistics for Uralic Languages - Helsinki, Finland
Duration: 8 Jan 2018 - 9 Jan 2018
Conference number: 4

Fields of Science

  • 6121 Languages

Cite this

Voutilainen, A. T., & Palolahti, M. J. (2018). Analysing Finnish with word lists: The DDI approach to morphology revisited. In Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages (pp. 171-180). Stroudsburg: Association for Computational Linguistics.