Integration of Infrastructure Resources - Computational Morphology

Project Details


The EU Commission is funding the preparatory phase of the Common Language Resource and Technology Research Infrastructure (CLARIN) for the years 2008-2010. CLARIN (2008) is one of the 34 infrastructures listed in the roadmap of the European Strategy Forum on Research Infrastructures (ESFRI). It is a pan-European initiative that aims to establish an interoperable and integrated research infrastructure for language resources and their technologies. Within the CLARIN infrastructure, there is a demand for language technology that covers all of the 50-100 languages in Europe. Finland is one of the main partners of CLARIN, and the current application would form a significant national contribution to the long-term success of CLARIN.

Finite-state transducers (FSTs) have proven very useful in natural language processing. Finite-state methods provide a uniform treatment of morphology and other lower-level phenomena across languages. Even if the languages differ, the corresponding FSTs can be used by the very same natural language processing applications. In addition, the algorithms that apply FSTs can be updated without rebuilding the FSTs for each new algorithm.
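The point that one application can serve several languages can be illustrated with a minimal sketch. The ToyFST class, the lookup function, the two tiny lexicons, and the tag strings such as "+N+Pl" are all hypothetical and written only for this illustration; they are not part of HFST or SFST. The sketch assumes the transducers contain no cycles of input-epsilon arcs.

```python
EPS = ""  # the empty string stands for epsilon (no symbol consumed or emitted)


class ToyFST:
    """A hypothetical, minimal transducer: states are integers, 0 is the start."""

    def __init__(self):
        self.arcs = {}        # state -> list of (input symbol, output symbol, target)
        self.finals = set()
        self.start = 0
        self.n_states = 1

    def new_state(self):
        self.n_states += 1
        return self.n_states - 1

    def add_arc(self, src, in_sym, out_sym, dst):
        self.arcs.setdefault(src, []).append((in_sym, out_sym, dst))

    def add_path(self, src, pairs, final=False):
        """Chain fresh states from src, one arc per (input, output) pair."""
        state = src
        for in_sym, out_sym in pairs:
            dst = self.new_state()
            self.add_arc(state, in_sym, out_sym, dst)
            state = dst
        if final:
            self.finals.add(state)
        return state


def lookup(fst, word):
    """Language-independent application code: all analyses of word, sorted."""
    results = set()
    stack = [(fst.start, 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in fst.finals:
            results.add(out)
        for in_sym, out_sym, dst in fst.arcs.get(state, []):
            if in_sym == EPS:
                stack.append((dst, pos, out + out_sym))
            elif pos < len(word) and word[pos] == in_sym:
                stack.append((dst, pos + 1, out + out_sym))
    return sorted(results)


# A toy English transducer: cat, cats
en = ToyFST()
stem = en.add_path(0, [(c, c) for c in "cat"])
en.add_path(stem, [(EPS, "+N+Sg")], final=True)
en.add_path(stem, [("s", "+N+Pl")], final=True)

# A toy Finnish transducer: talo (nominative), talossa (inessive)
fi = ToyFST()
stem = fi.add_path(0, [(c, c) for c in "talo"])
fi.add_path(stem, [(EPS, "+N+Sg+Nom")], final=True)
fi.add_path(stem, [("s", "+N+Sg+Ine"), ("s", EPS), ("a", EPS)], final=True)

print(lookup(en, "cats"))     # ['cat+N+Pl']
print(lookup(fi, "talossa"))  # ['talo+N+Sg+Ine']
```

The lookup function never inspects which language it is analysing; only the transducer passed to it changes. Replacing lookup with a different algorithm would likewise leave both transducers untouched.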

This foundational FST methodology for language technology is a particular strength of Finland and Germany in CLARIN, and it motivates the current application. The efforts of the HFST research group in Helsinki complement some activities in Stuttgart. HFST adds new tools for computational morphology to the CLARIN infrastructure by extending the Stuttgart Finite-State Tool (SFST) (Schmid 2005), developed by Helmut Schmid.

The HFST effort has built a programming interface (the HFST interface) on top of the SFST calculus. Various morphological formalisms (HFST-SFST, HFST-LEXC, HFST-TWOLC,...) are in turn based on this interface. The role of the interface is to enforce a modular design: the underlying finite-state transducer calculus is separated from the actual grammar and lexicon formalisms. Figure 1 illustrates the modular design.
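The separation between calculus and formalism can be sketched as follows. This is only an illustration of the design principle, not the HFST interface itself: the names TransducerBackend, FiniteRelationBackend, and compile_lexicon are hypothetical, and the backend here models a transducer as a plain finite set of (input, output) string pairs rather than a real automaton. The sketch assumes non-empty stem and ending lists.

```python
from abc import ABC, abstractmethod


class TransducerBackend(ABC):
    """Hypothetical calculus interface: the only thing a formalism may call."""

    @abstractmethod
    def pair(self, inp, out):
        """Transducer mapping the string inp to the string out."""

    @abstractmethod
    def union(self, a, b):
        """Transducer accepting every pair accepted by a or by b."""

    @abstractmethod
    def concatenate(self, a, b):
        """Transducer accepting a pair from a followed by a pair from b."""

    @abstractmethod
    def lookup(self, t, word):
        """Sorted list of outputs that t relates to word."""


class FiniteRelationBackend(TransducerBackend):
    """Toy calculus: a transducer is a frozenset of (input, output) pairs."""

    def pair(self, inp, out):
        return frozenset({(inp, out)})

    def union(self, a, b):
        return a | b

    def concatenate(self, a, b):
        return frozenset((ai + bi, ao + bo) for ai, ao in a for bi, bo in b)

    def lookup(self, t, word):
        return sorted(o for i, o in t if i == word)


def compile_lexicon(backend, stems, endings):
    """Toy lexicon formalism: (stem | stem | ...) followed by (ending | ...).

    It uses only the TransducerBackend operations, so it is independent of
    how the backend represents transducers internally.
    """
    def union_all(entries):
        result = None
        for inp, out in entries:
            t = backend.pair(inp, out)
            result = t if result is None else backend.union(result, t)
        return result

    return backend.concatenate(union_all(stems), union_all(endings))


backend = FiniteRelationBackend()
lexicon = compile_lexicon(
    backend,
    stems=[("cat", "cat+N"), ("dog", "dog+N")],
    endings=[("", "+Sg"), ("s", "+Pl")],
)
print(backend.lookup(lexicon, "dogs"))  # ['dog+N+Pl']
```

Because compile_lexicon depends only on the abstract interface, a different backend could be substituted without changing the formalism code, which is the kind of modularity the paragraph above describes.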

The purpose of the visit is to help coordinate the HFST and SFST efforts and to integrate some diverging developments, so that we do not build two different tools but instead combine the Finnish and German efforts and maximize their total impact on the CLARIN infrastructure.

Effective start/end date: 07/04/2009 - 31/12/2010


  • Suomen Akatemia: €6,880.00

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences