Avoimet ja kieliriippumattomat automaattimenetelmät yhteisen kielentutkimusinfrastruktuurin resurssien tuottamiseksi

Description

English Title: Open and Language Independent Automata-Based Resource Production Methods for Common Language Research Infrastructure

The hearts of Europeans speak 50 to 100 languages that are too important to be ignored in research, content production, or education. This application addresses the lack of resources needed to implement our shared vision of a multilingual society. Out of a similar concern, the EU Commission is funding the preparatory phase of the “Common Language Resource and Language Technology Research Infrastructure (CLARIN)”, a pan-European initiative that aims to establish an interoperable and integrated research infrastructure.

Finland is one of the main partners of CLARIN, and this application would form a significant national contribution to its success. The proposed research would build on two particular strengths of Finnish research: language-independent finite-state technology and open-source technology.

• Finite-state technology is very useful in natural language processing because it allows linguistic rules to be compiled into efficient computational models.
• The use of open-source technology ensures the widest applicability of the language technology infrastructure.
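
As an illustration of what compiling a linguistic rule into a finite-state model can look like, the following minimal sketch turns one toy rewrite rule (n becomes m before p) into the transition table of a two-state sequential transducer. It is not the method proposed in this application; the names compile_rule and apply_rule, the rule itself, and the restriction to single-character symbols are all assumptions made for the example.

```python
def compile_rule(target, replacement, right_context, alphabet):
    """Compile the toy rule  target -> replacement / _ right_context
    into a transition table: (state, symbol) -> (next_state, output).

    Assumptions of this sketch: every symbol is one character, the
    alphabet contains all symbols that will be read, and target differs
    from right_context.
    """
    assert target != right_context
    delta = {}
    for ch in alphabet:
        # State 0: nothing is pending; hold back a target symbol until
        # the next symbol shows whether the context matches.
        delta[(0, ch)] = (1, "") if ch == target else (0, ch)
        # State 1: one target symbol has been read but not yet written.
        if ch == right_context:
            delta[(1, ch)] = (0, replacement + ch)   # rule applies
        elif ch == target:
            delta[(1, ch)] = (1, target)             # write old, hold new
        else:
            delta[(1, ch)] = (0, target + ch)        # rule does not apply
    # Output written when the input ends in a given state.
    flush = {0: "", 1: target}
    return delta, flush


def apply_rule(transducer, word):
    """Run the compiled table over a word in a single left-to-right pass."""
    delta, flush = transducer
    state, pieces = 0, []
    for ch in word:
        state, out = delta[(state, ch)]
        pieces.append(out)
    pieces.append(flush[state])
    return "".join(pieces)


rule = compile_rule("n", "m", "p", "abcdefghijklmnopqrstuvwxyz")
print(apply_rule(rule, "inpossible"))
print(apply_rule(rule, "intact"))
```

Once the table is built, applying the rule costs one table lookup per input symbol, whatever the rule looked like in its original notation. That is the sense in which a compiled model is efficient.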

The purpose of the proposed basic research is to create a renewed theory of compiling linguistic knowledge into finite-state models. The commercial grammar formalisms for finite-state morphology are based on complicated 10- to 20-year-old algorithms that have not been adopted in free and open-source software. The applicant’s recent results indicate that more parsimonious computational equipment could improve the elegance and generality of the modeled linguistic formalisms.

Language resource building needs practical methods. The proposed research would establish new solutions to the long-standing space explosion problem of compilers, and develop descriptive means for complex grammatical phenomena. The reduced need for high-tech equipment means that language-independent modeling formalisms can become freely available, which empowers any language community to build, for example, morphological models for itself.

Multilingual morphological models based on finite-state technology will play a significant role in CLARIN, and they would enable competitive language research and advanced language technology applications. With morphological models, computers can translate texts from one dialect to another, facilitate the reading of foreign-language texts, aid in language learning, assist in natural language queries (for example, on the internet), and improve the quality of common content production. These applications help multilingual education to reduce inequality, poverty, and insecurity in society and can make a better future for our children.

Status: Completed
Effective start/end date: 01/01/2009 to 31/12/2011

Funding

  • Unknown funder: 195 450,00 €

Fields of science

  • 612 Languages and literature
  • 113 Computer and information sciences