The UN Millennium Declaration and African Language Technology: A Roadmap for Developing Bantu Language Resources



Development is intricately linked to language questions. Access to information works through media that are more and more characterised by electronic language tools and human/machine interaction. African languages, notwithstanding some exceptions, are lagging behind in terms of language technology. This is partially a question of political priorities as well as financial and human skills resources, but also a scientific problem, since there are specific issues to be tackled when it comes to under-researched languages. Our project focuses on Bantu languages beyond the borders of South Africa.

The key interest underlying our proposal is to build African language resources. Their availability will enhance the use of African languages in language technology. This work will take place in close partnership with the European infrastructure currently under development (CLARIN), which will thus benefit from a better representation of African languages and of their specific needs, which arise from the typological properties of these languages (in this case from the Bantu group). Capacity-building in the relevant sector, both in Africa and in Helsinki, is an essential feature of our project.

In terms of methodology, two major perspectives are represented in the project. Linguistic questions concern mainly the building of resources and tools for African languages, in this case specifically Bantu languages. Our focus will be on languages of wider distribution with few resources, such as Oshikwanyama, Otjiherero, Setswana, and possibly others (Kinyarwanda, Bemba, Chichewa, etc.).

The identification of target languages is based on criteria such as the availability of material and the urgency with which computational linguistic work should be carried out. The former is mainly a scientific constraint, depending on the availability of language descriptions, formal grammars and lexical databases. The latter is largely determined by political factors: the interest of the members of the language communities, government policies, and the availability, competence and willingness to cooperate of local counterparts.

The main linguistic task will consist in assessing relevant reference material, electronic corpora, and other language resources. At the same time, some effort will be dedicated to networking activities with scholars working on the relevant African languages. From the computational side, an innovative approach (“pointwise weighted finite-state”) is pursued in the project. In practical terms, computational linguists will scrutinise the hypothesis that available reference material (much of which is generative and rule-based) can be successfully exploited for improving finite-state (FS) methods and a constraint-based understanding of the relevant language properties.

The scientific results concern mostly the methodological questions outlined in the previous paragraph. At the same time, this initiative will be of immediate relevance to language practitioners in African countries and will thus open new paths to better educational opportunities, one of the crucial factors for sustainable development of human resources in Africa.
Effective start/end date: 01/01/2010 → 31/12/2010


  • Academy of Finland (Suomen Akatemia): 47 300,00 €


  • 6121 Languages
  • 113 Computer and information sciences
  • 519 Social and economic geography


HFST - Helsinki Finite-State Technology

Linden, K., Koskenniemi, K., Yli-Jyrä, A., Hulden, M., Silfverberg, M., Pirinen, T., Axelson, E., Hardwick, S., Niemi, J. & Hurskainen, A.

01/01/2005 → …

Project: Research project

Graph-Based Representation of Cross-Lingual Alignments

Abend, O., Ronning, M. & Miles, G.


Project: Research project


On Practical Realisation of Autosegmental Representations in Lexical Transducers of Tonal Bantu Languages

Yli-Jyrä, A., 13 Jan 2020, (Submitted) LT4ALL. UNESCO, 4 p.

Research output: Chapter in book/report/conference proceeding, Conference contribution, Scientific, Peer-reviewed

Optimal Kornai-Karttunen Codes for Restricted Autosegmental Representations

Yli-Jyrä, A. M., 2019, Tokens of Meaning: Papers in Honor of Lauri Karttunen. Condoravdi, C. & Holloway King, T. (eds.). Stanford: Center for the Study of Language and Information (CSLI)

Research output: Chapter in book/report/conference proceeding, Chapter, Scientific, Peer-reviewed


  • 2 Academic visit to the University of Helsinki (UH)
  • 2 Academic visit to other institution

The Rachel and Selim Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem, Israel

Anssi Yli-Jyrä (Visiting researcher)

10 Dec 2019 → 19 Dec 2019

Activity: Visit to an external institution, Academic visit to other institution

University of Umeå, Department of Computer Science

Anssi Yli-Jyrä (Visiting researcher), Frank Drewes (Other role) & Henrik Björklund (Other role)

4 Nov 2019 → 6 Nov 2019

Activity: Visit to an external institution, Academic visit to other institution

Adam Jardine

Anssi Yli-Jyrä (Host)

30 Jul 2018 → 5 Aug 2018

Activity: Hosting a visitor, Academic visit to the University of Helsinki (UH)