MDGs and African Language Technology: Roadmap to the Development of Bantu Language Resources

    Project: Research project

    Project Details

    Description (abstract)

    Development is intricately linked to language questions. Access to information increasingly depends on media characterised by electronic language tools and human/machine interaction. With some exceptions, African languages are lagging behind in terms of language technology. This is partly a question of political priorities and of financial and human resources. It is also a scientific problem, since under-researched languages raise specific issues that must be tackled. Our project focuses on Bantu languages beyond the borders of South Africa.

    The key interest underlying our proposal is to build African language resources, whose availability will enhance the use of African languages in language technology. This work will take place in close partnership with the European infrastructure currently being developed (CLARIN). CLARIN will thus benefit from a better representation of African languages and their specific needs, based on the typological properties of these languages (in this case, those of the Bantu group). Capacity-building in the relevant sector, both in Africa and in Helsinki, is an essential feature of our project.

    In terms of methodology, two major perspectives are represented in the project. Linguistic questions mainly concern the building of resources and tools for African languages, in this case specifically Bantu languages. Our focus will be on languages of wider distribution with few resources, such as Oshikwanyama, Otjiherero and Setswana, and possibly others (Kinyarwanda, Bemba, Chichewa, etc.).

    Target languages are identified on the basis of criteria such as the availability of material and the urgency with which computational linguistic work should be carried out. The former is mainly a scientific constraint, depending on the availability of language descriptions, formal grammars and lexical databases. The latter is largely determined by political factors: the interest of members of the language communities, government policies, and the availability of local counterparts with the competence and willingness to cooperate.

    The main linguistic task will be to assess relevant reference material, electronic corpora, and other language resources. At the same time, some effort will be dedicated to networking activities with scholars working on the relevant African languages. On the computational side, the project pursues an innovative approach (“pointwise weighted finite-state”). In practical terms, computational linguists will test the hypothesis that available reference material (much of which is generative and rule-based) can be successfully exploited to improve finite-state (FS) methods and a constraint-based understanding of the relevant language properties.

    The scientific results will mostly concern the methodological questions outlined in the previous paragraph. At the same time, this initiative will be of immediate relevance to language practitioners in African countries and will thus open new paths to improved educational opportunities, one of the crucial factors for the sustainable development of human resources in Africa.

    Acronym: LT4AFRICA
    Status: Finished
    Effective start/end date: 01/01/2010 to 31/12/2010

    Funding

    • Suomen Akatemia (Academy of Finland): €47,300.00

    Fields of Science

    • 6121 Languages
    • Bantu languages
    • morphology
    • grammatical tone
    • language documentation
    • 113 Computer and information sciences
    • finite-state methods
    • 519 Social and economic geography
    • African countries
    • language development
    • Millennium Development Goals