Organisation profile
Language technology is a multidisciplinary field, also known as computational linguistics, natural language processing (NLP) or natural language engineering (NLE). We study methods and develop models and tools for processing human language, including models for natural language understanding and generation, both within and across languages. In Helsinki we focus on:
- Cross-lingual NLP including machine translation
- NLP for languages with a rich morphology
- NLP for low-resource languages and in the humanities
Activities and news from our research group are available on our website.
Fields of Science
- 113 Computer and information sciences
  - language technology
  - natural language processing
  - natural language engineering
- 6121 Languages
  - computational linguistics
  - language technology
Profiles
-
Mikko Aulamo
- Department of Digital Humanities - Doctoral Researcher
- Doctoral Programme in Language Studies
- Language Technology
Person: U1 Research and teaching staff, Doctoral Researcher
-
Mathias Creutz
- Department of Digital Humanities - Senior University Lecturer, Title of Docent
- Doctoral Programme in Language Studies - Supervisor for doctoral programme
- Language Technology
Person: U3 Research and teaching staff
-
Ona De Gibert Bonet, PhD Student
- Department of Digital Humanities - Doctoral Researcher
- Doctoral Programme in Language Studies
- Language Technology
Person: U1 Research and teaching staff, Doctoral Researcher
Equipment
-
HTB Helsinki Term Bank for the Arts and Sciences
Onikki-Rantajääskö, T. (Manager), Kanner, A. O. (Operator), Laxström, N. M. (Operator), Enqvist, E. J. (Other) & Kettunen, H. (Other)
Department of Finnish, Finno-Ugrian and Scandinavian Studies
Facility/equipment: Database
-
nVidia GTX Titan X GPU Workstation
Yli-Jyrä, A. (Manager)
Language Technology
Facility/equipment: Equipment
-
nVidia RTX 2080Ti GPU for a Workstation
Yli-Jyrä, A. (Manager)
Language Technology
Facility/equipment: Equipment
Projects
-
ReBeL: Reading Between the Lines: Exploiting context to discover the unsaid
Heikkilä, T. (Co-Principal Investigator), Tiedemann, J. (Principal Investigator), Tolonen, M. (Co-Principal Investigator), Roos, T. (Co-Principal Investigator), Jokitalo, E. (Co-Principal Investigator), Day, J. (Participant), Vikman, K. A. (Participant), Belevich, I. (Participant), Pivovarova, L. (Participant), Mathioudakis, M. (Co-Principal Investigator), Siewert, J. (Participant) & Tiihonen, I. L. I. (Participant)
01/01/2026 → 31/12/2028
Project: University of Helsinki Funds
-
Automatic Classification and Analysis of Texts from Egyptian Antiquity
Jauhiainen, T. (Project manager), Henriksson, E. (Participant), Jauhiainen, H. (Participant) & Vierros, M. (Participant)
01/01/2024 → 30/11/2029
Project: Foundations (Private Foundations, Non-Profit Foundations, Charitable Trusts)
-
High Performance Language Technologies
Tiedemann, J. (Project manager), Aulamo, M. (Participant), De Gibert Bonet, O. (Participant), Grönroos, S.-A. (Participant), Ji, S. (Participant), Li, Z. (Participant), Mickus, T. (Participant), Siewert, J. (Participant), Vahtola, T. (Participant), Vazquez, R. (Participant) & Virpioja, S. P. (Participant)
Charles University in Prague Faculty of Science Department of Teaching and Didactics of Biology
01/09/2022 → 31/01/2026
Project: EU Innovation actions (IA)
-
MaReTE: Machine Readable Texts for Egyptologists
Jauhiainen, H. (Project manager)
01/01/2021 → …
Project: Research project
-
Experimental Treebanking for the Minority Skolt Sámi Language and Finite-State Descriptions
Rueter, J. (Project manager), Juutinen, M. (Participant), Pirinen, T. (Project manager) & Tyers, F. (Participant)
01/06/2020 → …
Project: Research project
Research output
-
A Bayesian Approach to Inferring Prerequisite Structures and Topic Difficulty in Language Learning
Vu Anh, D., Hou, J., Katinskaya, A., Sheu, C.-F. & Yangarber, R., 2025, Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025). The Association for Computational Linguistics, p. 737 15 p.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Open Access
-
Adapting Definition Modeling for New Languages: A Case Study on Belarusian
Kazakouskaya, D., Mickus, T. & Siewert, J., 1 Jul 2025, Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025). Piskorski, J., Přibáň, P., Nakov, P., Yangarber, R. & Marcinczuk, M. (eds.). Vienna: The Association for Computational Linguistics, p. 69-75 7 p.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Open Access
-
Aligning Encoded Hieroglyphic and Transliterated Words with Needleman-Wunsch Algorithm
Jauhiainen, H., 2025, (Accepted/In press) Proceedings of the Conference "Ancient Egypt-New Technologies" 2. (Serie Egittologica).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
-
Analyzing the Effect of Linguistic Instructions on Paraphrase Generation
Vahtola, T., Hu, S., Creutz, M., Vulić, I., Korhonen, A. & Tiedemann, J., Mar 2025, Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025). Johansson, R. & Stymne, S. (eds.). Tartu: University of Tartu Library, p. 755-766 12 p. (NEALT proceedings series; no. 57).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Open Access
-
ANEE Idiolect Network Portal
Jauhiainen, T. & Jauhiainen, H., 13 Nov 2025.
Research output: Conference materials › Poster › peer-review
Open Access
Activities
-
COGSCI 2026 (Event)
Jauhiainen, T. (Reviewer)
2026 → …
Activity: Publication peer-review and editorial work types › Peer review of manuscripts
-
Language Resources and Evaluation Conference
Jauhiainen, T. (Scientific Committee Chair)
2025 → 2026
Activity: Participating in or organising an event types › Organisation and participation in conferences, workshops, courses, seminars
-
Conference on Empirical Methods in Natural Language Processing
Vazquez, R. (Attendee)
4 Nov 2025 → 9 Nov 2025
Activity: Participating in or organising an event types › Organisation and participation in conferences, workshops, courses, seminars
-
Neural Models for Lemmatization and POS-Tagging of Earlier and Late Egyptian (Supporting Hieroglyphic Input) and Demotic
Sahala, A. (Speaker)
4 May 2025
Activity: Talk or presentation types › Oral presentation
-
Formulaic Language in Historical Linguistics conference
Korkiakangas, T. (Scientific Committee Chair), Vierros, M. (Scientific Committee Member), Jauhiainen, T. (Scientific Committee Member), Kopaczyk, J. (Scientific Committee Member), Bentein, K. (Scientific Committee Member) & Fantoli, M. (Scientific Committee Member)
2 Jun 2025 → 3 Jun 2025
Activity: Participating in or organising an event types › Organisation and participation in conferences, workshops, courses, seminars
Prizes
-
Dissertation Prize of the August Ahlqvist, Yrjö Wichmann, Kai Donner and Artturi Kannisto Funds (August Ahlqvistin, Yrjö Wichmannin, Kai Donnerin ja Artturi Kanniston rahastojen väitöskirjapalkinto)
Kuparinen, O. V. (Recipient), 14 Mar 2022
Prize: Prizes and awards
-
Best paper award at DHN 2020
Mäkelä, E. (Recipient), Lagus, K. (Recipient), Lahti, L. (Recipient), Säily, T. (Recipient), Tolonen, M. (Recipient), Hämäläinen, M. (Recipient), Kaislaniemi, S. (Recipient) & Nevalainen, T. (Recipient), 23 Oct 2020
Prize: Prizes and awards
Datasets
-
Mu-SHROOM: Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
Vazquez, R. (Creator) & Mickus, T. (Creator), ACL, Jul 2025
https://huggingface.co/datasets/Helsinki-NLP/mu-shroom, https://github.com/Helsinki-NLP/mu-shroom, https://helsinki-nlp.github.io/shroom/2025.html
Dataset
-
The SHROOM dataset for Multilingual Hallucination and Overgeneration detection.
Mickus, T. (Creator) & Vazquez, R. (Creator), ACL, 2024
https://github.com/Helsinki-NLP/shroom/blob/main/2024.md
Dataset
-
Murreviikko: an Annotated and Normalized Corpus of Dialectal Finnish Tweets
Kuparinen, O. V. (Creator), Zenodo, 2023
Dataset
-
OcWikiAnnot: Annotated Wikipedia Corpus of Occitan
Miletic Haddad, A. (Creator), Zenodo, 20 Apr 2023
DOI: 10.5281/zenodo.7777340, https://doi.org/10.5281/zenodo.7777340
Dataset
-
OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan
Miletic Haddad, A. (Creator) & Scherrer, Y. (Creator), Zenodo, 14 Sept 2022
DOI: 10.5281/zenodo.7079580, https://doi.org/10.5281/zenodo.7079580
Dataset
Press/Media
-
Språk(teknologi) är nyckeln till intelligens och rättvisa [Language (technology) is the key to intelligence and justice]
20/01/2022
1 Media contribution
Press/Media: Press / Media
-
芬兰研究人员正在教人工智能讲流利的芬兰语方言 [Finnish researchers are teaching artificial intelligence to speak fluent Finnish dialects]
Hämäläinen, M., Alnajjar, K., Rueter, J. & Partanen, N.
10/01/2022
1 item of Media coverage
Press/Media: Press / Media
-
Inteligência artificial identifica 23 dialetos em finlandês [Artificial intelligence identifies 23 dialects in Finnish]
Hämäläinen, M., Alnajjar, K., Rueter, J. & Partanen, N.
17/12/2021
1 item of Media coverage
Press/Media: Press / Media
-
Researchers teach artificial intelligence to be fluent in Finnish dialects
Hämäläinen, M., Alnajjar, K., Partanen, N. & Rueter, J.
16/12/2021
1 Media contribution
Press/Media: Press / Media