• PL 24 (Unioninkatu 40)

    00014

    Finland

Research activity per year: 2004 – 2024

Personal profile

Description of research and teaching

Curriculum vitae

I work as professor of language technology at the Department of Digital Humanities at the University of Helsinki. My main research interest is in cross-lingual NLP and machine translation.

  • Since August 2015: Professor of Language Technology at the Department of Digital Humanities / HELDIG (formerly at the Department of Modern Languages), University of Helsinki
  • September 2014 – July 2015: Senior Researcher at the Department of Linguistics and Philology, Uppsala University
  • September 2009 – August 2014: Visiting Professor at the Department of Linguistics and Philology, Uppsala University
  • September 2004 – August 2009: PostDoc researcher at the Department of Information Science/Humanities Computing (Informatiekunde), University of Groningen
  • January 2004 – August 2004: Lecturer in computational linguistics and coordinator for the language technology programme, Department of Linguistics and Philology, Uppsala University
  • 2000 – 2003: Ph.D. research at the Department of Linguistics, Uppsala University
  • 2001 – 2002: Visiting Ph.D. student, Division of Informatics, Edinburgh University, UK
  • 1997 – 1999: Research assistant, Department of Linguistics, Uppsala University
  • 1991 – 1997: Masters in Computer Science (Diplom für Informatik), “Otto-von-Guericke” University, Magdeburg, Germany

Recent Projects

Resources and Tools

  • OPUS – a collection of freely available parallel corpora and tools
  • fiskmö translator – a translation demo for the Nordic languages
  • efmaral and eflomal – tools for efficient word alignment
  • WMT en-fi 2016, 2017: official MT test sets for Finnish-English
  • HNMT – the Helsinki Neural Machine Translation system
  • Lingua::Align – a toolbox for tree-to-tree alignment
  • Uplug – a toolbox for processing parallel corpora
  • Lingua::Ident::Blacklists – language identifier for related languages
  • Docent – a document-level SMT decoder
  • pdf2xml – a converter for PDF documents
  • subalign – tools for converting and aligning movie subtitles
  • Helsinki-NLP on GitHub and Bitbucket

Active PhD Students

Former PhD Students

Education/Academic qualification

Computational Linguistics, PhD, Recycling Translations - Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing, Uppsala University

2000 – 2003

Award Date: 12 Dec 2003

Computer Science, M.Sc., Automatical Lexicon Extraction from Aligned Bilingual Corpora, Otto von Guericke University, Magdeburg

1991 – 1997

Award Date: 11 Sept 1997

Fields of Science

  • 6121 Languages
  • Computational Linguistics
  • machine translation
  • 113 Computer and information sciences
  • language technology
  • machine learning
  • Natural language processing
  • Artificial intelligence
