Project Details


Access to the internet is no longer a luxury: it is a basic component of everyday life and civic engagement, yet language remains a barrier to fair and equitable access. As Europe becomes more multicultural, and personal and professional mobility between cultures rapidly increases, access to fundamental resources such as local news and government services is limited by the great diversity of the EU's 37 languages. The internet developed mostly in English, without clear planning for how language issues might form barriers to access and engagement, or for how multilingualism might be supported. In the EU, websites and online services for citizens have developed national, local-language resources, and often provide a second language (usually English) only when absolutely needed; but the great proliferation of web content, multiple and fast-changing content streams, and an expanding base of user interests make this approach untenable. And while advanced natural language research and resources exist for a few dominant languages (English, French, German), many of Europe's smaller language communities, and the news media industry that serves them, lack appropriate tools for multilingual internet development. For the EU to realise a truly equitable, open, multilingual future internet, new tools allowing high quality transformations (not translations) between languages are urgently needed.

The EMBEDDIA project seeks to address these challenges by leveraging innovations in cross-lingual embeddings coupled with deep neural networks. These allow existing monolingual resources to be used across languages and operate fast enough for near real-time applications, without the need for large computational resources. Across three years, the project's six academic and four industry partners will develop novel solutions, including for under-represented languages, and test them in real-world news and media production contexts.
Short title: Embeddia
Effective start/end date: 01/01/2019 to 31/12/2021

Fields of Science

  • 113 Computer and information sciences
  • Artificial Intelligence
  • Data Science
  • 518 Media and communications
  • 6121 Languages
  • Natural Language Processing
  • Natural Language Generation

Research Output

  • 4 Conference contribution
  • 1 Article

Capturing Evolution in Word Usage: Just Add More Clusters?

Martinc, M., Montariol, S., Zosa, E. & Pivovarova, L., 2020, WWW ’20 Companion. Taipei

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


Computational Generation of Slogans

Alnajjar, K. & Toivonen, H., 2020, (Accepted/In press) In: Natural Language Engineering.

Research output: Contribution to journal, Article, Scientific, peer-review

Multilingual Dynamic Topic Model

Zosa, E. & Granroth-Wilding, M., 4 Sep 2019, RANLP 2019 - Natural Language Processing in a Deep Learning World: Proceedings. Angelova, G., Mitkov, R., Nikolova, I. & Temnikova, I. (eds.). Shoumen: INCOMA, p. 1388-1396 9 p. (International conference Recent advances in natural language processing).

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Open Access

Activities


  • 1 Organisation and participation in conferences, workshops, courses, seminars
  • 1 Oral presentation

No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines

Khalid Alnajjar (Speaker)
21 Jun 2019

Activity: Talk or presentation types, Oral presentation

Practices and perspectives: News automation at work

Leo Leppänen (Member of organizing committee), Leo Leppänen (Speaker: Presenter)
4 Nov 2019 to 5 Nov 2019

Activity: Participating in or organising an event types, Organisation and participation in conferences, workshops, courses, seminars