Discriminating similar languages with token-based backoff

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Abstract

In this paper we describe the language identification system built within the Finno-Ugric Languages and the Internet project for the Discriminating between Similar Languages (DSL) shared task of the LT4VarDial workshop at RANLP-2015. The system placed fourth among the normal closed submissions (94.7% accuracy) and second among the closed submissions with named entities blinded (93.0% accuracy).
Original language: English
Title of host publication: Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects
Number of pages: 8
Publisher: Association for Computational Linguistics
Publication date: 2015
Pages: 44-51
ISBN (Print): 978-954-452-031-1
Publication status: Published - 2015
MoE publication type: A4 Article in conference proceedings
Event: Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects - Hissar, Bulgaria
Duration: 10 Sep 2015 - 10 Sep 2015
Conference number: LT4CloseLang 2 - VarDial 2

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this

Jauhiainen, T., Jauhiainen, H., & Linden, K. (2015). Discriminating similar languages with token-based backoff. In Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (pp. 44-51). Association for Computational Linguistics.
Jauhiainen, Tommi; Jauhiainen, Heidi; Linden, Krister. / Discriminating similar languages with token-based backoff. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects. Association for Computational Linguistics, 2015. pp. 44-51.
@inproceedings{d95530bd1810444eaf8df5f9498e4577,
title = "Discriminating similar languages with token-based backoff",
abstract = "In this paper we describe the language identification system built within the Finno-Ugric Languages and the Internet project for the Discriminating between Similar Languages (DSL) shared task in LT4VarDial workshop at RANLP-2015. The system reached fourth place in normal closed submissions (94.7{\%} accuracy) and second place in closed submissions with the named entities blinded (93.0{\%} accuracy).",
keywords = "113 Computer and information sciences, 6121 Languages",
author = "Tommi Jauhiainen and Heidi Jauhiainen and Krister Linden",
year = "2015",
language = "English",
isbn = "978-954-452-031-1",
pages = "44--51",
booktitle = "Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects",
publisher = "Association for Computational Linguistics",
address = "International",

}

Jauhiainen, T, Jauhiainen, H & Linden, K 2015, Discriminating similar languages with token-based backoff. in Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects. Association for Computational Linguistics, pp. 44-51, Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, Hissar, Bulgaria, 10/09/2015.

Discriminating similar languages with token-based backoff. / Jauhiainen, Tommi; Jauhiainen, Heidi; Linden, Krister.

Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects. Association for Computational Linguistics, 2015. p. 44-51.

TY - GEN

T1 - Discriminating similar languages with token-based backoff

AU - Jauhiainen, Tommi

AU - Jauhiainen, Heidi

AU - Linden, Krister

PY - 2015

Y1 - 2015

N2 - In this paper we describe the language identification system built within the Finno-Ugric Languages and the Internet project for the Discriminating between Similar Languages (DSL) shared task in LT4VarDial workshop at RANLP-2015. The system reached fourth place in normal closed submissions (94.7% accuracy) and second place in closed submissions with the named entities blinded (93.0% accuracy).

KW - 113 Computer and information sciences

KW - 6121 Languages

M3 - Conference contribution

SN - 978-954-452-031-1

SP - 44

EP - 51

BT - Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

PB - Association for Computational Linguistics

ER -

Jauhiainen T, Jauhiainen H, Linden K. Discriminating similar languages with token-based backoff. In Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects. Association for Computational Linguistics. 2015. p. 44-51