LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT?

Marc Pàmies, Emily Öhman, Kaisla Kajava, Jörg Tiedemann

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

This paper presents the different models submitted by the LT@Helsinki team for the SemEval-2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used the so-called Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID dataset. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.
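As a minimal sketch of the fine-tuning setup the abstract describes (a pre-trained BERT adapted to OLID for offensive-language classification): the paper does not specify the framework, so this assumes the Hugging Face Transformers library; the model name, hyperparameters, and placeholder data are illustrative, not the authors' actual configuration.

```python
# Hedged sketch: fine-tune a pre-trained BERT for binary offensive-language
# classification, in the spirit of sub-task A. Assumes Hugging Face
# Transformers; the checkpoint, learning rate, and data are placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # OLID sub-task A: OFF vs. NOT
)

# Placeholder batch; in practice these come from the OLID training split.
tweets = ["this is an example tweet"]
labels = torch.tensor([0])  # 0 = NOT offensive, 1 = OFF (offensive)

inputs = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)  # loss computed internally
outputs.loss.backward()
optimizer.step()
```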
Original language: English
Title of host publication: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Number of pages: 7
Place of publication: Barcelona
Publisher: International Committee for Computational Linguistics
Publication date: 2020
Pages: 1569-1575
ISBN (Electronic): 978-1-952148-31-6
Publication status: Published - 2020
MoE publication type: A4 Article in conference proceedings
Event: International Workshop on Semantic Evaluation - [Online event], Barcelona, Spain
Duration: 12 Dec 2020 - 13 Dec 2020
Conference number: 14
http://alt.qcri.org/semeval2020/

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages