Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

We approach the problem of recognition and attribution of quotes in Finnish news media. Solving this task would create possibilities for large-scale analysis of media with respect to the presence and styles of presentation of different voices and opinions. We describe the annotation of a corpus of media texts, numbering around 1500 articles, with quote attribution and coreference information. Further, we compare two methods for automatic quote recognition: a rule-based one operating on dependency trees and a machine learning one built on top of the BERT language model. We conclude that BERT provides more promising results even with little training data, achieving 95% F-score on direct quote recognition and 84% for indirect quotes. Finally, we discuss open problems and further associated tasks, especially the necessity of resolving speaker mentions to entity references.
Original language: English
Title of host publication: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Editors: Tanel Alumäe, Mark Fishel
Number of pages: 8
Place of Publication: Tartu
Publisher: University of Tartu Library
Publication date: May 2023
Pages: 52-59
ISBN (Electronic): 978-99-1621-999-7
Publication status: Published - May 2023
MoE publication type: A4 Article in conference proceedings
Event: Nordic Conference on Computational Linguistics - Tórshavn, Faroe Islands
Duration: 22 May 2023 to 24 May 2023
Conference number: 24

Publication series

Name: NEALT Proceedings Series
Publisher: University of Tartu Library
Number: 52
ISSN (Print): 1736-8197
ISSN (Electronic): 1736-6305

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
