Abstract
In this paper we present NorQuAD, the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.
Original language | English |
---|---|
Title of host publication | Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) |
Number of pages | 10 |
Publisher | University of Tartu Library |
Publication date | May 2023 |
Pages | 159-168 |
ISBN (Electronic) | 978-9916-21-999-7 |
Publication status | Published - May 2023 |
MoE publication type | A4 Article in conference proceedings |
Event | Nordic Conference on Computational Linguistics - Tórshavn, Faroe Islands. Duration: 22 May 2023 → 24 May 2023. Conference number: 24 |
Publication series
Name | NEALT Proceedings Series |
---|---|
Publisher | University of Tartu Library |
Number | 52 |
ISSN (Electronic) | 1736-6305 |
Fields of Science
- 113 Computer and information sciences
- 6121 Languages