Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition

Wei Sun, Shaoxiong Ji, Tuulia Denti, Hans Moen, Oleg Kerro, Antti Rannikko, Pekka Marttinen, Miika Koskinen

Research output: Article in book/report/conference proceedings, Conference article, Scientific, Peer-reviewed

Abstract

One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities from unstructured free text. Notably, NER models require large amounts of human-labeled data to train, but human annotation is costly and laborious and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and provides enhanced domain-specific model adaptation with in-domain continual pretraining. Due to limited human annotation resources, we develop a specific module to collect a more representative test dataset from the data lake than a random selection. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
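As an illustration of what ontology-based weak labeling can look like, the following is a minimal sketch and not the authors' implementation. It assumes the ontology has been flattened into a lookup table of lowercased token sequences mapped to entity types, and that weak labels are emitted as BIO tags by longest-match lookup. The function name `weak_label`, the toy `LEXICON`, its entity types, and the English example sentence are all hypothetical; the abstract does not specify the matching procedure.

```python
from typing import Dict, List, Tuple


def weak_label(tokens: List[str], lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by longest-match lookup against a term lexicon.

    Assumption: `lexicon` maps lowercased token tuples to an entity type.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(term) for term in lexicon), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lexicon.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical toy lexicon standing in for terms drawn from a medical ontology.
LEXICON = {
    ("prostate", "cancer"): "DISEASE",
    ("ibuprofen",): "DRUG",
}

tokens = "Patient with prostate cancer was given ibuprofen".split()
tags = weak_label(tokens, LEXICON)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Labels produced this way are noisy: exact matching misses inflected or misspelled mentions and cannot resolve ambiguous terms, which is one reason a clinician-annotated test set, as described in the abstract, is still needed for evaluation.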
Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023
Editors: Gianmarco De Francisci Morales, Claudia Perlich, Natali Ruchansky, Nicolas Kourtellis, Elena Baralis, Francesco Bonchi
Number of pages: 16
Place of publication: Cham
Publisher: Springer Nature Switzerland
Publication date: 2023
Pages: 444-459
ISBN (print): 978-3-031-43426-6
ISBN (electronic): 978-3-031-43427-3
Status: Published - 2023
OKM publication type: A4 Article in conference proceedings
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Turin, Italy
Duration: 18 Sept 2023 to 22 Sept 2023
https://2023.ecmlpkdd.org/

Publication series

Name: Lecture Notes in Artificial Intelligence
Publisher: Springer Nature
Volume: 14174
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Fields of science

  • 3121 General medicine, internal medicine and other clinical medicine
  • 113 Computer and information sciences
