Abstract
One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities in unstructured free text. NER models require large amounts of human-labeled data to train, but human annotation is costly and laborious and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and provides enhanced domain-specific model adaptation with in-domain continual pretraining. Because human annotation resources are limited, we develop a dedicated module that collects a test dataset from the data lake that is more representative than a random selection. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
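As a rough illustration of what assigning weak labels from an ontology can look like, the sketch below tags tokens by longest-match lookup in a term dictionary and emits BIO tags. It is not taken from the HAM-net code: the function name `weak_label`, the toy ontology, the `DISEASE` label, and the example sentence are all hypothetical, and the authors' actual matching procedure may differ.

```python
from typing import Dict, List


def weak_label(tokens: List[str], ontology: Dict[str, str]) -> List[str]:
    """Assign BIO tags by longest-match lookup of token spans in a term dictionary.

    `ontology` maps a (possibly multi-word) term to an entity label.
    This is an illustrative assumption, not the method from the paper.
    """
    # Pre-tokenise ontology terms (lower-cased) so multi-word terms can be matched.
    terms = {tuple(term.lower().split()): label for term, label in ontology.items()}
    max_len = max((len(t) for t in terms), default=0)

    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]

    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so longer terms win over their substrings.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = terms.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical toy ontology and sentence, for illustration only.
toy_ontology = {
    "tyypin 2 diabetes": "DISEASE",
    "diabetes": "DISEASE",
    "astma": "DISEASE",
}
sentence = ["Potilaalla", "on", "tyypin", "2", "diabetes", "ja", "astma"]
weak_tags = weak_label(sentence, toy_ontology)
```

Labels produced this way are noisy (exact matching misses inflected forms and abbreviations), which is why a clinician-annotated test set is still needed for evaluation.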
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases : Applied Data Science and Demo Track. ECML PKDD 2023 |
Editors | Gianmarco De Francisci Morales, Claudia Perlich, Natali Ruchansky, Nicolas Kourtellis, Elena Baralis, Francesco Bonchi |
Number of pages | 16 |
Place of publication | Cham |
Publisher | Springer Nature Switzerland |
Publication date | 2023 |
Pages | 444-459 |
ISBN (print) | 978-3-031-43426-6 |
ISBN (electronic) | 978-3-031-43427-3 |
DOI | |
Status | Published - 2023 |
MoE publication type | A4 Article in conference proceedings |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Turin, Italy. Duration: 18 Sep 2023 → 22 Sep 2023. https://2023.ecmlpkdd.org/ |
Publication series
Name | Lecture Notes in Artificial Intelligence |
---|---|
Publisher | Springer Nature |
Volume | 14174 |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Fields of science
- 3121 General medicine, internal medicine and other clinical medicine
- 113 Computer and information sciences