Abstract
One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities in unstructured free text. NER models require large amounts of human-labeled training data, but human annotation is costly, laborious, and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and strengthens domain-specific model adaptation through in-domain continual pretraining. Because human annotation resources are limited, we also develop a dedicated module that collects a more representative test dataset from the data lake than random selection would. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
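As a rough illustration of what assigning weak labels from an ontology can look like, the sketch below tags tokens with BIO labels by longest-match lookup in a term dictionary. It is not the HAM-net implementation: the abstract does not describe the matching procedure, and the lexicon contents, the entity types, and the function name `weak_label` are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a medical ontology (assumption):
# each key is a lowercased term split into tokens, each value an entity type.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("diabetes",): "DISEASE",
    ("sydämen", "vajaatoiminta"): "DISEASE",  # Finnish for "heart failure"
    ("metformiini",): "DRUG",                 # Finnish for "metformin"
}


def weak_label(tokens: List[str],
               lexicon: Dict[Tuple[str, ...], str] = LEXICON) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against the lexicon."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(term) for term in lexicon), default=0)

    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                label = lexicon[key]
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Potilaalla sydämen vajaatoiminta , aloitettu metformiini".split()
    for token, tag in zip(sentence, weak_label(sentence)):
        print(token, tag)
```

Labels produced this way are noisy: exact matching misses inflected or misspelled forms, which are common in Finnish clinical text, and that noise is one reason a clinician-annotated test set is needed for evaluation.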
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases : Applied Data Science and Demo Track. ECML PKDD 2023 |
Editors | Gianmarco De Francisci Morales, Claudia Perlich, Natali Ruchansky, Nicolas Kourtellis, Elena Baralis, Francesco Bonchi |
Number of pages | 16 |
Place of Publication | Cham |
Publisher | Springer Nature Switzerland |
Publication date | 2023 |
Pages | 444-459 |
ISBN (Print) | 978-3-031-43426-6 |
ISBN (Electronic) | 978-3-031-43427-3 |
Publication status | Published - 2023 |
MoE publication type | A4 Article in conference proceedings |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Turin, Italy. Duration: 18 Sept 2023 → 22 Sept 2023. https://2023.ecmlpkdd.org/ |
Publication series
Name | Lecture Notes in Artificial Intelligence |
---|---|
Publisher | Springer Nature |
Volume | 14174 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Fields of Science
- 3121 General medicine, internal medicine and other clinical medicine
- 113 Computer and information sciences