Abstract
One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities in unstructured free text. Notably, NER models require large amounts of human-labeled data to train, but human annotation is costly and laborious and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and provides enhanced domain-specific model adaptation with in-domain continual pretraining. Due to limited human annotation resources, we develop a specific module to collect a test dataset from the data lake that is more representative than a random selection. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
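The idea of assigning weak labels from an ontology can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that the ontology can be flattened into a lookup from surface term to entity type. The names `TOY_ONTOLOGY`, `tokenize`, and `weak_label`, the entity types, and the example terms are all hypothetical. Matching is exact and case-insensitive, so inflected Finnish word forms would be missed unless lemmatization or fuzzy matching were added.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy ontology: surface term -> entity type.
# The real ontology and label set are not specified in the abstract.
TOY_ONTOLOGY: Dict[str, str] = {
    "diabetes": "DISEASE",
    "sydämen vajaatoiminta": "DISEASE",  # "heart failure"
    "metformiini": "DRUG",               # "metformin"
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def weak_label(text: str, ontology: Dict[str, str]) -> List[Tuple[str, str]]:
    """Assign BIO tags by exact matching of ontology terms against the tokens."""
    tokens = tokenize(text)
    labels = ["O"] * len(tokens)

    # Try longer terms first so multi-word terms win over their sub-terms.
    entries = sorted(
        ((tokenize(term), etype) for term, etype in ontology.items()),
        key=lambda entry: -len(entry[0]),
    )
    for term_tokens, etype in entries:
        n = len(term_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(label == "O" for label in labels[i:i + n])
            if tokens[i:i + n] == term_tokens and span_is_free:
                labels[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + etype
    return list(zip(tokens, labels))


if __name__ == "__main__":
    sentence = "Potilaalla on sydämen vajaatoiminta, aloitetaan metformiini."
    for token, label in weak_label(sentence, TOY_ONTOLOGY):
        print(f"{token}\t{label}")
```

Labels produced this way are noisy by construction, which is why the abstract pairs weak labeling with a clinician-annotated test set for evaluation.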
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases : Applied Data Science and Demo Track. ECML PKDD 2023 |
Editors | Gianmarco De Francisci Morales, Claudia Perlich, Natali Ruchansky, Nicolas Kourtellis, Elena Baralis, Francesco Bonchi |
Number of pages | 16 |
Place of publication | Cham |
Publisher | Springer Nature Switzerland |
Publication date | 2023 |
Pages | 444-459 |
ISBN (print) | 978-3-031-43426-6 |
ISBN (electronic) | 978-3-031-43427-3 |
DOIs | |
Publication status | Published - 2023 |
MoEC publication type | A4 Article in conference proceedings |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Turin, Italy. Duration: 18 Sept 2023 → 22 Sept 2023. https://2023.ecmlpkdd.org/ |
Publication series
Name | Lecture Notes in Artificial Intelligence |
---|---|
Publisher | Springer Nature |
Volume | 14174 |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Fields of science
- 3121 General medicine, internal medicine and other clinical medicine
- 113 Computer and information sciences