Comparison of machine learning methods in the early identification of vasculitides, myositides and glomerulonephritides

Rasmus Ryyppö, Sergei Häyrynen, Henry Joutsijoki, Martti Juhola, Mikko Seppänen

Research output: Contribution to journal › Article › Scientific › Peer-reviewed

Abstract

Background: Rare disease diagnoses are often delayed by years, with multiple doctor visits and potentially imprecise or incorrect diagnoses before the patient receives the correct one. Machine learning could help solve this problem by flagging potential patients whom doctors should examine more closely. Methods: To make the prediction setting as close as possible to a real-world situation, we tested different masking sizes. In the masking phase, data were removed: masking was applied to all data points following the first rare disease diagnosis, including the day the diagnosis was received, and additionally to a selected number of days before the initial diagnosis. The performance of the machine learning models was compared using positive predictive value (PPV), negative predictive value (NPV), prevalence PPV (pPPV), prevalence NPV (pNPV), accuracy (ACC) and area under the receiver operating characteristic curve (AUC). Results: XGBoost had PPVs over 90 % in all masking settings, and InceptionVasGloMyotides had most of its PPVs over 90 %, but not as consistently. When the prevalence of the diseases was taken into account, XGBoost achieved its highest value, 8.8 %, in binary classification with 30-day masking, while InceptionVasGloMyotides achieved its best value, 6 %, also in binary classification, but with 2160-day and 4320-day masking. ACC varied between 89 % and 98 % for XGBoost and between 79 % and 94 % for InceptionVasGloMyotides. AUC varied between 72.6 % and 94.5 % for InceptionVasGloMyotides and between 69.9 % and 96.4 % for XGBoost. Conclusions: XGBoost and InceptionVasGloMyotides could successfully predict rare diseases for patients at least 30 days prior to the initial rare disease diagnosis. In addition, we managed to build a well-performing custom deep learning model.
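The masking step and the prevalence-based PPV described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the record layout (date, label) and the example data are hypothetical. The abstract does not define pPPV, so the sketch assumes the standard Bayes' rule adjustment of PPV from sensitivity, specificity and prevalence.

```python
from datetime import date, timedelta


def mask_records(records, first_diagnosis, mask_days):
    """Drop every record on or after the first diagnosis day, plus the
    `mask_days` days immediately before it (hypothetical helper)."""
    cutoff = first_diagnosis - timedelta(days=mask_days)
    # Only records strictly earlier than the cutoff survive masking.
    return [(day, label) for day, label in records if day < cutoff]


def prevalence_adjusted_ppv(sensitivity, specificity, prevalence):
    """PPV at a given population prevalence via Bayes' rule.
    Assumption: the paper's pPPV may be defined differently."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    total = true_pos + false_pos
    return true_pos / total if total > 0 else float("nan")


# Made-up patient history, for illustration only.
diagnosis_day = date(2020, 6, 1)
history = [
    (date(2019, 1, 10), "visit_a"),
    (date(2020, 5, 15), "visit_b"),   # within 30 days before diagnosis
    (date(2020, 6, 1), "diagnosis"),  # the diagnosis day itself
    (date(2020, 7, 1), "follow_up"),  # after diagnosis
]

masked_0 = mask_records(history, diagnosis_day, 0)
masked_30 = mask_records(history, diagnosis_day, 30)
```

With 0-day masking, only the diagnosis day and later records are removed. With 30-day masking, the visit on 2020-05-15 is removed as well. The second function shows why a classifier with a high PPV on a balanced test set can still have a PPV of only a few percent once the low prevalence of rare diseases is factored in.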

Original language: English
Article number: 107917
Journal: Computer Methods and Programs in Biomedicine
Volume: 243
Number of pages: 7
ISSN: 0169-2607
DOI
Status: Published - Jan 2024
MoE publication type: A1 Journal article-refereed

Bibliographical note

Publisher Copyright:
© 2023 The Author(s)

Fields of Science

  • 3111 Biomedicine
