Comparison of machine learning methods in the early identification of vasculitides, myositides and glomerulonephritides

Rasmus Ryyppö, Sergei Häyrynen, Henry Joutsijoki, Martti Juhola, Mikko Seppänen

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Background: Rare disease diagnoses are often delayed by years, involving multiple doctor visits and potentially imprecise or incorrect diagnoses before the correct one is received. Machine learning could address this problem by flagging patients whom doctors should examine more closely.

Methods: To make the prediction setting as close as possible to a real clinical situation, we tested different masking sizes. In the masking phase, data were removed from all data points following the first rare disease diagnosis, including the day the diagnosis was received, and additionally from a selected number of days before the initial diagnosis. The performance of the machine learning models was compared using positive predictive value (PPV), negative predictive value (NPV), prevalence PPV (pPPV), prevalence NPV (pNPV), accuracy (ACC) and area under the receiver operating characteristic curve (AUC).

Results: XGBoost had PPVs over 90 % in all masking settings, while InceptionVasGloMyotides had most of its PPVs over 90 %, but less consistently. When the prevalence of the diseases was taken into account, XGBoost achieved its highest value, 8.8 %, in binary classification with 30 days of masking, and InceptionVasGloMyotides achieved its best value, 6 %, also in binary classification but with 2160 and 4320 days of masking. ACC varied between 89 % and 98 % for XGBoost and between 79 % and 94 % for InceptionVasGloMyotides. AUC varied between 72.6 % and 94.5 % for InceptionVasGloMyotides and between 69.9 % and 96.4 % for XGBoost.

Conclusions: XGBoost and InceptionVasGloMyotides could successfully predict rare diseases for patients at least 30 days before the initial rare disease diagnosis. In addition, we built a well-performing custom deep learning model.
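The masking step described in the Methods can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each patient's history is a list of `Record` objects with an integer day index, and it assumes that "N days before initial diagnosis" means the N calendar days immediately preceding the diagnosis day. The names `Record` and `mask_records` are illustrative only and are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One timestamped data point for a patient (hypothetical representation)."""
    day: int   # day index relative to an arbitrary reference date
    code: str  # e.g. a diagnosis or visit code (hypothetical)


def mask_records(records: list[Record], first_diagnosis_day: int, masking_days: int) -> list[Record]:
    """Return only the records that remain visible to the model after masking.

    Everything from the first rare disease diagnosis onwards (including the
    diagnosis day itself) is removed, plus the `masking_days` days immediately
    before it. With masking_days=0 only the diagnosis day and later are dropped.
    """
    if masking_days < 0:
        raise ValueError("masking_days must be non-negative")
    cutoff = first_diagnosis_day - masking_days
    # Keep strictly earlier records; day == cutoff is the first masked day.
    return [r for r in records if r.day < cutoff]


if __name__ == "__main__":
    history = [Record(100, "A"), Record(170, "B"), Record(185, "C"),
               Record(200, "D"), Record(210, "E")]
    for m in (0, 30):
        kept = mask_records(history, first_diagnosis_day=200, masking_days=m)
        print(m, [r.code for r in kept])
```

With the diagnosis on day 200, a 30-day mask hides days 170 to 199 as well as day 200 onwards, so only the record from day 100 remains. Larger masks such as 2160 or 4320 days push the cutoff further back in the same way.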
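The evaluation metrics listed in the Methods can be computed as in the following sketch. The abstract does not define pPPV and pNPV, so this assumes the common Bayes' theorem adjustment that re-weights sensitivity and specificity to an assumed population prevalence; the prevalence value passed in is a hypothetical input, and AUC is computed here with the pairwise (Mann-Whitney) formulation rather than any particular library.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """PPV, NPV, ACC, sensitivity and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
    }


def prevalence_adjusted(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Assumed pPPV and pNPV: predictive values at a given prevalence via Bayes' theorem."""
    p = prevalence
    ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
    npv = spec * (1 - p) / (spec * (1 - p) + (1 - sens) * p)
    return ppv, npv


def auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(confusion_metrics(tp=90, fp=10, tn=80, fn=20))
    print(prevalence_adjusted(sens=0.9, spec=0.9, prevalence=0.01))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

With sensitivity and specificity both at 0.9, PPV on a balanced test set is 0.9, but at an assumed 1 % prevalence the adjusted PPV drops to about 8.3 %. This illustrates how a high PPV can shrink to a few percent once a low real-world prevalence is applied.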

Original language: English
Article number: 107917
Journal: Computer Methods and Programs in Biomedicine
Volume: 243
Number of pages: 7
ISSN: 0169-2607
Publication status: Published - Jan 2024
MoE publication type: A1 Journal article-refereed

Bibliographical note

Publisher Copyright:
© 2023 The Author(s)

Fields of Science

  • Deep learning
  • Inception model
  • Machine learning
  • Rare diseases
  • Residual neural network (ResNet)
  • XGBoost
  • 3111 Biomedicine
