A pipeline to further enhance quality, integrity and reusability of the NCCID clinical data

Anna Breger, Ian Selby, Michael Roberts, Judith Babar, Effrossyni Gkrania-Klotsas, Jacobus Preller, Lorena Escudero Sánchez, Sören Dittmer, Matthew Thorpe, Julian Gilbey, Emily Jefferson, Georg Langs, Guang Yang, Xiaodan Xing, Yang Nan, Ming Li, Helmut Prosch, Jan Stanczuk, Jing Tang, Philip Teare, Mishal Patel, Marcel Wassink, Markus Holzer, Eduardo González Solares, Nicholas Walton, Pietro Liò, Tolou Shadbahr, James H. F. Rudd, John A. D. Aston, Jonathan R. Weir-McCall, Evis Sala, Carola-Bibiane Schönlieb, Anna Korhonen, AIX-COVNET Collaboration

Research output: Article publication › Article › Scientific › peer-reviewed

Abstract

The National COVID-19 Chest Imaging Database (NCCID) is a centralized UK database of thoracic imaging and corresponding clinical data. It is made available by the National Health Service Artificial Intelligence (NHS AI) Lab to support the development of machine learning tools focused on Coronavirus Disease 2019 (COVID-19). A bespoke cleaning pipeline for NCCID, developed by NHSX, was introduced in 2021. We present an extension to the original cleaning pipeline for the clinical data of the database. It has been adjusted to correct additional systematic inconsistencies in the raw data, such as those in patient sex, oxygen levels and date values. The most important changes are discussed in this paper, while the code and further explanations are publicly available on GitLab. The suggested cleaning will allow users worldwide to work with more consistent data for the development of machine learning tools without needing to be experts in the underlying clinical data. In addition, the paper highlights some of the challenges of working with clinical multi-center data and includes recommendations for similar future initiatives.
Original language: English
Article number: 493
Journal: Scientific Data
Volume: 10
Issue number: 1
Number of pages: 16
ISSN: 2052-4463
Status: Published - 27 Jul 2023
OKM publication type: A1 Original article in a scientific journal, peer-reviewed

Fields of science

  • 111 Mathematics
