Introduction: Exhaled-breath analysis of volatile organic compounds (VOCs) has shown the potential to detect lung cancer. Reproducibility of prediction models, especially those based on artificial intelligence (AI) algorithms, is essential. However, external validation is often lacking because it is time-consuming, and in the meantime improved AI models often outperform the 'older' model that was based on a training set. We aim to simultaneously validate and improve a prediction model, developed on a training set and based on AI algorithms, that distinguishes non-small cell lung cancer (NSCLC) patients from healthy controls.
Methods: We obtained exhaled-breath data of >800 subjects. This new cohort will be used to externally validate our original prediction model, which distinguishes NSCLC patients from healthy controls (N = 290, AUC-ROC 0.76). In a step-wise design, a set of 50 subjects will first be predicted by the original model, whereupon these data are added to the unblinded data and a new prediction model is created based on the increased sample size. This will be repeated 6 times. The remaining 500 subjects will be used to validate the final extended model. Performance will be assessed by the area under the ROC curve (AUC-ROC).
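The step-wise design in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the classifier (`fit_centroid_model`, a nearest-centroid scorer), the data generator (`make_synthetic`) and all function names are hypothetical stand-ins, the data are synthetic, and the sketch assumes that each batch of 50 is scored by the most recently refitted model before being unblinded.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]          # (features, label); 1 = NSCLC, 0 = control
Scorer = Callable[[List[float]], float]   # higher score = more NSCLC-like


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_model(train: Sequence[Sample]) -> Scorer:
    """Hypothetical stand-in classifier: nearest-centroid score (not the study's AI model)."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in train if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def score(x: List[float]) -> float:
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return d0 - d1  # far from controls and close to cases gives a high score

    return score


def stepwise_validate(train, cohort, fit, batch_size=50, n_rounds=6):
    """Predict each blinded batch, then unblind it and refit; validate on the remainder."""
    data = list(train)
    model = fit(data)
    batch_aucs = []
    for r in range(n_rounds):
        batch = cohort[r * batch_size:(r + 1) * batch_size]
        scores = [model(x) for x, _ in batch]            # prediction made while blinded
        batch_aucs.append(auc_roc([y for _, y in batch], scores))
        data.extend(batch)                                # unblind and add to training data
        model = fit(data)                                 # refit on the larger sample
    holdout = cohort[n_rounds * batch_size:]              # untouched subjects for final validation
    final_auc = auc_roc([y for _, y in holdout], [model(x) for x, _ in holdout])
    return batch_aucs, final_auc


def make_synthetic(n: int, seed: int, dim: int = 5) -> List[Sample]:
    """Synthetic placeholder data (alternating labels); NOT real breath measurements."""
    rng = random.Random(seed)
    return [([rng.gauss(0.5 * (i % 2), 1.0) for _ in range(dim)], i % 2) for i in range(n)]


if __name__ == "__main__":
    original_training_set = make_synthetic(290, seed=0)   # size mirrors the original N = 290
    new_cohort = make_synthetic(800, seed=1)              # 6 x 50 for updating, 500 held out
    per_batch, final = stepwise_validate(original_training_set, new_cohort, fit_centroid_model)
    print("AUC per blinded batch:", [round(a, 2) for a in per_batch])
    print("AUC on held-out subjects:", round(final, 2))
```

Keeping the last 500 subjects out of every refit is what allows the final AUC to serve as an external validation of the extended model, while the per-batch AUCs show how the successive models perform on data they have not yet seen.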
Results: Although inclusion has been completed, the COVID pandemic has so far prevented us from validating the data of all included subjects in all 7 centres. This will be done before September 2020.
Conclusion: We propose a design to simultaneously externally validate an original prediction model based on exhaled-breath data to distinguish NSCLC patients from healthy controls and to develop new prediction models based on improved AI techniques.