METHODS: A gender-matched case-control study was conducted in the largest public sector cardiac hospital of Pakistan, and data from 460 subjects were collected. The dataset comprised eight nonclinical features. Four supervised ML algorithms were trained and tested to predict CVD status, with traditional logistic regression (LR) as the baseline model. The models were validated using a 70:30 train-test split and tenfold cross-validation.
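The two validation schemes can be illustrated with a minimal sketch. It assumes plain random partitioning of the 460 subjects; whether the study stratified by outcome or kept matched pairs together is not stated, so neither is modelled here. The function names and the seed are hypothetical and chosen only for illustration.

```python
import random


def train_test_split_indices(n, test_fraction=0.30, seed=0):
    """Shuffle the indices 0..n-1 and return (train, test) index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# Assumption: one row per subject, 460 subjects as reported above.
n_subjects = 460
train_idx, test_idx = train_test_split_indices(n_subjects)
cv_folds = list(k_fold_indices(n_subjects, k=10))

print(len(train_idx), len(test_idx))   # sizes of the 70:30 partition
print(len(cv_folds), len(cv_folds[0][1]))  # number of folds, size of one validation fold
```

With 460 subjects, the 70:30 split leaves 322 subjects for training and 138 for testing, and each of the ten folds holds out 46 subjects for validation.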
RESULTS: Random forest (RF), a nonlinear ML algorithm, outperformed the other ML algorithms and LR. The area under the curve (AUC) of RF was 0.851 with the train-test split and 0.853 with tenfold cross-validation. Across the LR and ML models, the nonclinical features yielded acceptable accuracy (minimum 71%), demonstrating their predictive capability in risk estimation.
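For readers unfamiliar with the metric, the following sketch shows one way an AUC can be computed, assuming it refers to the area under the receiver operating characteristic curve. It uses the pairwise-ranking interpretation: the fraction of (case, control) pairs in which the case receives the higher predicted score, with ties counted as half. The labels and scores are made-up toy values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outranks a random control."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0      # case correctly ranked above control
            elif c == d:
                wins += 0.5      # a tie counts as half a win
    return wins / (len(cases) * len(controls))


# Toy example (illustrative values only): 1 = case, 0 = control.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In this toy example, 8 of the 9 case-control pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.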
CONCLUSION: The satisfactory performance of the nonclinical features suggests that these features, combined with flexible computational methodologies, can reinforce existing risk prediction models for better healthcare services.