Canadian Journal of Cardiology

Deep Phenotyping and Prediction of Long-term Cardiovascular Disease: Optimized by Machine Learning

  • Xiao-dong Zhuang ‡
    Affiliations
    Cardiology Department, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

    NHC Key Laboratory of Assisted Circulation (Sun Yat-sen University), Guangzhou, China
  • Ting Tian ‡
    Affiliations
    Department of Statistical Science, School of Mathematics, Southern China Center for Statistical Science, Sun Yat-sen University, Guangzhou, China
  • Li-zhen Liao ‡
    Affiliations
    Department of Health, Guangdong Pharmaceutical University, Guangzhou Higher Education Mega Center, Guangzhou, China
  • Yue-hua Dong
    Affiliations
    Department of Statistical Science, School of Mathematics, Southern China Center for Statistical Science, Sun Yat-sen University, Guangzhou, China
  • Hao-jin Zhou
    Affiliations
    Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
  • Shao-zhao Zhang
    Affiliations
    Cardiology Department, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

    NHC Key Laboratory of Assisted Circulation (Sun Yat-sen University), Guangzhou, China
  • Wen-yi Chen
    Affiliations
    Department of Statistical Science, School of Mathematics, Southern China Center for Statistical Science, Sun Yat-sen University, Guangzhou, China
  • Zhi-min Du
    Affiliations
    Cardiology Department, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

    NHC Key Laboratory of Assisted Circulation (Sun Yat-sen University), Guangzhou, China
  • Xue-qin Wang
    Correspondence
    Dr Xue-qin Wang, Department of Statistical Science, School of Mathematics, Southern China Center for Statistical Science, Sun Yat-sen University, 135, Xingang West Road, Guangzhou 510275, China.
    Affiliations
    Department of Statistical Science, School of Mathematics, Southern China Center for Statistical Science, Sun Yat-sen University, Guangzhou, China

    Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China

    Xinhua College, Sun Yat-sen University, Guangzhou, China
  • Xin-xue Liao
    Correspondence
    Corresponding authors: Dr Xin-xue Liao, Cardiology Department, the First Affiliated Hospital of Sun Yat-sen University, 58 Zhongshan 2nd Road, Guangzhou 510080, China.
    Affiliations
    Cardiology Department, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

    NHC Key Laboratory of Assisted Circulation (Sun Yat-sen University), Guangzhou, China

    ‡ These authors contributed equally to the study.
Published: February 11, 2022
DOI: https://doi.org/10.1016/j.cjca.2022.02.008

      Abstract

      Background

      Prediction of cardiovascular disease (CVD) is important in clinical practice. Machine learning (ML) may offer an improved alternative to current CVD risk stratification in individual patients. We aimed to identify important predictors and to compare the prediction performance of ML models with that of traditional models in a large cohort with long-term follow-up.

      Methods

      The Atherosclerosis Risk in Communities (ARIC) study was designed to study the progression of subclinical disease to cardiovascular events over a 25-year follow-up period. All phenotypic variables at visit 1 were obtained. All-cause death, CVD, and coronary heart disease were the outcomes for analysis. The ML framework involved variable selection using the random survival forest (RSF) method, model building, and 5-fold cross-validation. Model performance was evaluated by discrimination using the Harrell concordance index (C-index), accuracy using the Brier score (BS), and interpretability using the number of variables in the model.
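
The discrimination metric and the cross-validation split named above can be illustrated with a minimal sketch. It is not the authors' code: the function names `harrell_c_index` and `k_fold_indices` are hypothetical, the fold assignment is a simple unshuffled round-robin, and pairs with tied observed times are skipped, which is one common convention among several.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell concordance index for right-censored data.

    times  : observed follow-up time per subject
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier subject had an event;
        # pairs with tied times are skipped (an assumption of this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def k_fold_indices(n, k=5):
    """Assign n subjects to k validation folds in round-robin order."""
    return [list(range(fold, n, k)) for fold in range(k)]


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.5, 0.1]
    print(harrell_c_index(times, events, risks))
    print(k_fold_indices(10, k=5))
```

In practice each fold would be held out in turn, the model fitted on the remaining four folds, and the C-index averaged over the five held-out folds. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking.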

      Results

      Of the 14,842 participants in ARIC, the average age was 54.2 years, with 45.2% male and 26.2% Black participants. Thirty-eight unique variables were selected in the RSF top 20 importance ranking of all 6 outcomes. Aging, hypertension, glucose metabolism, renal function, coagulation, adiposity, and sodium retention dominated the predictions of all outcomes. The ML models outperformed the regression models and established risk scores with a higher C-index, lower BS, and varied interpretability.

      Conclusions

      The ML framework is useful for identifying important predictors of CVD and for developing models with robust performance compared with existing risk models.
