Canadian Journal of Cardiology
Clinical Research | Volume 37, Issue 11, P1708-1714, November 2021

Predicting Incident Heart Failure in Women With Machine Learning: The Women’s Health Initiative Cohort

Published: August 12, 2021



      Heart failure (HF) is a leading cause of cardiac morbidity among women, whose risk factors differ from those in men. We used machine-learning approaches to develop risk-prediction models for incident HF in a cohort of postmenopausal women from the Women’s Health Initiative (WHI).


      We used 2 machine-learning methods, Least Absolute Shrinkage and Selection Operator (LASSO) and Classification and Regression Trees (CART), to perform variable selection on 1227 baseline WHI variables for the primary outcome of incident HF. The variables selected by each method were then used to construct separate Cox proportional hazards models. Using receiver-operating characteristic (ROC) curve analysis, we compared these models against a comparator model built using variables from the Atherosclerosis Risk in Communities (ARIC) HF prediction model. We analyzed 43,709 women who had 2222 incident HF events; median follow-up was 14.3 years.
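To illustrate how LASSO performs variable selection, the following is a minimal sketch of cyclic coordinate descent with soft-thresholding on a tiny synthetic dataset. It assumes a squared-error loss with centred, unstandardised columns, which is a simplification chosen for illustration; the abstract does not state which loss, penalty value, or software the authors used. The function names, the toy data, and the penalty `lam` are all hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centred (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the partial residual (column j added back)
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data (hypothetical): three centred, mutually orthogonal candidate variables.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# The outcome depends strongly on column 0 and only weakly on column 1.
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected columns:", selected)
```

In this toy example the penalty zeroes out the weak and the irrelevant columns, so only column 0 is retained. In a workflow like the one described above, the retained variables would then be passed to a separate survival model.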


      LASSO selected 10 predictors, and CART selected 11 predictors. The highest correlation between selected variables was 0.46. In addition to well-established predictors such as age, myocardial infarction, and smoking, the methods selected novel predictors, including physical function, number of pregnancies, number of previous live births, and age at menopause. In ROC analysis, the CART-derived model had the highest C-statistic, 0.83 (95% confidence interval [CI], 0.81-0.85), followed by the LASSO-derived model at 0.82 (95% CI, 0.81-0.84) and the ARIC model at 0.73 (95% CI, 0.70-0.76).
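The C-statistic reported above can be read as the probability that a randomly chosen woman who developed HF was assigned a higher predicted risk than a randomly chosen woman who did not. The sketch below computes it by direct pairwise comparison for a binary outcome. It ignores censoring and follow-up time, so it is a simplification and not necessarily the estimator the authors used; the function name and the toy scores are hypothetical.

```python
def c_statistic(scores, events):
    """Share of (case, non-case) pairs in which the case has the higher score.

    Tied scores count as half a concordant pair. `events` holds 1 for a case
    and 0 for a non-case.
    """
    cases = [s for s, e in zip(scores, events) if e == 1]
    controls = [s for s, e in zip(scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy predicted risks (hypothetical) and observed outcomes.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
events = [1, 1, 0, 0, 1, 0]

c = c_statistic(scores, events)
print(f"C-statistic: {c:.3f}")
```

Here 7 of the 9 case/non-case pairs are concordant, giving a C-statistic of about 0.78. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking.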


      Machine-learning approaches can be used to develop HF risk-prediction models that may discriminate better than an established HF risk model and may provide a basis for investigating novel HF predictors.




