Canadian Journal of Cardiology

Development of a Semiautomated Database for Patients With Adult Congenital Heart Disease

  • Shourya Verma ‡
  • Muhammet Alkan ‡
  • Fani Deligianni §
  • Christos Anagnostopoulos
  • Gerhard Diller
  • Lisa Walker
  • Fiona C. Johnston
  • Mark Danton
  • Hamish Walker
  • Lorna Swan
  • Amanda Hunter
  • Alex McGuire
  • Martin Dawes
  • Sharon Stott
  • Mitchell Lyndsey
  • Niki Walker
  • Gruschen Veldtman §

    Affiliations
    Scottish Adult Congenital Heart Disease Service, Royal Brompton Hospital, King’s College London, Golden Jubilee National Hospital and the School of Computing Science at University of Glasgow, Glasgow, Scotland, United Kingdom

    Correspondence
    Corresponding author: Dr Gruschen R. Veldtman, SACCS, Golden Jubilee National Hospital, Glasgow G81 4DY, Scotland, United Kingdom.

    Footnotes
    ‡ These authors contributed equally to this work.
    § These contributors are equal senior authors.

      Abstract

      Background

      Databases for congenital heart disease (CHD) are effective in delivering accessible datasets ready for statistical inference. Until now, however, data collection has been labour- and time-intensive and has required substantial financial support to remain sustainable. We report the creation and piloting of a semiautomated technique for extracting data from clinic letters to populate a clinical database.

      Methods

      Clinic letters in PDF format, stored in a local folder, were passed through a series of algorithms for data extraction, preprocessing, and analysis. Specific patient information (diagnoses, diagnostic complexity, interventions, arrhythmia, medications, and demographic data) was processed into text files and structured data tables, which were used to populate a database. A data validation schema was predefined to verify and accommodate the information entering the database. Unsupervised learning, in the form of a dimensionality reduction technique, was used to project the data into 2 dimensions and visualize their intrinsic structure in relation to the diagnosis, medication, intervention, and European Society of Cardiology disease complexity classification lists. Ninety-three randomly selected letters were reviewed manually for accuracy.
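
The extraction and validation steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: it assumes the PDF has already been converted to plain text, and the letter layout, the section headings, the field names, and the helpers `extract_record` and `validate` are all hypothetical.

```python
import re

# Hypothetical letter layout; the real clinic letter templates are not described.
SAMPLE_LETTER = """\
Date of birth: 12/03/1986
Sex: Female
Diagnoses:
- Tetralogy of Fallot
- Pulmonary regurgitation
Interventions:
- Pulmonary valve replacement
Medications:
- Bisoprolol
"""

# Assumed section headings that are followed by "- item" lines.
LIST_SECTIONS = ("Diagnoses", "Interventions", "Medications")

# Illustrative validation schema: field name -> (expected type, required?)
SCHEMA = {
    "sex": (str, True),
    "date_of_birth": (str, True),
    "diagnoses": (list, True),
    "interventions": (list, False),
    "medications": (list, False),
}


def extract_record(text):
    """Pull demographic fields and bulleted sections out of one letter's text."""
    record = {}
    dob = re.search(r"^Date of birth:\s*(\d{2}/\d{2}/\d{4})\s*$", text, re.MULTILINE)
    if dob:
        record["date_of_birth"] = dob.group(1)
    sex = re.search(r"^Sex:\s*(\w+)\s*$", text, re.MULTILINE)
    if sex:
        record["sex"] = sex.group(1).lower()
    for section in LIST_SECTIONS:
        # Capture the run of "- item" lines directly under the section heading.
        block = re.search(rf"^{section}:\n((?:- .*\n?)+)", text, re.MULTILINE)
        if block:
            items = [line[2:].strip() for line in block.group(1).splitlines()]
            record[section.lower()] = items
    return record


def validate(record):
    """Return a list of schema violations; an empty list means the record passes."""
    errors = []
    for field, (expected_type, required) in SCHEMA.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


record = extract_record(SAMPLE_LETTER)
problems = validate(record)
```

Only records for which `validate` returns an empty list would be passed on to the database in this sketch; anything else would be flagged for manual review, which is one way to read the "semiautomated" part of the workflow.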

      Results

      A total of 1409 consecutive outpatient clinic letters were used to populate the Scottish Adult Congenital Cardiac Database. Mean patient age was 35.4 years, and 47.6% of patients were female; 698 (49.5%) had moderately complex, 369 (26.1%) greatly complex, and 284 (20.1%) mildly complex lesions. Individual diagnoses were successfully extracted in 96.95% of letters, and demographic data in 100%. Data extraction, database upload, data analysis, and visualization took 571 seconds (about 9.5 minutes). Manual review of the diagnosis, intervention, and medication categories showed the computer algorithm to have an accuracy of 94%, 93%, and 93%, respectively.
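
The reported figures can be put in perspective with some simple arithmetic: 571 seconds across 1409 letters works out to roughly 0.4 seconds per letter. The sketch below reproduces that calculation and shows one way a random review sample and a per-category accuracy could be computed. The counts and times come from the abstract; the random seed and the `accuracy` helper are assumptions made for illustration, not the authors' procedure.

```python
import random

TOTAL_LETTERS = 1409      # reported in the Results
TOTAL_SECONDS = 571       # reported in the Results
REVIEW_SAMPLE_SIZE = 93   # reported in the Methods

# Throughput of the whole pipeline.
seconds_per_letter = TOTAL_SECONDS / TOTAL_LETTERS
minutes_total = TOTAL_SECONDS / 60

# Complexity counts reported in the Results. They do not sum to 1409;
# the abstract does not say how the remaining letters were classified.
complexity_counts = {"moderate": 698, "great": 369, "mild": 284}
unclassified = TOTAL_LETTERS - sum(complexity_counts.values())

# Draw a reproducible random sample of letter indices for manual review.
# The seed is an arbitrary choice for this sketch.
rng = random.Random(0)
review_ids = sorted(rng.sample(range(TOTAL_LETTERS), REVIEW_SAMPLE_SIZE))


def accuracy(automated, manual):
    """Share of reviewed letters where automated output equals the manual reading."""
    matches = sum(1 for a, m in zip(automated, manual) if a == m)
    return matches / len(manual)
```

Calling `accuracy` once per category (diagnoses, interventions, medications) on the 93 reviewed letters would give three figures comparable in kind to those reported above.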

      Conclusions

      Semiautomated data extraction from clinic letters into a database can be achieved successfully with a high degree of accuracy and efficiency.

      Résumé

      Background

      Databases on congenital heart disease (CHD) are an effective means of obtaining datasets for statistical inference. Until now, however, data collection has demanded a great deal of work and time, and considerable financial support has been needed to sustain it. We report the creation and pilot testing of a technique for semiautomated extraction of data from clinic letters to populate a clinical database.

      Methodology

      Clinic letters in PDF format, stored in a local folder, were processed by a series of algorithms performing data extraction, preprocessing, and analysis. Specific patient information (diagnoses, diagnostic complexity, interventions, arrhythmias, medications, and demographic data) was converted into text files and structured data tables in order to populate a database. A data validation schema was predefined to verify and handle the information contained in the database. Unsupervised learning, using a dimensionality reduction technique, was used to project the data into two dimensions and visualize their intrinsic structure with respect to diagnoses, medications, interventions, and disease complexity (according to the European Society of Cardiology classification). Ninety-three letters were selected at random to verify the accuracy of the technique manually.

      Results

      Consecutive clinic letters from 1409 outpatients were used to populate the Scottish Adult Congenital Cardiac Database. Mean patient age was 35.4 years; 47.6% were female; 698 (49.5%) had lesions of moderate complexity, 369 (26.1%) had lesions of great complexity, and 284 (20.1%) had lesions of mild complexity. Individual diagnoses were successfully extracted in 96.95% of letters, and demographic data were extracted in 100% of them. Data extraction, upload to the database, data analysis, and visualization were completed in 571 seconds (about 9.5 minutes). Manual data extraction for the diagnosis, intervention, and medication categories confirmed the accuracy of the computer algorithm at 94%, 93%, and 93%, respectively.

      Conclusions

      Semiautomated extraction of data from clinic letters to populate a database is feasible with high accuracy and efficiency.

