Background
Pregnancy complications, including gestational diabetes mellitus (GDM), pre-eclampsia, and preterm birth, are critical predictors of future non-communicable diseases (NCDs). These complications reflect metabolic, vascular, and inflammatory dysfunctions, serving as early indicators of long-term risks such as cardiovascular disease (CVD), chronic kidney disease (CKD), and type 2 diabetes mellitus (T2DM). Pregnancy is thus considered a physiological “stress test,” revealing latent health vulnerabilities years before clinical manifestation (McNestry et al., 2023).
- Gestational Diabetes Mellitus (GDM): GDM is associated with a roughly ninefold increase in the risk of T2DM and a doubling of the risk of cardiovascular events. It is also associated with CKD, particularly in Black women (Vounzoulaki et al., 2020; Barrett et al., 2020).
- Pre-eclampsia: Women with pre-eclampsia experience a twofold higher risk of ischemic heart disease, a 3.6-fold increased risk of heart failure, and a 1.7-fold increased risk of stroke within the first decade postpartum (Wu et al., 2017; Ferreira et al., 2020).
- Preterm Birth: Preterm birth is associated with a 63% increase in cardiovascular morbidity and significantly heightened risks for CKD and metabolic syndrome (Grandi et al., 2019).
Despite the robust evidence linking pregnancy complications with long-term NCD risks, existing clinical frameworks lack predictive tools that integrate biomarker data with machine learning (ML) methodologies. The UK Biobank, which enrolled participants from 2006–2010 and linked them to longitudinal Hospital Episode Statistics (HES) data, offers an unparalleled resource for developing these advanced tools (Bycroft et al., 2018).
Aim
To develop a machine learning framework using UK Biobank data to predict the risk of NCDs in women who have experienced pregnancy complications, integrating biomarkers, clinical data, and demographic factors.
Objectives
- Data Integration: Extract and harmonize data on pregnancy complications, biomarkers, and outcomes from the UK Biobank and linked HES records, focusing on GDM, pre-eclampsia, and preterm birth.
- Exploratory Analysis: Investigate patterns of NCDs among women with adverse pregnancy outcomes and explore associations with biomarkers and clinical trajectories.
- Machine Learning Development:
  - Develop and validate ML models using biomarkers and clinical features.
  - Incorporate explainable AI techniques to ensure usability and transparency.
- Stakeholder Engagement: Conduct participatory workshops with clinicians and patients to evaluate the acceptability and usability of the predictive tools.
Exposures and Outcomes
Exposures
- Pregnancy Complications:
  - Gestational Diabetes Mellitus (GDM).
  - Pre-eclampsia (including HELLP syndrome).
  - Preterm Birth (spontaneous or medically induced).
  - Other complications: Hypertensive disorders of pregnancy (HDP), stillbirth, recurrent miscarriage, and small-for-gestational-age (SGA) infants.
- Biomarkers:
  - Inflammatory Markers: CRP, interleukins, TNF-α.
  - Metabolic Markers: Fasting glucose, HbA1c, lipid profiles, adipokines.
  - Placental Dysfunction Markers: sFLT-1/PlGF ratio, oxidative stress markers.
  - Renal Function Markers: Serum creatinine, albumin-to-creatinine ratio.
- Clinical and Demographic Variables: Age, BMI, ethnicity, socioeconomic status, gravidity, parity, smoking, physical activity, and family history of NCDs.
Outcomes
- Primary Outcomes:
  - Cardiovascular Diseases (CVD): Coronary artery disease, heart failure, hypertension, stroke, and peripheral arterial disease.
  - Type 2 Diabetes Mellitus (T2DM).
  - Chronic Kidney Disease (CKD).
- Secondary Outcomes:
  - Mental health disorders: Depression, anxiety, and PTSD.
  - Other outcomes: Obesity, metabolic syndrome, and cause-specific mortality.
Methodology
Data Source:
The UK Biobank, which enrolled participants from 2006–2010, provides biomarker, clinical, and demographic data for over 500,000 participants, with longitudinal linkage to HES for tracking hospitalizations and diagnoses (Bycroft et al., 2018).
Study Design:
This retrospective cohort study will identify exposures (pregnancy complications) and follow participants through linked records to assess incident NCDs over up to 15 years of follow-up.
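To make the follow-up window concrete, the sketch below shows one way time at risk could be derived for each participant. It is illustrative only: the function name `follow_up`, the choice of start date, and the rule for handling diagnoses dated before the start are assumptions, not decisions taken from the study protocol. Only the 15-year cap comes from the design above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_FOLLOW_UP_YEARS = 15.0  # cap taken from the study design text
DAYS_PER_YEAR = 365.25


@dataclass
class FollowUp:
    years: float  # time at risk, in years
    event: bool   # True if an incident NCD was observed, False if censored


def follow_up(start: date, end_of_records: date,
              first_diagnosis: Optional[date] = None) -> FollowUp:
    """Time at risk from `start` to first diagnosis or to censoring.

    Assumption: a diagnosis dated on or before `start` is not an incident
    event; such participants would normally be excluded upstream.
    """
    stop = end_of_records
    event = False
    if first_diagnosis is not None and start < first_diagnosis <= end_of_records:
        stop = first_diagnosis
        event = True
    years = (stop - start).days / DAYS_PER_YEAR
    if years > MAX_FOLLOW_UP_YEARS:
        # Administrative censoring at the 15-year horizon.
        years, event = MAX_FOLLOW_UP_YEARS, False
    return FollowUp(years=years, event=event)


# Hypothetical participant: followed from 2008, diagnosed in 2012.
example = follow_up(date(2008, 1, 1), date(2030, 1, 1), date(2012, 1, 1))
```

Returning the pair (time, event indicator) keeps censored participants in the analysis instead of dropping them, which is what time-to-event models expect.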
Population:
- Inclusion Criteria: Parous women with documented pregnancy complications in the UK Biobank.
- Exclusion Criteria: Nulliparous women, women with incomplete data, and women with pre-existing chronic conditions such as diabetes or CVD.
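The criteria above could be applied as a set of boolean filters. The following is a minimal sketch using pandas on a toy table. Every column name (`live_births`, `gdm`, `prior_diabetes`, and so on) is hypothetical and does not correspond to actual UK Biobank field identifiers, and which fields count as "required" for the completeness check is also an assumption.

```python
import pandas as pd

# Hypothetical column names; real UK Biobank field IDs differ.
COMPLICATIONS = ["gdm", "pre_eclampsia", "preterm_birth"]
PRE_EXISTING = ["prior_diabetes", "prior_cvd"]
REQUIRED = ["age", "bmi"]  # assumed minimum set of non-missing fields


def select_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Keep parous women with a documented complication, complete
    required fields, and no pre-existing chronic condition."""
    parous = df["live_births"] > 0
    exposed = df[COMPLICATIONS].any(axis=1)
    complete = df[REQUIRED].notna().all(axis=1)
    no_prior = ~df[PRE_EXISTING].any(axis=1)
    return df[parous & exposed & complete & no_prior]


# Toy data, invented for illustration only.
participants = pd.DataFrame({
    "eid": [1, 2, 3, 4, 5],
    "live_births": [2, 0, 1, 3, 1],
    "gdm": [True, False, False, False, True],
    "pre_eclampsia": [False, False, True, False, False],
    "preterm_birth": [False, False, False, False, False],
    "prior_diabetes": [False, False, False, False, True],
    "prior_cvd": [False, False, False, False, False],
    "age": [54.0, 61.0, None, 47.0, 58.0],
    "bmi": [27.1, 24.3, 30.2, 22.8, 31.5],
})
cohort = select_cohort(participants)
```

Keeping each criterion as a named mask makes it straightforward to report how many participants each exclusion removes, as a cohort flow diagram would require.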
Machine Learning Workflow:
- Data Preparation: Handle missing data using imputation techniques and normalize variables.
- Model Development: Train supervised ML models, including gradient boosting (e.g., XGBoost) and neural networks.
- Explainability: Use SHAP (SHapley Additive exPlanations) for feature importance and transparency.
- Validation: Perform 10-fold cross-validation and evaluate models on external datasets.
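The preparation, training, and cross-validation steps above might be wired together roughly as follows. This is a sketch under stated assumptions, not the project's implementation: it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, median imputation as one possible imputation technique, AUC as an assumed evaluation metric, and purely synthetic data in place of UK Biobank variables. SHAP analysis and external validation are not shown.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 400 rows, 5 numeric features, binary outcome.
n, p = 400, 5
X = rng.normal(size=(n, p))
logits = 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Knock out about 10% of values to mimic missing measurements.
X[rng.random((n, p)) < 0.1] = np.nan

# Imputation and scaling sit inside the pipeline, so they are re-fitted
# on the training part of every fold and never see the held-out part.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    GradientBoostingClassifier(random_state=0),
)

# 10-fold stratified cross-validation, scored by area under the ROC curve.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"AUC: {auc.mean():.3f} +/- {auc.std():.3f}")
```

Placing the preprocessing steps inside the pipeline is the main design choice here: imputing or scaling on the full dataset before splitting would leak information from the validation folds into training.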
Analysis Framework:
- Use Cox proportional hazards models to estimate NCD risks.
- Conduct subgroup analyses by ethnicity, socioeconomic status, and age to assess disparities.
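For reference, the standard form of the Cox proportional hazards model is given below. The symbols are generic notation rather than names used in this proposal: $h_0(t)$ is the unspecified baseline hazard, $x$ the covariate vector (for example, an exposure indicator plus adjustment variables), and $\beta$ the coefficients to be estimated.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\mathrm{HR}_j = \frac{h(t \mid x_j = 1,\ x_{-j})}{h(t \mid x_j = 0,\ x_{-j})} = \exp(\beta_j)
```

Under this model, the hazard ratio for a binary exposure such as a history of GDM is $\exp(\beta_j)$ and is assumed constant over follow-up, so a hazard ratio of 2 would mean twice the instantaneous event rate at every time point. Subgroup analyses fit the same model within strata or add interaction terms.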
Stakeholder Engagement:
Workshops with clinicians and patients will ensure the acceptability, transparency, and clinical relevance of the tools.
Timeline
- Year 1: Data extraction, cleaning, and exploratory analysis.
- Year 2: Model development, validation, and subgroup analyses.
- Year 3: Stakeholder workshops and integration of predictive tools into clinical workflows.
Expected Outcomes
- A validated ML framework for predicting NCD risks in women with pregnancy complications.
- Identification of key biomarkers and predictors for targeted interventions.
- Development of user-centered tools for implementation in clinical practice.
Significance
This project addresses the challenge of linking complex data from disparate healthcare systems to develop actionable insights. By building predictive tools that integrate biomarker, clinical, and demographic data, this study aims to improve risk stratification and enable earlier intervention. Participatory design is intended to ensure the tools meet clinician and patient needs, bridging the gap between predictive analytics and real-world healthcare applications. Ultimately, this work contributes to improving maternal health outcomes and aligns with broader efforts to advance personalized, equitable care.
References
- McNestry, C. et al. (2023). Pregnancy complications and later life women’s health. Acta Obstet Gynecol Scand, 102, 523–531.
- Vounzoulaki, E. et al. (2020). Progression to type 2 diabetes in women with a known history of gestational diabetes: systematic review and meta-analysis. BMJ, 369, m1361.
- Barrett, P. M. et al. (2020). Adverse pregnancy outcomes and long-term maternal kidney disease: a systematic review and meta-analysis. JAMA Network Open, 3(2), e1920964.
- Wu, P. et al. (2017). Preeclampsia and future cardiovascular health. Circ Cardiovasc Qual Outcomes, 10, e003497.
- Ferreira, R. C. et al. (2020). Pre-eclampsia and later kidney disease. Pregnancy Hypertension, 22, 71–85.
- Grandi, S. M. et al. (2019). Cardiovascular morbidity in pregnancy complications. Circulation, 139, 1069–1079.
- Bycroft, C. et al. (2018). The UK Biobank resource. Nature, 562, 203–209.