Project: Equity in post-HCT Survival Predictions

Project information

  • Category: Machine Learning
  • Project date: Feb 2025
  • Project description: In this notebook, I develop a model to improve the prediction of transplant survival rates for patients undergoing allogeneic Hematopoietic Cell Transplantation (HCT). The goal is to address disparities by bridging diverse data sources, refining algorithms, and reducing biases so that predictions are equitable for patients across racial groups. The data comes from the Kaggle competition: https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/data. The dataset consists of 59 variables related to HCT, covering demographic and medical characteristics of both recipients and donors, such as age, sex, ethnicity, disease status, and treatment details. The primary outcome of interest is event-free survival, represented by the variable efs, while the time to event-free survival is captured by the variable efs_time. Together, these two variables encode the target for a censored time-to-event analysis. The data features equal representation across recipient racial categories (White, Asian, African-American, Native American, Pacific Islander, and More than One Race) and was synthetically generated with the data generator from synthcity, trained on a large cohort of real CIBMTR data.
  • Python libraries: pandas, NumPy, Matplotlib, seaborn, scikit-learn, scikit-survival
  • Project URL: https://github.com/MiltonLSM/post-HCT-survival-predictions
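To make the censored target concrete, here is a minimal sketch of how an (efs, efs_time) pair can feed a time-to-event estimate. It is an illustration, not the notebook's actual code. It assumes efs = 1 marks an observed event and efs = 0 a censored patient, and the six toy rows and the helper name kaplan_meier are made up for this example. The function computes a plain-Python Kaplan-Meier survival curve, where censored patients stay in the risk set until their last follow-up time but never count as events.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] for each distinct time with at least one observed event.

    times  -- follow-up times (the role played by efs_time)
    events -- 1 if the event was observed, 0 if censored (assumed coding of efs)
    """
    events_at = Counter()   # observed events per distinct time
    leaving_at = Counter()  # patients leaving the risk set (event or censored)
    for t, e in zip(times, events):
        leaving_at[t] += 1
        if e:
            events_at[t] += 1

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at[t]
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= leaving_at[t]
    return curve


# Hypothetical toy data, not taken from the competition files.
efs_time = [5.0, 8.0, 8.0, 12.0, 20.0, 20.0]
efs = [1, 0, 1, 1, 0, 0]

curve = kaplan_meier(efs_time, efs)
for t, s in curve:
    print(f"t={t:5.1f}  S(t)={s:.3f}")
```

In practice, scikit-survival offers the same estimator as sksurv.nonparametric.kaplan_meier_estimator, and its models expect the two columns combined into one structured array, which sksurv.util.Surv.from_arrays can build.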