Doctoral Dissertations

Date of Award


Degree Type


Degree Name

Doctor of Philosophy


Management Science

Major Professor

Michel Ballings

Committee Members

Anahita Khojandi, Ephy R. Love, Sean Willems


In this doctoral dissertation, I develop two applications of generative machine learning models for healthcare settings, one in each of two chapters.

In the first chapter, I analyze the impact of imputation methods on the ability of deep generative models (DGMs) to estimate patient health dynamics in the Intensive Care Unit (ICU). Although the ICU is a data-rich environment, ICU data often contain a considerable amount of missing information. I demonstrate how imputed values can create a discrepancy between the DGM's loss (the difference between the observed and imputed values) and how accurately the DGM actually reconstructs patient health trajectories. I propose a novel imputation technique, Generative Iterative Multiple Imputation (GIMI), an on-training imputation strategy with a masked loss function and encoded missingness patterns. I evaluate GIMI's ability to reconstruct patient health trajectories on a dataset of ICU patients receiving heparin treatment. I find that it outperforms benchmarks such as k-nearest neighbors imputation, MICE, and GAIN, and that it considerably reduces the reconstruction error of the patient health trajectory.

In the second chapter, I explore the impact of historical selection bias on a corpus of 35,000 personal statements collected between 2019 and 2023 from applicants to Graduate Medical Education programs. I examine one outcome of this selection bias: the ethnic distribution of the corpus is substantially distorted relative to that of the US population. Word frequency and topic analyses uncover both differences and similarities among applicants from various ethnic backgrounds, which appear as ethnicity-specific topics and as topics shared by all applicants. Furthermore, I find that ethnic diversity is suppressed when all ethnicities share the same topic distribution, and that this particularly affects Black and Latino applicants. In light of these findings, I propose a topic model that combines shared and ethnicity-specific topics. I demonstrate that the topics generated by this model preserve ethnic diversity and are more coherent than those derived from models that use exclusively shared or exclusively ethnicity-specific topic distributions.

Available for download on Friday, May 15, 2026
