Doctoral Dissertations

Degree Name

Doctor of Philosophy


Industrial Engineering

Major Professor

Anahita Khojandi

Committee Members

Rama K Vasudevan, John E Kobza, Jim Ostrowski


Reinforcement learning (RL) is a powerful tool for developing personalized treatment regimens from healthcare data. In RL, an agent samples experiences from an environment (such as a model of patient health) to learn a policy that maximizes long-term reward. This dissertation proposes methodological and practical developments in the application of RL to treatment planning problems.
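As a minimal illustration of the agent-environment loop described above, the sketch below uses tabular Q-learning, a standard RL algorithm chosen here only for brevity. It is not the dissertation's method. The two-state "patient" environment, its deterministic transitions, the treatment cost, and all function names are hypothetical assumptions made for illustration. The agent samples transitions, updates value estimates toward the discounted long-term reward, and reads off a greedy treatment policy.

```python
import numpy as np

# Hypothetical toy "patient" environment (illustrative assumption only):
# state 0 = symptomatic, state 1 = controlled;
# action 0 = no medication, action 1 = medicate.
N_STATES, N_ACTIONS = 2, 2
TREATMENT_COST = 0.5  # assumed per-step cost of medicating


def step(state: int, action: int) -> tuple[int, float]:
    """Deterministic transition: medicating always yields a controlled state,
    and without medication the patient stays in the current state."""
    next_state = 1 if action == 1 else state
    reward = (1.0 if next_state == 1 else 0.0) - TREATMENT_COST * action
    return next_state, reward


def q_learning(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning with uniformly random (off-policy) exploration."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        state = int(rng.integers(N_STATES))  # random initial patient state
        for _ in range(horizon):
            action = int(rng.integers(N_ACTIONS))  # explore uniformly
            next_state, reward = step(state, action)
            # Move Q(s, a) toward the one-step bootstrapped discounted return
            target = reward + gamma * q[next_state].max()
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
    return q


q = q_learning()
policy = q.argmax(axis=1)  # greedy action in each state
```

In this toy setting the learned policy medicates only when the patient is symptomatic, because medicating a controlled patient adds cost without improving the next state.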

First, we develop a novel time series model for simulating patient health states from observed clinical data. We use a generative neural network architecture that learns a direct mapping between distributions over clinical measurements at adjacent time points. We show that this model produces realistic patient trajectories and can be paired with on-policy RL to learn effective treatment policies.

Second, we develop a novel extension of hidden Markov models, which are commonly used to model and predict patient health states. Specifically, we develop a special case of recurrent neural network whose likelihood function is identical to that of a corresponding discrete-observation hidden Markov model. We demonstrate how combining our model with other predictive neural networks improves disease forecasting and offers novel clinical interpretations compared with a standard hidden Markov model.
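To make the link between hidden Markov models and recurrent networks concrete, the sketch below writes the standard scaled forward algorithm as a recurrent cell: the hidden state is the filtered belief over latent states, and each step predicts, weights by the emission probability, and renormalizes. This is a generic illustration of why such an equivalence is possible, not the dissertation's architecture. The parameter values and function names are hypothetical, and the brute-force check simply confirms that the recurrence reproduces the exact HMM log-likelihood.

```python
import itertools

import numpy as np


def hmm_rnn_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm expressed as a recurrent cell.

    pi: (K,) initial distribution; A[i, j] = P(z_t = j | z_{t-1} = i);
    B[k, m] = P(x_t = m | z_t = k); obs: sequence of ints in [0, M).
    The recurrent state h_t is the filtered belief P(z_t | x_1..x_t).
    """
    log_lik = 0.0
    h = None
    for t, x in enumerate(obs):
        prior = pi if t == 0 else h @ A  # predict: propagate belief one step
        unnorm = prior * B[:, x]         # update: weight by emission probability
        c = unnorm.sum()                 # c_t = P(x_t | x_1..x_{t-1})
        log_lik += np.log(c)
        h = unnorm / c                   # normalized belief becomes the next state
    return log_lik


def brute_force_log_likelihood(pi, A, B, obs):
    """Sum the joint probability over every hidden-state path (small K, T only)."""
    K, total = len(pi), 0.0
    for z in itertools.product(range(K), repeat=len(obs)):
        p = pi[z[0]] * B[z[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[z[t - 1], z[t]] * B[z[t], obs[t]]
        total += p
    return np.log(total)


# Hypothetical two-state, two-symbol example
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
obs = [0, 1, 1, 0]
ll_rnn = hmm_rnn_log_likelihood(pi, A, B, obs)
ll_brute = brute_force_log_likelihood(pi, A, B, obs)
```

Because the recurrence is differentiable in pi, A, and B, a cell of this kind can in principle be trained by gradient descent and stacked with other neural network components.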

Third, we develop a method for selecting high-performing reinforcement learning-based treatment policies for underrepresented patient subpopulations using limited observations. Our method learns a probability distribution over treatment policies from a reference patient group, then adapts its recommendations using limited data from an underrepresented patient group. We show that our method outperforms state-of-the-art benchmarks both in selecting effective treatment policies for patients with atypical clinical characteristics and in predicting these patients' outcomes under its policies.

Finally, we use RL to optimize medication regimens for Parkinson's disease patients using high-frequency wearable sensor data. We build an environment model of how patients' symptoms respond to medication, then use RL to recommend optimal medication types, timing, and dosages for each patient. We show that these patient-specific RL-prescribed medication regimens outperform physician-prescribed regimens and provide clinically defensible treatment strategies. Our framework also enables physicians to identify patients who could switch to lower-frequency regimens for improved adherence, and to identify patients who may be candidates for advanced therapies.
