Date of Award
Doctor of Philosophy
Edmon Begoli, Amir Sadovnik, Anahita Khojandi
Real-world data often combines structured and unstructured components. Both forms carry information that machine learning models can exploit to improve prediction performance on a task. However, integrating features from the two forms is difficult, and all the more so for models that operate under time constraints. Time-constrained models are machine learning models whose input must preserve time causality, for example when predicting a future event from past data. Most previous work does not offer a dedicated pipeline that generalizes to different tasks and domains, especially under time constraints. In this work, we present a systematic, domain-agnostic pipeline for integrating features from structured and unstructured data while maintaining time causality when building models. We focus on the healthcare and consumer market domains, and we preprocess data, build models, and perform experiments to demonstrate the generalizability of the pipeline. More specifically, we focus on the task of identifying patients who are at risk of an imminent ICU admission. We use our pipeline to solve this task and show how augmenting unstructured data with structured data improves model performance. We found that combining structured and unstructured data yields a performance improvement of up to 8.5%.
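As an illustration of what maintaining time causality can mean when features from both data forms are combined, the following minimal sketch builds one feature vector from timestamped structured measurements and free-text notes, using only records charted before the prediction time. It is not the pipeline described in the dissertation. The function name `build_features`, the record layout, the bag-of-words representation, and the sample values are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime


def build_features(vitals, notes, prediction_time, vocabulary):
    """Combine structured and unstructured features using only past records.

    Hypothetical helper for illustration; not taken from the dissertation.
    vitals: list of (timestamp, measurement_name, value)
    notes:  list of (timestamp, free_text)
    """
    # Keep only records charted strictly before the prediction time,
    # so that nothing from the future leaks into the features.
    past_vitals = [r for r in vitals if r[0] < prediction_time]
    past_notes = [r for r in notes if r[0] < prediction_time]

    # Structured part: most recent value of each measurement.
    latest = {}
    for _, name, value in sorted(past_vitals, key=lambda r: r[0]):
        latest[name] = value

    # Unstructured part: word counts over a fixed vocabulary.
    counts = Counter()
    for _, text in past_notes:
        counts.update(w for w in text.lower().split() if w in vocabulary)

    features = {f"vital_{name}": value for name, value in latest.items()}
    features.update({f"word_{w}": counts[w] for w in sorted(vocabulary)})
    return features


# Made-up sample records, for illustration only.
vitals = [
    (datetime(2020, 1, 1, 8), "heart_rate", 88.0),
    (datetime(2020, 1, 1, 12), "heart_rate", 112.0),
    (datetime(2020, 1, 1, 20), "heart_rate", 140.0),  # after prediction time
]
notes = [
    (datetime(2020, 1, 1, 9), "Patient reports shortness of breath"),
    (datetime(2020, 1, 1, 21), "Transferred to ICU"),  # after prediction time
]
features = build_features(
    vitals, notes, datetime(2020, 1, 1, 13), {"breath", "icu"}
)
print(features)
```

In this sketch the 20:00 measurement and the 21:00 note are ignored because they fall after the 13:00 prediction time, so the resulting features reflect only what would have been known at that moment.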
Srinivasan, Sudarshan, "Toward More Predictive Models by Leveraging Multimodal Data." PhD diss., University of Tennessee, 2020.