Doctoral Dissertations

Orcid ID

https://orcid.org/0000-0003-4308-6302

Date of Award

5-2022

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Data Science and Engineering

Major Professor

Daniel Jacobson

Committee Members

David Kainer, Michela Taufer, Michael Langston

Abstract

As technology improves, biology has increasingly turned to high-performance computing to analyze big data and gain insight into biological systems. Turning these large datasets of varying types into interpretable results requires a method that is reproducible, efficient, and effective. Iterative Random Forest (iRF) is an explainable supervised learner that makes few assumptions about the relationships between variables and can capture the complex interactions common in biological systems. This forest-based learner is the basis of iRF-Leave One Out Prediction (iRF-LOOP), an algorithm that takes a data matrix and produces all-to-all predictive networks. This dissertation validates the improved performance of iRF over Random Forest, the industry standard, using synthetic and empirical data from various organisms. It also uses iRF to build a predictive model of COVID-19 outcomes from county-level environmental features in the U.S. Finally, it presents a whole systems biology study in which an improved iRF-LOOP pre-processing pipeline, Divide-Test-Integrate, is used to produce new gene-to-gene predictive expression networks for a multiplex network study of the model organism Saccharomyces cerevisiae, using seed genes of interest from Septoria musiva.
