Doctoral Dissertations

Orcid ID

0000-0001-5249-1946

Date of Award

8-2022

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Energy Science and Engineering

Major Professor

Sudarsanam Babu

Committee Members

Daniel Jacobson, Helen A. Baghdoyan, Ralph Lydic, Michael A. Langston, James Fordyce

Abstract

As biological data collection continues to improve, new techniques are needed to better understand the complex relationships in genomic and other biological data sets. Explainable Artificial Intelligence (X-AI) techniques like Iterative Random Forest (iRF) excel at finding interactions within data, such as genomic epistasis. Here, new methods to mine for these complex interactions are introduced and demonstrated in a variety of scenarios. Applying iRF as a method for Genome-Wide Epistasis Studies shows that it robustly finds interacting sets of features in synthetic data, without requiring the exponentially increasing computation time of many classic association study methods. Leveraging the non-parametric prediction capabilities of iRF, new genomic insights are used to improve Genomic Selection and Progeny Prediction. This capability enables breeders to make informed selections for crosses without first requiring a full progeny trial. A new algorithm, Tensor Iterative Random Forest (TiRF), expands upon the foundation of iRF to provide information on relationships not only between the features and targets of the model, but also between the targets themselves. This algorithm is validated by capturing information from gene regulatory networks from the DREAM competition. The impact of the SARS-CoV-2 virus has necessitated a method that can capture the changing genetic architecture of the virus and incorporate potential recombination events, paving the way for a better understanding of how the virus has changed and will change. A new method is introduced that identifies likely parents of haplotypes designated to be the result of recombination. Together, these new methods aim to provide stronger insight into the complexities of genetic architecture.
