Doctoral Dissertations
Date of Award
5-2021
Degree Type
Dissertation
Degree Name
Doctor of Philosophy
Major
Life Sciences
Major Professor
Scott J. Emrich
Committee Members
Barry B. Bruce, Juan Luis Jurat-Fuentes, Dave W. Ussery
Abstract
The advent of inexpensive, high-throughput whole genome sequencing (WGS) technologies has led to the generation of thousands of related genomes, even from a single study. Large-scale genome analysis has enabled hypothesis-generating approaches in clinical, human and agricultural genomics. Additionally, population-level genomic sampling has reduced false positives in genotype-phenotype associations and improved understanding of the basis of disease and of antibiotic and pesticide resistance. WGS has also made possible a deeper understanding of migration, genetic divergence and evolution. This research applies comparative genomics, population genomics and data science approaches to whole genome sequence data at the individual gene and genome scale to identify phylogenetic lineages, species-representative sets of genomes and potential phenotype associations.
The first chapter focuses on generating a representative set of bacterial genomes per species and determining phylogenetic lineages within each species from data-driven metrics rather than data-informed, manually curated lineages. This goal was accomplished by using Mash to compare genomes based on whole genome sequence distances, unsupervised clustering metrics to determine lineages and unsupervised learning methods for classification. I first applied these methods to over 15,000 Escherichia coli genomes collected from NCBI RefSeq. Second, I applied these methods to genomes of the cyanobacterial species Microcystis aeruginosa, also collected from NCBI RefSeq. In chapter two, I report the results of surveying the whole genome diversity of the fall armyworm, Spodoptera frugiperda (J.E. Smith), to further the understanding of fall armyworm populations within the United States and the Western Hemisphere. This research was accomplished by aligning sequences, calling variants and analyzing genomic diversity through hierarchical clustering of genomic Mash distances, PCA and comparison of nucleotide diversity and population differentiation metrics. In the final chapter of this dissertation, I present a straightforward sequence alignment, variant calling and functional annotation pipeline to identify ABCC2 alleles that potentially confer resistance to Cry1F. This approach provides a benchmark for future targeted sequencing methodologies for resistance genes and a use case for gene surveillance to identify pesticide-resistant alleles.
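As a rough illustration of the whole genome distance idea mentioned above, the sketch below computes a Mash-style distance between short sequences. It is not the dissertation's pipeline: the Mash tool estimates the Jaccard index from MinHash sketches of large genomes, whereas this toy version computes the exact Jaccard index over all k-mers. The function names, the tiny k-mer size and the example sequences are all hypothetical, and clamping the result to the range 0 to 1 is a choice made here for readability.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard index of two k-mer sets (Mash estimates this by MinHash)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)) for Jaccard index j."""
    if j <= 0:
        return 1.0  # no shared k-mers: maximal distance
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))  # clamp to [0, 1] (assumption of this sketch)


def pairwise_mash(seqs, k=4):
    """Distances for every unordered pair of named sequences."""
    names = sorted(seqs)
    sets = {name: kmers(seqs[name], k) for name in names}
    return {
        (a, b): mash_distance(jaccard(sets[a], sets[b]), k)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than real genomes.
    toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAC", "c": "TTTTTTTTTT"}
    for pair, dist in pairwise_mash(toy).items():
        print(pair, round(dist, 4))
```

A pairwise distance table of this kind is the sort of input that hierarchical clustering would then group into candidate lineages.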
Recommended Citation
Schlum, Katrina A., "Applications of comparative genomics and data science to agricultural and clinical research." PhD diss., University of Tennessee, 2021.
https://trace.tennessee.edu/utk_graddiss/6714
Included in
Bacteriology Commons, Bioinformatics Commons, Computational Biology Commons, Genomics Commons, Population Biology Commons