Doctoral Dissertations

Date of Award

5-2021

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Life Sciences

Major Professor

Scott J. Emrich

Committee Members

Barry B. Bruce, Juan Luis Jurat-Fuentes, Dave W. Ussery

Abstract

The advent of inexpensive, high-throughput whole genome sequencing (WGS) technologies has led to the generation of thousands of related genomes, even from a single study. Large-scale genome analysis has enabled hypothesis-generating approaches in clinical, human and agricultural genomics. Additionally, population-level genomic sampling has reduced false positives in genotype-phenotype associations and improved understanding of the basis of disease and of antibiotic and pesticide resistance. WGS has also made possible a deeper understanding of migration, genetic divergence and evolution. This research applies comparative genomics, population genomics and data science approaches to whole genome sequence data at the individual gene and genome scales to identify phylogenetic lineages, species-representative sets of genomes and potential phenotype associations.

The first chapter focuses on generating a representative set of bacterial genomes per species and determining phylogenetic lineages within each species based on data-driven metrics rather than data-informed, manually curated lineages. This goal was accomplished by using Mash to compare genomes based on whole genome sequence distances, unsupervised clustering metrics to determine lineages and unsupervised learning methods for classification. I first applied these methods to over 15,000 Escherichia coli genomes collected from NCBI RefSeq. Second, I applied these methods to genomes of the cyanobacterial species Microcystis aeruginosa, also collected from NCBI RefSeq. In chapter two, we report the results of surveying the whole genome diversity of the fall armyworm, Spodoptera frugiperda (J.E. Smith), and further the understanding of fall armyworm populations within the United States and in the Western Hemisphere. This research was accomplished by aligning reads, calling variants and analyzing genomic diversity based on hierarchical clustering of genomic Mash distances, PCA, and comparison of nucleotide diversity and population differentiation metrics. In the final chapter of this dissertation, I present a straightforward sequence alignment, variant calling and functional annotation pipeline to identify ABCC2 alleles potentially conferring resistance to Cry1F. This approach provides a benchmark for future targeted resistance-gene sequencing methodologies and a use case for gene surveillance to identify pesticide-resistant alleles.
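
For readers unfamiliar with the Mash distances mentioned above, the following is a minimal sketch of the underlying idea, not the dissertation's code or the Mash tool itself. It builds a bottom-s MinHash sketch of each sequence's k-mers, estimates the Jaccard index j from two sketches, and converts it to the Mash distance D = -(1/k) ln(2j / (1 + j)). The function names, the BLAKE2b hash, the toy parameters and the clamping to [0, 1] are assumptions made for illustration; the real tool uses canonical k-mers, MurmurHash3 and much larger defaults (k = 21, s = 1000).

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of k-mers in seq (no reverse-complement handling, for brevity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest 64-bit hash values of the k-mer set."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:s]


def mash_distance(sk_a, sk_b, k, s):
    """Estimate the Jaccard index from two sketches and convert it to a Mash distance."""
    # Take the s smallest hashes of the union, then count those present in both sketches.
    merged = sorted(set(sk_a) | set(sk_b))[:s]
    shared = len(set(merged) & set(sk_a) & set(sk_b))
    j = shared / len(merged) if merged else 0.0
    if j == 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    d = -math.log(2 * j / (1 + j)) / k
    # Clamped to [0, 1] for convenience in this illustration.
    return min(1.0, max(0.0, d))


if __name__ == "__main__":
    a = "ACGTTGCAAGGCTTAACCGGTTAGCATGCAT"
    b = a[:15] + "T" + a[16:]  # same sequence with a single substitution
    k, s = 5, 50               # toy values chosen for short example sequences
    print(mash_distance(sketch(a, k, s), sketch(b, k, s), k, s))
```

A matrix of such pairwise distances is the kind of input that hierarchical clustering, as described in the abstract, would operate on.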
