Masters Theses

Date of Award

5-2014

Degree Type

Thesis

Degree Name

Master of Science

Major

Life Sciences

Major Professor

Loren J. Hauser

Committee Members

Elizabeth Fozo, Brian O'Meara, Chongle Pan

Abstract

The growing implementation of next-generation sequencing technologies presents numerous fields with the opportunity to identify bacteria in near real-time. Fields such as counter-terrorism, forensics, medicine, and even microbial ecology are positioned to benefit from such advances and implementation. However, with the ability to rapidly produce high-quality sequence data comes the need to interpret this data as quickly as it is produced. While gene prediction algorithms have kept pace, functional prediction methods have not.

To bypass the need for large-scale queries to multiple databases for each newly sequenced genome, the project detailed herein seeks to identify the genes shared within a taxonomic group using the pan-genome for that group. Doing so allows the pan-genome to be queried against this set of databases a single time, then rapidly searched with new genomes using k-mer peptide matching to make functional predictions.

Thirty-one strains from Salmonella enterica subsp. enterica were used to build the pan-genome for this taxon as a test model. Proteins in a new genome could then be matched with complete consistency to the resulting database in a matter of seconds (per genome) using a k-mer peptide search algorithm. This represents a major advancement in annotation speed over existing pipelines.
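
The abstract does not spell out the search algorithm, so the following is only a minimal sketch of what k-mer peptide matching against a pre-annotated pan-genome could look like. It assumes exact matching of overlapping peptide k-mers and scoring by the number of shared k-mers. The function names, the choice of k = 5, the sequences, and the annotation strings are all hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping peptide k-mer in a protein sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(pan_genome, k):
    """Map each k-mer to the set of pan-genome protein IDs that contain it."""
    index = defaultdict(set)
    for prot_id, seq in pan_genome.items():
        for kmer in kmers(seq, k):
            index[kmer].add(prot_id)
    return index


def best_match(query, index, k):
    """Return (protein ID, shared k-mer count) for the best hit, or None."""
    hits = Counter()
    for kmer in set(kmers(query, k)):
        # .get avoids inserting empty entries into the defaultdict
        for prot_id in index.get(kmer, ()):
            hits[prot_id] += 1
    if not hits:
        return None
    return hits.most_common(1)[0]


# Toy pan-genome (hypothetical sequences), annotated once up front.
pan_genome = {
    "fam1": "MKTAYIAKQRQISFVKSHFSRQ",
    "fam2": "MSDNELLKRAGVTWHPQ",
}
annotations = {
    "fam1": "hypothetical function A",
    "fam2": "hypothetical function B",
}

K = 5  # assumed k-mer length, for illustration only
index = build_index(pan_genome, K)

# A protein from a "new genome" that differs from fam1 by one residue.
hit = best_match("MKTAYIAKQRQISFVKAHFSRQ", index, K)
predicted = annotations[hit[0]] if hit else None
```

In this sketch the expensive step (annotating the pan-genome) happens once. Each new protein then needs only dictionary lookups, which is the property the abstract credits for the speed gain.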
