R-FAP: Rapid Functional Annotation of Prokaryotes Using Taxon-specific Pan-genomes and 10-mer Peptides
The growing adoption of next-generation sequencing technologies gives numerous fields the opportunity to identify bacteria in near real-time. Fields such as counter-terrorism, forensics, medicine, and even microbial ecology stand to benefit from these advances. However, the ability to rapidly produce high-quality sequence data brings the need to interpret those data as quickly as they are produced. While gene prediction algorithms have kept pace, functional prediction methods have not.
To avoid querying multiple large databases for every newly sequenced genome, the project detailed herein uses the pan-genome of a taxonomic group to identify the genes shared within that group. The pan-genome then needs to be queried against this set of databases only once. Each new genome can afterwards be searched rapidly against the annotated pan-genome using k-mer peptide matching to make functional predictions.
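The two-phase idea (annotate the pan-genome once, then match new proteins by shared peptide k-mers) can be illustrated with a minimal sketch. It assumes k = 10 (following the 10-mer peptides named in the title), an in-memory list of `(sequence, annotation)` pairs as the annotated pan-genome, and a simple "most shared k-mers wins" vote. The function names `build_index` and `annotate` and the voting rule are hypothetical illustrations, not the thesis's actual implementation.

```python
from collections import Counter, defaultdict

K = 10  # peptide k-mer length; assumed from the "10-mer peptides" in the title


def kmers(seq, k=K):
    """Yield every overlapping k-mer of a protein sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(pan_genome, k=K):
    """One-time step: map each k-mer to the annotations of pan-genome proteins containing it.

    pan_genome: iterable of (protein_sequence, annotation) pairs (hypothetical format).
    """
    index = defaultdict(set)
    for seq, annotation in pan_genome:
        for kmer in kmers(seq, k):
            index[kmer].add(annotation)
    return index


def annotate(protein, index, k=K, min_hits=1):
    """Per-protein step: return the annotation sharing the most k-mers with the query.

    Returns None when fewer than min_hits k-mers match. Ties go to the
    annotation counted first (Counter.most_common ordering).
    """
    votes = Counter()
    for kmer in kmers(protein, k):
        for annotation in index.get(kmer, ()):  # .get avoids inserting empty keys
            votes[annotation] += 1
    if not votes:
        return None
    best, hits = votes.most_common(1)[0]
    return best if hits >= min_hits else None


# Toy example with made-up sequences and labels.
pan = [
    ("MKTAYIAKQRQISFVKSHFSRQ", "annotation_A"),
    ("MSDNELLKQLEAAGLTPEQRAEL", "annotation_B"),
]
idx = build_index(pan)
hit = annotate("XXMKTAYIAKQRQISFVKYY", idx)   # shares 7 ten-mers with annotation_A
miss = annotate("WWWWWWWWWWWWWWW", idx)       # shares no ten-mers with either protein
print(hit, miss)
```

Because the index is built once and each lookup is a hash-table probe per k-mer, annotating a new protein costs time roughly proportional to its length rather than to the size of the reference databases, which is the property the approach relies on for speed.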
As a test model, thirty-one strains of Salmonella enterica subsp. enterica were used to build the pan-genome for this taxon. Using a k-mer (10-mer) peptide search algorithm, proteins in a new genome could then be matched to the resulting database with complete consistency in a matter of seconds per genome. This represents a major advancement in annotation speed over existing pipelines.