Masters Theses
Date of Award
8-2004
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Michael W. Berry
Committee Members
Robert C. Ward, Kwai L. Wong
Abstract
The increasing availability of whole genome sequences in public databases has stimulated the development of new methods to automatically compare and categorize genes and species. Recently developed methods based on the singular value decomposition (SVD) allow for the simultaneous identification and definition of well-conserved motifs and gene families using very large whole genome datasets. In contrast, this work discusses the use of a truncated pivoted QR factorization as a scalable alternative to the SVD for comparing whole genomes in a phylogenetic context. This algorithm computes the R factor of the decomposition without forming the Q factor or altering the original matrix. Encodings for both proteins and peptides are obtained by applying the QR factorization to a large, sparse peptide-by-protein data matrix in two ways. Gene and species phylogenies comparable to those from the SVD approach are constructed using cosines as pairwise similarities of the resulting protein vectors. Performance evaluations conducted on a few genomic datasets indicate that this approach is an efficient alternative to the SVD-based method for whole genome phylogeny.
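The idea of encoding proteins with a truncated pivoted QR factorization and comparing them by cosines can be illustrated with a small sketch. This is not the thesis implementation: it assumes a tiny dense matrix and uses SciPy's LAPACK-backed `scipy.linalg.qr` as a stand-in for the sparse algorithm described above. The function names `protein_encodings` and `cosine_similarities`, the truncation rank `k`, and the toy matrix are all hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.linalg import qr


def protein_encodings(A, k):
    """Return k-dimensional encodings of the columns (proteins) of A.

    A is a peptide-by-protein matrix (rows = peptides, columns = proteins).
    Assumption: a dense pivoted QR stands in for the sparse algorithm.
    """
    # mode="r" skips forming Q; pivoting=True also returns the column order.
    # overwrite_a defaults to False, so A itself is left unchanged.
    R, piv = qr(A, mode="r", pivoting=True)
    # A[:, piv] = Q @ R, so column j of R describes original column piv[j].
    E = np.empty((k, A.shape[1]))
    E[:, piv] = R[:k, :]  # keep the first k rows and undo the permutation
    return E


def cosine_similarities(E):
    """Pairwise cosines between the columns of E (zero columns map to 0)."""
    norms = np.linalg.norm(E, axis=0)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    U = E / safe
    return U.T @ U


# Toy data (hypothetical): 5 peptides x 4 proteins, proteins 0 and 2 identical.
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
], dtype=float)

S = cosine_similarities(protein_encodings(A, k=2))
print(np.round(S, 3))
```

Because Q has orthonormal columns, keeping all rows of R reproduces the cosines between the original columns of A exactly (up to rounding); truncating to the first `k` rows trades some of that fidelity for a smaller representation. The resulting similarity matrix could then be converted to distances and passed to a tree-building method, a step this sketch does not cover.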
Recommended Citation
Pulatova, Shakhina Abdimajidovna, "A Whole Genome Phylogeny Using Truncated Pivoted QR Decomposition." Master's Thesis, University of Tennessee, 2004.
https://trace.tennessee.edu/utk_gradthes/2319