Masters Theses

Date of Award

8-2004

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

Robert C. Ward, Kwai L. Wong

Abstract

The increasing availability of whole genome sequences in public databases has stimulated the development of new methods to automatically compare and categorize genes and species. Recently developed methods based on the singular value decomposition (SVD) allow for the simultaneous identification and definition of well-conserved motifs and gene families using very large whole genome datasets. In contrast, this work discusses the use of a truncated pivoted QR factorization as a scalable alternative to the SVD for comparing whole genomes in a phylogenetic context. This algorithm computes the R factor of the decomposition without forming the Q factor or altering the original matrix. Encodings for both proteins and peptides are obtained by applying the QR factorization to a large, sparse peptide-by-protein data matrix in two ways. Gene and species phylogenies comparable to those from the SVD approach are constructed using cosines between the resulting protein vectors as pairwise similarities. Performance evaluations conducted on a few genomic datasets indicate that this approach is an efficient alternative to the SVD-based method for whole genome phylogeny.
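
To make the idea in the abstract concrete, the following is a minimal sketch, not the algorithm from the thesis. It assumes a tiny dense peptide-by-protein matrix and obtains a truncated R factor through a pivoted Cholesky recurrence on the columns of A^T A, which is one simple way to get R without forming Q or modifying A. The function names (truncated_pivoted_r, protein_vectors, cosine) and the toy matrix are hypothetical and chosen only for illustration. Working through A^T A is numerically less robust than a Householder-based QR, so this should be read as an illustration of the data flow only.

```python
import math

def truncated_pivoted_r(A, k, tol=1e-12):
    """Return (R, pivots) for a rank-k truncated pivoted factorization.

    A is a list of m rows with n entries each (peptides x proteins).
    R is a list of at most k rows of length n with R^T R ~= A^T A.
    Columns of R stay in the original protein order; Q is never formed
    and A is only read, never changed.
    """
    m, n = len(A), len(A[0])
    # Residual squared norm of every column (diagonal of A^T A).
    d = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    R, pivots = [], []
    for _ in range(min(k, m, n)):
        # Column pivoting: take the column with the largest residual norm.
        p = max(range(n), key=lambda j: d[j])
        if d[p] <= tol:
            break
        # Inner products of every column with the pivot column.
        g = [sum(A[i][j] * A[i][p] for i in range(m)) for j in range(n)]
        # Remove what earlier rows of R already account for.
        for row in R:
            rp = row[p]
            for j in range(n):
                g[j] -= rp * row[j]
        s = math.sqrt(d[p])
        new_row = [gj / s for gj in g]
        for j in range(n):
            d[j] -= new_row[j] ** 2
        d[p] = 0.0  # the pivot column is now fully represented
        R.append(new_row)
        pivots.append(p)
    return R, pivots

def protein_vectors(R):
    """Column j of R is the reduced vector for protein j."""
    return [list(col) for col in zip(*R)]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

# Hypothetical toy data: 4 peptides (rows) by 3 proteins (columns).
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]

R, pivots = truncated_pivoted_r(A, k=2)
vecs = protein_vectors(R)
for i in range(len(vecs)):
    for j in range(i + 1, len(vecs)):
        print(f"proteins {i},{j}: cosine = {cosine(vecs[i], vecs[j]):.3f}")
```

The pairwise cosines printed at the end are the kind of similarity values from which a distance matrix, and then a tree, could be built. For genome-scale data one would presumably store the matrix in a sparse format and compute the column inner products sparsely, which is outside the scope of this sketch.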
