Date of Award

12-2008

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Michael A. Langston

Committee Members

Michael W. Berry, Elissa J. Chesler, Lynne E. Parker

Abstract

Graph theoretical approaches have been widely used to solve problems arising in bioinformatics and genomic analysis. In particular, enumeration problems such as maximal clique and maximal biclique finding are cores for addressing many biological problems, such as the integration of genome mapping data. However, the enumeration problems may become computation and memory bot- tlenecks for genome-scale elucidation of biological networks due to their NP-hard nature and the huge memory requirements.

Therefore, this research is interested in developing exact, scalable, and efficient algorithms for these biological problems. The Clique Enumerator including a maximal clique enumeration algo- rithm and its parallel implementation is developed to provide scalable and exact solutions to the maximal clique problem by taking advantage of (a) ultra-large globally addressable memory archi- tectures, (b) the theory of fixed parameter tractability, and (c) a novel bitmap data representation scheme for memory and search space reduction. An implementation of the parallel algorithm scales linearly to at least 64 processors.

Maximal biclique finding on bipartite graphs is the other focus of this research because of its key role in the study of gene-phenotype association graphs built from heterogeneous data types. A general purpose algorithm is developed for enumerating maximal bicliques in a bipartite graph, without any problem restriction. Experimental results using both gene expression data and syn- thetic data indicate that the new algorithm can be two to three orders of magnitude faster than the best previous alternatives.

The clique and biclique enumeration algorithms are applied to two applications in genetics: the linkage disequilibrium analysis in population genetics with clique algorithms, and the ontology discovery for systems genetics with biclique algorithms. The clique model is applied to extract linkage disequilibrium networks from all loci on the whole-genome. The structures of such networks are analyzed to compare the genetic structure of populations. The biclique model is applied to enumerate bicliques in gene-phenotype association graphs constructed from empirical gene sets. Bicliques are integrated into a directed acyclic graph to provide an empirical discovery of phenotype ontology.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS