Date of Award

8-2011

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Michael A. Langston

Committee Members

Michael W. Berry, Jian Huang, Arnold Saxton, Brynn H. Voy

Abstract

Biological data derived from high-throughput microarrays can be transformed into finite, simple, undirected graphs and analyzed using tools first introduced by the Langston Lab at the University of Tennessee. Transforming raw data can be broken down into three main tasks: data normalization, generation of similarity metrics, and threshold selection. The choice of method at each of these steps affects the size, density, and structure of the resulting graph. A number of different algorithms are examined and analyzed to illustrate the magnitude of these effects.
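The three transformation steps can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the choice of z-score normalization, absolute Pearson correlation as the similarity metric, a fixed cutoff as the threshold, and the names `zscore`, `pearson`, and `build_graph` are all assumptions made for illustration, and the expression values are invented toy data.

```python
from itertools import combinations
from math import sqrt


def zscore(values):
    """Step 1 (assumed method): scale one gene's profile to mean 0, variance 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]  # assumes the profile is not constant


def pearson(x, y):
    """Step 2 (assumed metric): Pearson correlation of two z-scored profiles."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def build_graph(expression, threshold):
    """Step 3: keep an undirected edge wherever |correlation| >= threshold."""
    normalized = {gene: zscore(vals) for gene, vals in expression.items()}
    edges = set()
    for a, b in combinations(sorted(normalized), 2):
        if abs(pearson(normalized[a], normalized[b])) >= threshold:
            edges.add((a, b))
    return edges


# Toy data (hypothetical): three genes measured under four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
edges = build_graph(expression, threshold=0.9)
```

Lowering `threshold` in this sketch admits more edges, which is one simple way to see how the threshold choice alone changes the size and density of the graph.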

Graph-based tools are then used to extract putative gene networks. These tools are loosely based on the concept of a clique and generate clusters optimized for density. Innovative additions to the paraclique algorithm, developed at the Langston Lab, are introduced to generate results that have the highest average correlation or the highest density. A new suite of algorithms is then presented that exploits a priori gene interactions. Aptly named the anchored analysis toolkit, these algorithms use known interactions as anchor points for generating subgraphs, which are then analyzed for their graph structure. This yields clusters that might otherwise have been lost in noise.
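To make the idea of relaxing a clique concrete, the following Python sketch grows a cluster from a maximum clique by admitting vertices that miss only a few of its members. This is a simplified reading of the paraclique idea, not the dissertation's algorithm: the `glom` parameter, the function names, and the toy graph are assumptions, and the correlation-optimizing and density-optimizing additions mentioned above are not reproduced. The clique search is a plain Bron-Kerbosch enumeration, suitable only for small in-memory graphs.

```python
def max_clique(adj):
    """Return one maximum clique via basic Bron-Kerbosch (small graphs only)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def paraclique(adj, glom=1):
    """Grow a maximum clique by adding vertices that are non-adjacent to
    at most `glom` current members (hypothetical parameter name)."""
    members = max_clique(adj)
    grew = True
    while grew:
        grew = False
        for v in sorted(set(adj) - members):
            if len(members - adj[v]) <= glom:
                members.add(v)
                grew = True
    return members


def adjacency(edges):
    """Build a symmetric adjacency map from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Toy graph: a 4-clique {a, b, c, d}; e misses only d; f touches only a.
graph = adjacency([
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("e", "a"), ("e", "b"), ("e", "c"),
    ("f", "a"),
])
cluster = paraclique(graph, glom=1)
```

With `glom=1` the loosely attached vertex `f` stays out while the five densely connected vertices end up in one cluster; with `glom=0` the result is just a maximum clique.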

A main product of this thesis is a novel collection of algorithms to generate exact solutions to the maximum clique problem for graphs that are too large to fit within core memory. No other algorithms are currently known that produce exact solutions to this problem for extremely large graphs. A combination of in-core and out-of-core techniques is used in conjunction with a distributed-memory programming model. These algorithms take into consideration such pitfalls as external disk I/O and hardware failure and recovery.

Finally, a web-based tool is described that provides researchers access to the aforementioned algorithms. The Graph Algorithms Pipeline for Pathway Analysis tool, GrAPPA, was previously developed by the Langston Lab and provides the software needed to take raw microarray data as input and preprocess, analyze, and post-process it in a single package. GrAPPA also provides access to high-performance computing resources via the TeraGrid.
