Masters Theses

Date of Award

12-2004

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael Langston

Committee Members

David Straight, Jian Huang

Abstract

The purpose of this study was to investigate the utility of exact maximal clique enumeration in DNA microarray analysis, to analyze and improve upon existing exact maximal clique enumeration algorithms, and to develop new clique-based algorithms to assist in the analysis as indicated during the course of the study. As a first test, microarray data sets comprised of pre-classified human lung tissue samples were obtained through the Critical Assessment of Microarray Data Analysis (CAMDA) conference. A combination of exact maximal clique enumeration and approximate dominating set was used to attempt to classify the samples.

In another test, maximal clique enumeration was used for a priori clustering of microarray data from Mus musculus (mouse). Cliques from this graph, though smaller than the anticipated groups of co-regulated genes, exhibited a high degree of overlap. Many genes within the overlap are either known or suspected to be involved in one or more gene regulatory networks.

Experimental tests of four exact maximal clique enumeration algorithms on graphs derived from Mus musculus data normalized by either RMA or MAS 5.0 software were performed. A branch and bound Bron and Kerbosch algorithm was shown to perform the best on the widest range of inputs. A base Bron and Kerbosch algorithm was faster on very sparse graphs, but slowed considerably as edge density increased. Both the Kose and greedy algorithms were significantly slower than both Bron and Kerbosch algorithms on all inputs.

Means to improve further the branch and bound Bron and Kerbosch algorithm were then considered. Two preprocessing rules and more exacting bounds were added to the algorithm both together and separately. The low degree preprocessing rule was found to improve performance most consistently, though significant improvement was only observed with the sparsest graphs, where improvement is least necessary.

Finally, a first attempt at developing an algorithm that would integrate genes that were likely excluded from a clique as a result of noise into the appropriate group was made. Initial testing of the resulting paraclique algorithm revealed that the algorithm maintains the desired high level of inter-group edge density while expanding the core clique to a more acceptable size. Research in this area is ongoing.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS