Date of Award

8-2015

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Michael A. Langston

Committee Members

Jian Huang, Michael A. Langston, Bruce MacLennan, Xiaoyan Zhu

Abstract

Ever-increasing amounts of complex biological data continue to come on line daily. Examples include proteomic, transcriptomic, genomic and metabolomic data generated by a plethora of high-throughput methods. Accordingly, fast and effective data processing techniques are more and more in demand. This issue is addressed in this dissertation through an investigation of various algorithmic alternatives and enhancements to routine and traditional procedures in common use. In the analysis of gene co-expression data, for example, differential measures of entropy and variation are studied as augmentations to mere differential expression. These novel metrics are shown to help elucidate disease-related genes in wide assortments of case/control data. In a more theoretical spirit, limits on the worst-case behavior of density based clustering methods are studied. It is proved, for instance, that the well-known paraclique algorithm, under proper tuning, can be guaranteed never to produce subgraphs with density less than 2/3. Transformational approaches to efficient algorithm design are also considered. Classic graph search problems are mapped to and from well-studied versions of satisfiability and integer linear programming. In so doing, regions of the input space are classified for which such transforms are effective alternatives to direct graph optimizations. In all these efforts, practical implementations are emphasized in order to advance the boundary of effective computation.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS