Date of Award


Degree Type


Degree Name

Doctor of Philosophy



Major Professor

Dong Xu

Committee Members

Jeff Becker, Loren Hauser, Elizabeth Howell, Ying Xu


As we move into the post-genomic era, various high-throughput experimental techniques have been developed to characterize biological systems at the genome scale. High-throughput data are becoming fundamentally important resources for gaining a system-level understanding of the ‘organization’ and ‘dynamics’ of molecules (e.g. genes and proteins), the relationships between them, interaction cascades, pathways, modules and various networks (e.g. regulatory, co-expression and metabolic networks). This dissertation focuses on developing computational tools to facilitate translating the ever-growing volumes of high-throughput data into significant biological knowledge about protein functions, pathways and modules.

Although high-throughput data provide a global picture of biological systems and their underlying mechanisms, the details are often noisy. Integrating heterogeneous data that characterize cellular systems from different aspects (e.g. gene expression and protein-protein interactions) can lead to comprehensive and coherent biological insights. We developed a Bayesian probabilistic framework to predict the functions of unannotated proteins in yeast by integrating protein binary interaction data, protein complex data and microarray gene expression data. We also extended the computational framework to infer biological pathways in an automated and systematic fashion.
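As an illustration only, the idea of combining several noisy evidence sources into one probability can be sketched with a naive-Bayes odds update. This is not the framework developed in the dissertation: the function name `posterior_probability`, the conditional-independence assumption and all numeric values below are hypothetical.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with independent evidence via Bayes' rule in odds form.

    prior: prior probability that a protein has a given function (0 < prior < 1).
    likelihood_ratios: one ratio P(evidence | function) / P(evidence | no function)
        per data source, assumed conditionally independent (an assumption of
        this sketch, not a claim about the dissertation's method).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # each source multiplies the odds by its likelihood ratio
    return odds / (1.0 + odds)


# Hypothetical likelihood ratios for three evidence types:
# binary interaction, shared complex membership, expression correlation.
evidence = [4.0, 2.0, 1.5]
print(round(posterior_probability(0.1, evidence), 3))
```

In this form, a source with a ratio above 1 raises the predicted probability and a source with a ratio below 1 lowers it, so weak or noisy evidence contributes little on its own.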

Besides bottom-up approaches that move from protein functions to pathways, we also applied top-down approaches to model cellular networks; that is, we started from the architecture of a cellular network to identify functional modules. We applied the k-core algorithm to decompose protein interaction and microarray gene co-expression networks; the resulting decomposition provides strong support for the modularity principles of network structure and function. Dynamic functional modules and protein complexes were identified by clustering the network constructed from multiple sources of high-throughput data, offering insights into the organization and dynamics of a living cell.
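For readers unfamiliar with k-core decomposition, the following is a minimal sketch of the standard peeling procedure on an undirected graph. The edge list and the protein names (P1 to P5) are hypothetical, and the code is a generic textbook version rather than the implementation used in this work.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)

    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # the core level never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


def k_core(edges, k):
    """Return the set of nodes whose core number is at least k."""
    return {node for node, c in core_numbers(edges).items() if c >= k}


# Hypothetical interaction network: a triangle (P1, P2, P3) with a chain P3-P4-P5.
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P4", "P5")]
print(sorted(k_core(interactions, 2)))
```

Nodes in a high-order core are densely interconnected with each other, which is why the decomposition is a natural starting point for locating candidate modules in interaction or co-expression networks.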

We also proposed a consensus approach to modeling biological pathways that combines different computational tools and integrates multiple sources of high-throughput data. As the quantity and diversity of high-throughput data continue to grow, it is vital to develop bioinformatics methodologies and innovative tools to model biological systems and explore biological knowledge in an iterative fashion.


Included in

Life Sciences Commons