Date of Award
Master of Science
Arnold Saxton, Elissa Chesler, Brynn Voy
The thresholding problem is important in today’s data-rich research environment. A threshold is a well-defined point in the data distribution beyond which the data is highly likely to have scientific meaning. The choice of threshold is crucial because it heavily influences all downstream analysis and the inferences drawn from it. A legitimate threshold is not arbitrary: it is scientifically well grounded, data-dependent, and best separates the information-rich portion of the data from the noisy portion. Although the thresholding problem is not restricted to any particular field of study, little research has been done on it. This study investigates the problem in the context of network-based analysis of transcriptomic data. Six conceptually diverse algorithms (based on the number of maximal cliques, correlations of control spots with genes, the top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power) are used to threshold the gene correlation matrices of three time-series microarray datasets and are tested for stability and validity. The stability, or reliability, of the first four algorithms is tested by block bootstrapping the arrays in each dataset and comparing the estimated thresholds against the resulting bootstrap threshold distributions. The validity of the thresholding algorithms is tested by comparing the estimated thresholds against a threshold based on biological information. Thresholds based on the modular basis of gene networks are found to perform better in terms of both stability and validity. Challenges for future research on the problem are identified. Although the study uses transcriptomic data for its analysis, we assert that it is applicable to thresholding across various fields.
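As an illustration of the simplest of the six approaches named above, the following is a minimal sketch of a "top 1% of correlations" threshold. It is not the thesis's implementation. It assumes absolute Pearson correlations between gene expression profiles, and the function names (`pearson`, `top_fraction_threshold`, `threshold_edges`) and the synthetic data are hypothetical, introduced here only for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat constant profiles as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def top_fraction_threshold(expr, fraction=0.01):
    """Return the smallest |r| among the top `fraction` of gene pairs.

    `expr` is a list of expression profiles, one list of values per gene.
    """
    abs_r = sorted(
        (abs(pearson(expr[i], expr[j]))
         for i, j in combinations(range(len(expr)), 2)),
        reverse=True,
    )
    keep = max(1, math.ceil(fraction * len(abs_r)))
    return abs_r[keep - 1]


def threshold_edges(expr, threshold):
    """Gene pairs (i, j) whose |r| meets or exceeds the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(expr)), 2)
        if abs(pearson(expr[i], expr[j])) >= threshold
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data (50 genes x 12 arrays), purely illustrative.
    expr = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(50)]
    t = top_fraction_threshold(expr)
    edges = threshold_edges(expr, t)
    print(f"threshold |r| = {t:.3f}, edges kept = {len(edges)}")
```

The gene pairs returned by `threshold_edges` would form the edges of the thresholded correlation network. A fixed-percentile rule of this kind always keeps the same fraction of pairs regardless of the data, which is one reason to compare it against the data-dependent alternatives.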
Borate, Bhavesh Ram, "Comparative Analysis of Thresholding Algorithms for Microarray-derived Gene Correlation Matrices." Master's Thesis, University of Tennessee, 2008.