Masters Theses
Date of Award
5-2005
Degree Type
Thesis
Degree Name
Master of Science
Major
Statistics
Major Professor
Halima Bensmail
Committee Members
Mary Leitnaker, Robert Mee
Abstract
Clustering proteomics data is a challenging problem for traditional clustering algorithms. Usually, the number of samples is much smaller than the number of protein peaks, so a clustering algorithm that does not depend on the number of features (here, the number of peaks) is needed. An innovative hierarchical clustering algorithm may be a good approach. This work proposes a new dissimilarity measure for hierarchical clustering, combined with functional data analysis (FDA), and presents a specific application of FDA to a high-throughput proteomics study. The proposed algorithm performs well when compared with two popular dissimilarity measures in clustering samples from normal and Human T Cell Leukemia Virus Type 1 (HTLV-1)-infected patients.
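As a rough illustration of why hierarchical clustering suits data with few samples and many peaks, the sketch below clusters synthetic spectra using only the sample-by-sample dissimilarity matrix. It is not the thesis's method: the moving-average smoother stands in for an FDA basis fit, the Euclidean distance stands in for the proposed dissimilarity measure (which the abstract does not specify), and all function names and the simulated data are hypothetical.

```python
import math
import random


def smooth(curve, width=5):
    """Moving-average smoother; an assumed, crude stand-in for an FDA basis fit."""
    half = width // 2
    out = []
    for i in range(len(curve)):
        window = curve[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def l2_dissimilarity(a, b):
    """Euclidean distance between two smoothed curves (placeholder measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(curves, n_clusters):
    """Agglomerative clustering that works only on the n x n dissimilarity matrix."""
    n = len(curves)
    d = [[l2_dissimilarity(curves[i], curves[j]) for j in range(n)]
         for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise dissimilarity between the two clusters.
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def simulate_spectrum(rng, peak_center, n_points=2000):
    """Synthetic spectrum: one Gaussian peak plus noise (illustrative data only)."""
    return [math.exp(-((t - peak_center) / 40.0) ** 2) + rng.gauss(0, 0.05)
            for t in range(n_points)]


# 8 samples, 2000 features each: far fewer samples than peaks.
rng = random.Random(0)
spectra = ([simulate_spectrum(rng, 500) for _ in range(4)]
           + [simulate_spectrum(rng, 1500) for _ in range(4)])
smoothed = [smooth(s) for s in spectra]
clusters = average_linkage(smoothed, n_clusters=2)
print(clusters)
```

Once the 8 x 8 dissimilarity matrix is built, the merging loop never touches the 2000 features again, so the cost of the clustering step is governed by the number of samples rather than the number of peaks. Swapping in a different dissimilarity measure only requires replacing `l2_dissimilarity`.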
The difficulty in clustering spatial data is that the data are multidimensional and massive. Sometimes an automated clustering algorithm alone is not sufficient to cluster this type of data. An iterative clustering algorithm with the capability of visual steering may be a good approach. This case study proposes a new iterative algorithm that combines automated clustering methods such as Bayesian clustering, detection of multivariate outliers, and visual clustering. Simulated data from a plasma experiment and real astronomical data are used to test the performance of the algorithm.
Recommended Citation
Buddana, Aruna K., "Novel Algorithms and Datamining for Clustering Massive Datasets." Master's Thesis, University of Tennessee, 2005.
https://trace.tennessee.edu/utk_gradthes/1807