Date of Award

5-2005

Degree Type

Thesis

Degree Name

Master of Science

Major

Statistics

Major Professor

Halima Bensmail

Committee Members

Mary Leitnaker, Robert Mee

Abstract

Clustering proteomics data is a challenging problem for any traditional clustering algorithm. Usually, the number of samples is much smaller than the number of protein peaks, so a clustering algorithm that does not depend on the number of features or variables (here, the number of peaks) is needed. An innovative hierarchical clustering algorithm may be a good approach. This work proposes a new dissimilarity measure for hierarchical clustering combined with functional data analysis (FDA), and presents a specific application of FDA to a high-throughput proteomics study. The performance of the proposed algorithm is compared with that of two popular dissimilarity measures in clustering samples from normal and Human T Cell Leukemia Virus Type 1 (HTLV-1)-infected patients.

The difficulty in clustering spatial data is that the data are multidimensional and massive. Sometimes an automated clustering algorithm alone is not sufficient to cluster this type of data. An iterative clustering algorithm with the capability of visual steering may be a good approach. This case study proposes a new iterative algorithm that combines automated clustering methods such as Bayesian clustering, detection of multivariate outliers, and visual clustering. Simulated data from a plasma experiment and real astronomical data are used to test the performance of the algorithm.

Recommended Citation

Buddana, Aruna K., "Novel Algorithms and Datamining for Clustering Massive Datasets." Master's Thesis, University of Tennessee, 2005.
https://trace.tennessee.edu/utk_gradthes/1807


Included in

Statistics and Probability Commons


Masters Theses

Novel Algorithms and Datamining for Clustering Massive Datasets
