Date of Award

8-2004

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

Samuel Jordan, Robert Ward

Abstract

This study presents a methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection. The methodology involves encoding the text data using a low-rank nonnegative matrix factorization algorithm to retain natural data nonnegativity, thereby eliminating the need for the subtractive basis vector and encoding calculations present in other techniques, such as principal component analysis, for semantic feature abstraction. Existing techniques for nonnegative matrix factorization are reviewed, and a new hybrid technique for nonnegative matrix factorization is proposed. Performance evaluations of the proposed method are conducted on a few benchmark text collections used in standard topic detection studies.
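To make the idea in the abstract concrete, the following is a minimal sketch of clustering documents with a low-rank nonnegative factorization. It is not the hybrid technique proposed in the thesis, which this record does not describe; it assumes the standard multiplicative update rules of Lee and Seung for the Frobenius-norm objective. The function names (`nmf`, `cluster_labels`, `frob_error`), the toy term-by-document matrix, and all parameter values are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(a, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative m x n matrix a as w * h, with w (m x k) and
    h (k x n) kept nonnegative. Uses multiplicative updates (an assumption;
    not the thesis's hybrid method)."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    # Strictly positive random start so no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, a)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(a, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def frob_error(a, w, h):
    """Squared Frobenius norm of a - w * h."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))


def cluster_labels(h):
    """Assign each document (a column of h) to its largest-weight feature."""
    return [max(range(len(h)), key=lambda i: h[i][j]) for j in range(len(h[0]))]


# Toy term-by-document matrix (rows are terms, columns are documents).
# Documents 0-1 and documents 2-3 use disjoint vocabularies.
a = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
]

w, h = nmf(a, k=2)
labels = cluster_labels(h)
error = frob_error(a, w, h)
print("cluster labels:", labels)
print("squared reconstruction error:", round(error, 4))
```

In this sketch the columns of `w` play the role of semantic features (weighted groups of terms) and each column of `h` encodes one document as a nonnegative combination of those features, so no subtractive combinations arise. On a matrix like the toy one above, documents that share a vocabulary would typically be assigned the same label, although the result depends on the random initialization.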

Recommended Citation

Shahnaz, Farial, "A Clustering Method Based on Nonnegative Matrix Factorization for Text Mining." Master's Thesis, University of Tennessee, 2004.
https://trace.tennessee.edu/utk_gradthes/4795

Included in

Computer Sciences Commons

A Clustering Method Based on Nonnegative Matrix Factorization for Text Mining
