Masters Theses
Date of Award
12-1997
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Michael W. Berry
Committee Members
Bradley Vander Zanden, June Donato
Abstract
Data mining is the application of algorithms for extracting valuable information from large databases in order to make important business decisions. This study explores a new technique for data mining: Latent Semantic Indexing (LSI). LSI is an efficient information retrieval method for textual documents. By computing the singular value decomposition (SVD) of a large sparse term-by-document matrix, LSI constructs an approximate vector space model which represents important associative relationships between terms and documents that are not evident in individual documents. This thesis explores the applicability of the LSI model to numerical databases, especially consumer product data. By properly choosing attributes of data records as terms or documents, a term-by-document incidence matrix is built, and then a distribution-based indexing scheme is employed to construct a correlated distribution matrix. Hence a similar LSI vector space model can be generated to detect useful or hidden patterns in the databases. The extracted information can then be validated using statistical hypothesis testing or resampling. LSI is an automatic yet intelligent indexing method, and its application to numerical data introduces a promising way to discover knowledge in important commercial application areas such as retail and consumer banking.
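The SVD step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy incidence matrix, the choice of k = 2, and the helper names `lsi_embed` and `cosine` are assumptions made here for illustration, and the distribution-based indexing scheme mentioned in the abstract is not reproduced. A dense NumPy SVD stands in for the sparse solver that a large matrix would require.

```python
import numpy as np


def lsi_embed(A, k):
    """Return the rank-k truncated SVD factors of a term-by-document matrix.

    A has one row per term and one column per document.
    (Hypothetical helper, named here for illustration only.)
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


# Toy incidence matrix (assumed data): 5 terms (rows) x 4 documents (columns).
# Documents 0 and 1 share one term; documents 2 and 3 share another;
# the two groups have no terms in common.
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

k = 2
U_k, s_k, V_k = lsi_embed(A, k)

# Each row of doc_coords is one document in the k-dimensional LSI space.
doc_coords = V_k * s_k

# Rank-k approximation of the original matrix.
A_k = U_k @ np.diag(s_k) @ V_k.T

raw_sim = cosine(A[:, 0], A[:, 1])             # similarity in the original term space
lsi_sim = cosine(doc_coords[0], doc_coords[1])  # similarity in the reduced space
print(f"documents 0 and 1: raw {raw_sim:.3f}, LSI {lsi_sim:.3f}")
```

In this toy case the reduced space places the two documents that share a term closer together than the raw term counts do, while documents from the unrelated group stay orthogonal, which is the kind of associative relationship the abstract refers to.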
Recommended Citation
Jiang, Jingqian, "Using latent semantic indexing for data mining." Master's Thesis, University of Tennessee, 1997.
https://trace.tennessee.edu/utk_gradthes/10571