Masters Theses
Date of Award
12-1999
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Michael W. Berry
Committee Members
Padma Raghavan, Peiling Wang
Abstract
Latent Semantic Indexing (LSI) has been demonstrated to outperform lexical matching in information retrieval. However, the enormous cost of computing the Singular Value Decomposition (SVD) of the large term-by-document matrix is a barrier to applying LSI to scalable information retrieval. This thesis shows that information filtering using level search techniques can reduce the SVD computation cost for LSI. For each query, level search extracts a much smaller subset of the original term-by-document matrix containing, on average, 25% of the original non-zero entries. When LSI is applied to such subsets, average precision degrades by only 5% due to level search filtering; for some document collections, an increase in precision has been observed. Level search techniques are enhanced by a pruning scheme that deletes terms connected to only one document from the query-specific submatrix. An average 65% reduction in the number of non-zeros has been observed with a precision loss of 5% for most collections.
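The abstract does not spell out the level search procedure, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that level search can be read as a breadth-first expansion over the bipartite graph of terms and documents, starting from the query terms, and that pruning drops every term left with a single document in the extracted submatrix. The names `level_search`, `submatrix`, `prune_singletons`, and `nnz`, the `max_levels` parameter, and the toy collection are all hypothetical.

```python
from collections import defaultdict


def level_search(term_docs, query_terms, max_levels=2):
    """Assumed reading of level search: breadth-first expansion from the
    query terms over the term/document bipartite graph.

    term_docs maps each term to the set of document ids containing it
    (the non-zero pattern of the term-by-document matrix).
    Each level goes terms -> new documents -> new terms.
    """
    # Build the reverse map: document id -> terms it contains.
    doc_terms = defaultdict(set)
    for term, docs in term_docs.items():
        for d in docs:
            doc_terms[d].add(term)

    terms = {t for t in query_terms if t in term_docs}
    docs = set()
    frontier = set(terms)
    for _ in range(max_levels):
        new_docs = set()
        for t in frontier:
            new_docs |= term_docs[t]
        new_docs -= docs
        if not new_docs:
            break  # nothing new is reachable
        docs |= new_docs
        new_terms = set()
        for d in new_docs:
            new_terms |= doc_terms[d]
        new_terms -= terms
        terms |= new_terms
        frontier = new_terms
    return terms, docs


def submatrix(term_docs, terms, docs):
    """Restrict the non-zero pattern to the selected terms and documents."""
    return {t: term_docs[t] & docs for t in terms if term_docs[t] & docs}


def prune_singletons(sub):
    """Drop terms connected to only one document in the submatrix.
    Assumption: query terms get no special treatment here."""
    return {t: ds for t, ds in sub.items() if len(ds) > 1}


def nnz(matrix):
    """Count non-zero entries in a term -> documents mapping."""
    return sum(len(ds) for ds in matrix.values())


# Toy collection (hypothetical data): 6 terms, 6 documents, 11 non-zeros.
toy = {
    "latent": {0, 1},
    "semantic": {0, 1, 2},
    "matrix": {2, 3},
    "sparse": {3},
    "graph": {4},
    "tree": {4, 5},
}
terms, docs = level_search(toy, ["latent"], max_levels=2)
sub = submatrix(toy, terms, docs)
print(nnz(toy), nnz(sub), nnz(prune_singletons(sub)))  # prints "11 6 5"
```

In this reading, LSI (the truncated SVD) would then be applied to the smaller query-specific submatrix instead of the full term-by-document matrix. The sketch tracks only the non-zero pattern; term weights are omitted.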
Recommended Citation
Zhang, Xiaoyan, "Level search schemes for scalable information retrieval." Master's Thesis, University of Tennessee, 1999.
https://trace.tennessee.edu/utk_gradthes/10062