Masters Theses

Author

Xiaoyan Zhang

Date of Award

12-1999

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

Padma Raghavan, Peiling Wang

Abstract

Latent Semantic Indexing (LSI) has been demonstrated to outperform lexical matching in information retrieval. However, the enormous cost of computing the Singular Value Decomposition (SVD) of a large term-by-document matrix is a barrier to applying LSI to scalable information retrieval. This thesis shows that information filtering using level search techniques can reduce the SVD computation cost for LSI. For each query, level search extracts a much smaller subset of the original term-by-document matrix, containing on average 25% of the original non-zero entries. When LSI is applied to such subsets, average precision degrades by only 5% due to level search filtering; for some document collections, an increase in precision has been observed. Level search techniques are enhanced by a pruning scheme that deletes terms connected to only one document from the query-specific submatrix. An average 65% reduction in the number of non-zeros has been observed, with a precision loss of 5% for most collections.
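The filtering and pruning steps described in the abstract can be illustrated with a small sketch. The abstract does not spell out the algorithm, so the code below assumes that level search is a breadth-first expansion over the bipartite graph of terms and documents, starting from the query terms. The function names (`level_search`, `prune_singletons`, `nnz`), the `levels` parameter, and the toy collection are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Toy term-by-document structure: term -> set of document ids that contain it.
# Illustrative data only, not from the thesis.
term_docs = {
    "matrix": {0, 1},
    "svd": {1, 2},
    "retrieval": {2, 3},
    "query": {3},
    "cooking": {4},
}


def level_search(term_docs, query_terms, levels=2):
    """Assumed reading of level search: breadth-first expansion from the
    query terms. Each level adds the documents containing the current
    frontier of terms, then the terms occurring in those new documents."""
    # Build the inverse mapping: document id -> set of its terms.
    doc_terms = defaultdict(set)
    for term, docs in term_docs.items():
        for doc in docs:
            doc_terms[doc].add(term)

    terms = {t for t in query_terms if t in term_docs}
    docs = set()
    frontier = set(terms)
    for _ in range(levels):
        new_docs = set().union(*(term_docs[t] for t in frontier)) - docs
        docs |= new_docs
        new_terms = set().union(*(doc_terms[d] for d in new_docs)) - terms
        terms |= new_terms
        frontier = new_terms
        if not frontier:
            break

    # Query-specific submatrix: reached terms restricted to reached documents.
    return {t: term_docs[t] & docs for t in terms}


def prune_singletons(submatrix):
    """Drop terms connected to only one document in the submatrix."""
    return {t: d for t, d in submatrix.items() if len(d) > 1}


def nnz(matrix):
    """Count the non-zero entries of a term -> documents mapping."""
    return sum(len(d) for d in matrix.values())


sub = level_search(term_docs, ["matrix"], levels=2)
pruned = prune_singletons(sub)
print(nnz(term_docs), nnz(sub), nnz(pruned))
```

On this toy collection the full matrix has 8 non-zeros, the query-specific submatrix has 5, and pruning leaves 4. LSI would then compute the SVD of the smaller submatrix instead of the full matrix. Note that this sketch prunes every single-document term, including query terms; whether the thesis treats query terms specially is not stated in the abstract.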
