Doctoral Dissertations

Date of Award

8-1998

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

Charles Collines, Jack Dongarra, Padma Raghavan

Abstract

Latent Semantic Indexing (LSI) is an SVD-based conceptual retrieval technique which employs a rank-reduced model of the original (sparse) term-by-document matrix. This approach has achieved significant performance improvements over traditional lexical searching methods. With current LSI implementations, however, the ability to overcome polysemy problems (multiple meanings for a word or words) has been lacking. The Riemannian SVD (R-SVD) is a recent nonlinear generalization of the SVD which has been used for applications in systems and control. Updating LSI models based on user feedback can be accomplished using constraints modeled by the R-SVD of a low-rank approximation to the original term-by-document matrix. This dissertation presents the formulation, implementation, and performance analysis of a new LSI model (RSVD-LSI) which is equipped with an effective information filtering mechanism. Two iterative algorithms for computing the related R-SVD are proposed. Experiments have shown that the RSVD-LSI model provides an efficient and robust information retrieval/filtering technique and demonstrates a new approach of updating LSI and similar vector-space models to circumvent polysemy problems and improve retrieval performance. The nonlinear filtering mechanism in RSVD-LSI may also have potential applications in designing certain control and security systems for information retrieval from large collections.
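
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of standard LSI using a truncated SVD in NumPy. It does not implement the R-SVD or the RSVD-LSI model described in the dissertation. The function names (`lsi_fit`, `lsi_query`), the five-term vocabulary, and the four toy documents are all hypothetical and chosen only for illustration.

```python
import numpy as np

def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def lsi_query(q, Uk, sk, Vtk):
    """Cosine similarity between a query term vector q and each document
    in the k-dimensional reduced space."""
    q_hat = (q @ Uk) / sk                 # project the query: q^T U_k S_k^{-1}
    docs = Vtk.T                          # row j is document j in the reduced space
    num = docs @ q_hat
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return num / np.where(den == 0, 1.0, den)

# Hypothetical toy data. Rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

Uk, sk, Vtk = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)
sims = lsi_query(q, Uk, sk, Vtk)
print(np.round(sims, 3))
```

In this toy case, document 1 contains "automobile" but not "car", so a purely lexical match would miss it. In the rank-2 space it scores as high as document 0 because both share the term "engine". The sketch illustrates only the rank-reduced retrieval step; the polysemy handling and feedback-driven updating described in the abstract are not covered here.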
