Masters Theses

Date of Award

12-1997

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

David Straight, Brad Vander Zanden

Abstract

Due to the growth of large data collections, information retrieval, or database searching, is of vital importance. Lexical matching techniques may retrieve irrelevant or inaccurate results because of synonyms and polysemous words; thus, effective concept-based techniques are needed. One such technique is Latent Semantic Indexing (LSI), which uses a vector-space approach to identify documents whose content is related to the user's query and to rank them in order of similarity. LSI uses the singular value decomposition (SVD) of a term-by-document matrix to encode the terms and documents in a vector-space model. The existing methods for removing terms or documents from the term-document space are either time-consuming or do not sufficiently change the term-document relationships. This thesis presents a new method for downdating, the Downdating the Reduced Model (DRM) method, and discusses its implementation in the LSI++ software environment. The DRM method can be used to assess the effect that a term or document has on the clustering of relevant information in a collection, and to incorporate user feedback into the existing LSI model. Implementing the DRM method within LSI++ not only provides downdating functionality but is also less time-consuming than recomputing the SVD when removing a term, a document, or both. The DRM method is a viable algorithm for dynamic information modeling and retrieval.
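To make the retrieval and downdating ideas above concrete, the following Python/NumPy sketch builds a rank-k LSI model from a term-by-document matrix, ranks documents against a query by cosine similarity, and removes a document or a term by refactoring only small matrices derived from the rank-k factors. It is an illustration under stated assumptions, not the thesis's DRM implementation or the LSI++ interface. The function names (lsi_model, rank_documents, downdate_document, downdate_term) are hypothetical. Projecting the query as q^T U_k Sigma_k^{-1} and comparing it with the rows of V_k is one common LSI convention. Downdating is assumed to operate directly on the reduced model A_k = U_k Sigma_k V_k^T.

```python
import numpy as np


def lsi_model(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # U_k (m x k), sigma_k (k,), V_k (n x k); row j of V_k represents document j.
    return U[:, :k], s[:k], Vt[:k, :].T


def rank_documents(Uk, sk, Vk, q):
    """Project a query term vector q into k-space and rank documents by cosine.

    Assumed convention: q_hat = q^T U_k Sigma_k^{-1}, compared with rows of V_k.
    Returns (document indices, best first; cosine for every document).
    """
    q_hat = (q @ Uk) / sk
    norms = np.linalg.norm(Vk, axis=1) * np.linalg.norm(q_hat)
    cosines = (Vk @ q_hat) / np.where(norms == 0, 1.0, norms)  # avoid 0/0
    return np.argsort(-cosines), cosines


def downdate_document(Uk, sk, Vk, j):
    """Remove document j from the reduced model A_k (hypothetical sketch).

    A_k with column j deleted equals U_k B, where B = Sigma_k V_k^T without
    column j is only k x (n-1). Factoring B = U_B S_B W^T gives the SVD of
    the downdated reduced model: (U_k U_B) S_B W^T.
    """
    B = sk[:, None] * np.delete(Vk, j, axis=0).T
    UB, sB, Wt = np.linalg.svd(B, full_matrices=False)
    return Uk @ UB, sB, Wt.T


def downdate_term(Uk, sk, Vk, i):
    """Remove term i from the reduced model A_k (hypothetical sketch).

    A_k with row i deleted equals C V_k^T, where C = (U_k without row i) Sigma_k
    is (m-1) x k. Factoring C = U_C S_C W^T gives U_C S_C (V_k W)^T.
    """
    C = np.delete(Uk, i, axis=0) * sk  # scales column r of U_k by sigma_r
    UC, sC, Wt = np.linalg.svd(C, full_matrices=False)
    return UC, sC, Vk @ Wt.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((6, 5))              # toy 6-term x 5-document matrix
    Uk, sk, Vk = lsi_model(A, k=2)
    order, cos = rank_documents(Uk, sk, Vk, A[:, 2])
    print("ranking:", order)
    U2, s2, V2 = downdate_document(Uk, sk, Vk, j=1)
    print("singular values after removing document 1:", s2)
```

Because the downdated factors come from SVDs of matrices with only k rows or k columns, the full m x n matrix is never refactored. Note that these routines reproduce the rank-k model with a row or column deleted, which is not in general the same as the rank-k SVD of the original matrix with that row or column deleted. When a document's own term vector is used as the query, that document ranks first, because its projection equals its row of V_k.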
