Masters Theses
Date of Award
12-1997
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Michael W. Berry
Committee Members
David Straight, Brad Vander Zanden
Abstract
Due to the growth of large data collections, information retrieval or database searching is of vital importance. Lexical matching techniques may retrieve irrelevant or inaccurate results because of synonyms and polysemous words; thus, effective concept-based techniques are needed. One such technique is Latent Semantic Indexing (LSI), which uses a vector-space approach to identify documents whose content is related to the user's query and return them in order of similarity. LSI uses the singular value decomposition (SVD) of a term-by-document matrix to encode the terms and documents in a vector-space model. Existing methods for removing terms or documents from the term-document space are either time consuming or do not sufficiently change the term-document relationships. This thesis presents a new downdating method, Downdating the Reduced Model (DRM), and discusses its implementation in the LSI++ software environment. The DRM method can be used to assess the effect that a term or document has on the clustering of relevant information in a collection and to incorporate user feedback into the existing LSI model. Implementing the DRM method within LSI++ not only provides downdating functionality, but is also less time consuming than recomputing the SVD when removing a term, a document, or both. The DRM method is a viable algorithm for dynamic information modeling and retrieval.
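The LSI encoding described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis or from LSI++: the toy term-by-document matrix, the function names (`lsi_fit`, `lsi_rank`, `downdate_by_recompute`), and the use of NumPy are all assumptions made for illustration. The last function shows only the baseline of recomputing the SVD after removing a document, which the abstract describes as the more time-consuming alternative; the DRM method itself is not reproduced here.

```python
import numpy as np

# Hypothetical toy collection: rows are terms, columns are documents d0..d3.
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

def lsi_fit(A, k):
    """Return the rank-k SVD factors (U_k, s_k, Vt_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def lsi_rank(U_k, s_k, Vt_k, q):
    """Score documents by cosine similarity to query q in the k-dim space."""
    q_hat = (q @ U_k) / s_k          # project the query: q^T U_k S_k^{-1}
    docs = Vt_k.T                    # one k-dimensional row per document
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    sims = (docs @ q_hat) / (norms + 1e-12)   # small constant avoids 0/0
    return np.argsort(-sims), sims

def downdate_by_recompute(A, doc_index, k):
    """Baseline only: delete a document column and recompute the SVD."""
    return lsi_fit(np.delete(A, doc_index, axis=1), k)

# Query containing only the term "car".
q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
U_k, s_k, Vt_k = lsi_fit(A, k=2)
order, sims = lsi_rank(U_k, s_k, Vt_k, q)

# Remove document d1 by full recomputation (the costly baseline).
U_d, s_d, Vt_d = downdate_by_recompute(A, doc_index=1, k=2)
```

In this toy collection, document d1 contains "automobile" and "engine" but not "car", so lexical matching on "car" would miss it, while the rank-2 model scores it as highly as d0 because both documents share the term "engine". This is the synonym problem the abstract refers to.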
Recommended Citation
Witter, Dian Irene, "Downdating the latent semantic indexing model for information retrieval." Master's Thesis, University of Tennessee, 1997.
https://trace.tennessee.edu/utk_gradthes/10638