Masters Theses
Date of Award
12-1994
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Michael W. Berry
Committee Members
Brad Vander Zanden, David Straight
Abstract
Lexical-matching methods for information retrieval can be inaccurate when matching a user's queries. Typically, information is retrieved by literally matching terms in documents with those of the query, but users want to retrieve on the basis of the conceptual topic or meaning of a document. There are usually many ways to express a given concept (synonymy), so the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query will literally match terms in irrelevant documents. The singular value decomposition (SVD) can exploit the implicit high-order structure in the association of terms with documents. Latent Semantic Indexing (LSI) is a conceptual indexing technique that uses the SVD to estimate the underlying latent semantic structure of the word-to-document association. By computing a lower-rank approximation to the original term-document matrix, LSI dampens the effects of word-choice variability by representing terms and documents using the (orthogonal) left and right singular vectors.
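As an illustration of the synonymy problem and the lower-rank approximation described above, here is a minimal sketch in Python with NumPy. The five-term vocabulary, the four toy documents, the function names, and the choice of k = 2 are all hypothetical and are not taken from the thesis. The query projection and cosine scoring follow one common LSI convention, and NumPy's dense SVD stands in for the sparse routines a real LSI system would use.

```python
import numpy as np

def lsi_factors(A, k):
    """Return rank-k truncated SVD factors (U_k, s_k, V_k) of a term-document matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def project_query(q, U_k, s_k):
    """Map a term-space query into the k-dimensional space: q_hat = q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k

def cosine_scores(q_hat, V_k):
    """Cosine similarity between the projected query and each row of V_k (one row per document)."""
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return (V_k @ q_hat) / norms

# Hypothetical toy corpus. Rows are terms, columns are documents d0..d3.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

# Query containing only the term "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)

# Literal matching: count of shared terms per document.
literal = q @ A

# LSI matching in a 2-dimensional latent space.
U_k, s_k, V_k = lsi_factors(A, k=2)
scores = cosine_scores(project_query(q, U_k, s_k), V_k)

print("literal matches:", literal)
print("LSI cosine scores:", np.round(scores, 3))
```

In this toy corpus, literal matching gives d1 a score of zero because it says "automobile" rather than "car". In the rank-2 space, d0 and d1 share a direction through the co-occurring term "engine", so the query scores both highly, while the flower documents score near zero.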
Current methods for adding new text to an LSI database can degrade the orthogonality of the vectors used to represent terms and documents in high-dimensional subspaces. One remedy is to update the SVD so that orthogonality is preserved among the document vectors corresponding to the new term-document matrix. Computing the SVD of the new term-document matrix from scratch can be avoided by applying SVDPACKC routines to appropriate submatrices constructed from the existing term and document vectors and from similar vectors corresponding to the new text. The cost of the numerical computations needed to update the SVD, weighed against the potential inaccuracy of simply folding in the text, presents an interesting tradeoff for LSI database management.
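The tradeoff between folding-in and updating can be sketched numerically. The Python example below is an assumption-laden illustration, not the thesis's implementation: the function names are hypothetical, the random matrices are stand-ins for real term-document data, and NumPy's dense SVD replaces SVDPACKC. The update step uses one commonly described formulation for appending documents, which takes the SVD of the small matrix F = (S_k | U_k^T D). The abstract does not confirm that this is the exact procedure used in the thesis.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k truncated SVD factors (U_k, s_k, V_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_documents(D, U_k, s_k, V_k):
    """Folding-in: project each new document as d_hat = d^T U_k S_k^{-1}
    and append it to V_k. The existing vectors are left unchanged."""
    D_hat = (D.T @ U_k) / s_k
    return np.vstack([V_k, D_hat])

def update_documents(D, U_k, s_k, V_k):
    """SVD-updating sketch (assumed formulation): factor the small
    k x (k + p) matrix F = (S_k | U_k^T D) and rotate the old factors."""
    k, p, n = s_k.size, D.shape[1], V_k.shape[0]
    F = np.hstack([np.diag(s_k), U_k.T @ D])
    U_F, s_F, Vt_F = np.linalg.svd(F, full_matrices=False)
    # Block-diagonal matrix diag(V_k, I_p) with orthonormal columns.
    block = np.zeros((n + p, k + p))
    block[:n, :k] = V_k
    block[n:, k:] = np.eye(p)
    return U_k @ U_F, s_F, block @ Vt_F.T

def orthogonality_error(V):
    """Frobenius norm of V^T V - I; zero when the columns are orthonormal."""
    return np.linalg.norm(V.T @ V - np.eye(V.shape[1]))

# Hypothetical data: 8 terms, 6 existing documents, 2 new documents.
rng = np.random.default_rng(0)
A = rng.random((8, 6))
D = rng.random((8, 2))
k = 3

U_k, s_k, V_k = truncated_svd(A, k)

V_fold = fold_in_documents(D, U_k, s_k, V_k)
U_up, s_up, V_up = update_documents(D, U_k, s_k, V_k)

fold_err = orthogonality_error(V_fold)
update_err = orthogonality_error(V_up)

print("orthogonality error after folding-in:", fold_err)
print("orthogonality error after updating:  ", update_err)
```

Folding-in costs only a matrix-vector product per new document, but the appended rows make the columns of the document matrix non-orthonormal. The update sketch keeps them orthonormal at the price of an extra SVD of a k x (k + p) matrix. In this formulation the new documents are represented only through their projection onto the span of U_k, so it approximates the decomposition of the full new matrix rather than reproducing it.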
Recommended Citation
O'Brien, Gavin William, "Information management tools for updating an SVD-encoded indexing scheme." Master's Thesis, University of Tennessee, 1994.
https://trace.tennessee.edu/utk_gradthes/11642