Masters Theses

Date of Award

12-1994

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

Brad Vander Zanden, David Straight

Abstract

Lexical-matching methods for information retrieval can be inaccurate when they are used to match a user's queries. Typically, information is retrieved by literally matching terms in documents with those of the query. The problem is that users want to retrieve on the basis of conceptual topic or meaning of a document. There are usually many ways to express a given concept (synonymy), so the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query will literally match terms in irrelevant documents. The implicit high-order structure of associating terms with documents can be exploited by the singular value decomposition (SVD). Latent Semantic Indexing (LSI) is a conceptual indexing technique which uses the SVD to estimate the underlying latent semantic structure of the word to document association. By computing a lower-rank approximation to the original term-document matrix, LSI dampens the effects of word choice variability by representing terms and documents using the (orthogonal) left and right singular vectors.

Current methods for adding new text to an LSI database can have deteriorating effects on the orthogonality of the vectors used to represent terms and documents in high-dimensional subspaces. Updating the SVD so as to preserve the orthogonality among document vectors corresponding to the new term-document matrix is one remedy. Computing the SVD of the new term-document matrix can be avoided by using SVDPACKC routines for appropriate submatrices constructed from existing term and document vectors and similar vectors corresponding to the new text. The cost of the numerical computations needed to update the SVD versus the potential inaccuracy of simply folding-in text presents an interesting tradeoff for LSI database management.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS