Information management tools for updating an SVD-encoded indexing scheme

Date Issued
December 1, 1994
Author(s)
O'Brien, Gavin William
Advisor(s)
Michael W. Berry
Additional Advisor(s)
Brad Vander Zanden, David Straight
Abstract

Lexical-matching methods for information retrieval can be inaccurate when used to match a user's query. Typically, information is retrieved by literally matching terms in documents with those of the query, yet users want to retrieve on the basis of the conceptual topic or meaning of a document. There are usually many ways to express a given concept (synonymy), so the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query will literally match terms in irrelevant documents. The implicit higher-order structure in the association of terms with documents can be exploited by the singular value decomposition (SVD). Latent Semantic Indexing (LSI) is a conceptual indexing technique that uses the SVD to estimate the underlying latent semantic structure of the word-to-document association. By computing a lower-rank approximation to the original term-document matrix, LSI dampens the effects of word-choice variability by representing terms and documents using the (orthogonal) left and right singular vectors.
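The effect described above can be illustrated with a minimal NumPy sketch. It is not taken from the thesis: the toy term-document matrix, the choice k = 2, and the helper names `lsi_fit`, `project_query`, and `cosine_scores` are all assumptions made for illustration. The query contains only the term "car", so literal matching misses the document that says "automobile", while the rank-2 representation places both documents in the same direction.

```python
import numpy as np

def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x docs).
    Hypothetical helper; returns U_k, the k singular values, and V_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def project_query(q, Uk, sk):
    """Map a term-space query into the k-dimensional space: q^T U_k S_k^{-1}."""
    return (q @ Uk) / sk

def cosine_scores(q_hat, Vk):
    """Cosine between the projected query and each document row of V_k."""
    norms = np.linalg.norm(Vk, axis=1) * np.linalg.norm(q_hat)
    return (Vk @ q_hat) / np.where(norms == 0, 1.0, norms)

# Toy data (an assumption): rows are terms, columns are documents d0..d3.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

Uk, sk, Vk = lsi_fit(A, k=2)

q = np.array([1, 0, 0, 0, 0], dtype=float)   # query: "car"
literal = q @ A                               # literal term overlap per document
latent = cosine_scores(project_query(q, Uk, sk), Vk)

print("literal overlap:", literal)
print("latent cosine  :", np.round(latent, 3))
```

In this toy case the literal overlap with d1 ("automobile engine") is zero, whereas its latent cosine with the query is close to 1 because "car" and "automobile" both co-occur with "engine". Real collections are far noisier, and the choice of k matters considerably.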


Current methods for adding new text to an LSI database can have deteriorating effects on the orthogonality of the vectors used to represent terms and documents in high-dimensional subspaces. Updating the SVD so as to preserve the orthogonality among document vectors corresponding to the new term-document matrix is one remedy. Computing the SVD of the new term-document matrix can be avoided by using SVDPACKC routines for appropriate submatrices constructed from existing term and document vectors and similar vectors corresponding to the new text. The cost of the numerical computations needed to update the SVD versus the potential inaccuracy of simply folding-in text presents an interesting tradeoff for LSI database management.
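The tradeoff between folding-in and updating can be sketched numerically. The code below is an illustration under stated assumptions, not the thesis's SVDPACKC-based procedure: the random matrices, the value k = 3, and the function names are hypothetical. `fold_in_documents` appends projected document vectors to the existing V_k, and `update_documents` shows one possible update that takes the SVD of a small k x (k+p) matrix built from the existing singular values and the projected new documents. That update is exact only for the part of the new documents lying in the span of U_k.

```python
import numpy as np

def truncated_svd(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_documents(D, Uk, sk, Vk):
    """Folding-in: each new document column d becomes the row d^T U_k S_k^{-1},
    appended to V_k. U_k and the singular values are left unchanged."""
    return np.vstack([Vk, (D.T @ Uk) / sk])

def update_documents(D, Uk, sk, Vk):
    """Assumed update sketch: take the SVD of F = (S_k | U_k^T D) and rotate
    the existing factors. This factors (A_k | U_k U_k^T D), so any part of D
    outside the span of U_k is discarded."""
    k, p = sk.size, D.shape[1]
    F = np.hstack([np.diag(sk), Uk.T @ D])               # k x (k + p)
    Uf, sf, Vft = np.linalg.svd(F, full_matrices=False)
    block = np.block([[Vk, np.zeros((Vk.shape[0], p))],
                      [np.zeros((p, k)), np.eye(p)]])    # (n + p) x (k + p)
    return Uk @ Uf, sf, block @ Vft.T

def orthogonality_error(V):
    """Frobenius norm of V^T V - I; zero when the columns are orthonormal."""
    return np.linalg.norm(V.T @ V - np.eye(V.shape[1]))

# Synthetic data (an assumption): 8 terms, 6 existing documents, 2 new ones.
rng = np.random.default_rng(0)
A = rng.random((8, 6))
D = rng.random((8, 2))

Uk, sk, Vk = truncated_svd(A, k=3)
V_fold = fold_in_documents(D, Uk, sk, Vk)
U_upd, s_upd, V_upd = update_documents(D, Uk, sk, Vk)

print("orthogonality error, folding-in:", orthogonality_error(V_fold))
print("orthogonality error, update    :", orthogonality_error(V_upd))
```

Folding-in costs only a matrix-vector product per new document but leaves the document vectors non-orthonormal, since V^T V becomes the identity plus a contribution from the appended rows. The update sketch keeps the columns orthonormal at the cost of an additional small SVD.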

Degree
Master of Science
Major
Computer Science
File(s)
Name

Thesis94.O27.pdf

Size

2.8 MB

Format

Unknown

Checksum (MD5)

75de64c441049acc776573d0a7b365f3

  • Libraries at University of Tennessee, Knoxville