Toward large-scale information retrieval using latent semantic indexing

Date Issued
August 1, 1996
Author(s)
Letsche, Todd A.
Advisor(s)
Michael W. Berry
Additional Advisor(s)
Brad Vander Zanden
David Straight
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/32132
Abstract

As the amount of electronic information increases, traditional lexical (or Boolean) information retrieval techniques will become less useful. Large, heterogeneous collections will be difficult to search since the sheer volume of unranked documents returned in response to a query will overwhelm the user. Vector-space approaches to information retrieval, on the other hand, allow the user to search for concepts rather than specific words and rank the results of the search according to their relative similarity to the query. One vector-space approach, Latent Semantic Indexing (LSI), has achieved up to 30% better retrieval performance than lexical searching techniques by employing a reduced-rank model of the term-document space. However, the original implementation of LSI lacked the execution efficiency required to make LSI useful for large data sets. A new implementation of LSI, LSI++, seeks to make LSI efficient, extensible, portable, and maintainable. The LSI++ Application Programming Interface (API) allows applications to immediately use LSI without knowing the implementation details of the underlying system. LSI++ supports both serial and distributed searching of large data sets, providing the same programming interface regardless of the implementation actually executing. In addition, a World-Wide Web interface was created to allow simple, intuitive searching of document collections using LSI++. Timing results indicate the serial implementation of LSI++ searches up to 6 times faster than the original implementation of LSI, while the parallel implementation searches nearly 180 times faster on large document collections.
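
The reduced-rank idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the LSI++ API or the thesis code: the function names (`truncated_svd`, `lsi_scores`, `lexical_scores`), the five-term toy collection, and the use of plain power iteration in place of a production sparse SVD solver are all assumptions made for illustration. The query is folded into the rank-k space in the usual way (projected onto the left singular vectors and scaled by the inverse singular values) and compared to each document by cosine similarity.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine(u, v):
    denom = norm(u) * norm(v)
    return dot(u, v) / denom if denom > 0.0 else 0.0

def matvec(M, x):
    return [dot(row, x) for row in M]

def truncated_svd(A, k, iters=200):
    """Top-k singular triplets (sigma, u, v) of A via power iteration on A^T A.

    Illustrative only: assumes k does not exceed the rank of A and that the
    leading singular values are well separated, as in the toy matrix below.
    """
    At = [list(col) for col in zip(*A)]
    n_docs = len(A[0])
    triplets = []
    for j in range(k):
        v = [1.0 + j + 0.1 * i for i in range(n_docs)]  # deterministic start vector
        for _ in range(iters):
            # Deflation: keep v orthogonal to the vectors already found.
            for _, _, prev in triplets:
                c = dot(v, prev)
                v = [a - c * b for a, b in zip(v, prev)]
            w = matvec(At, matvec(A, v))
            scale = norm(w)
            v = [a / scale for a in w]
        Av = matvec(A, v)
        sigma = norm(Av)
        triplets.append((sigma, [a / sigma for a in Av], v))
    return triplets

def lsi_scores(A, query, k):
    """Cosine of the folded-in query against each document in the rank-k space."""
    triplets = truncated_svd(A, k)
    q_hat = [dot(u, query) / sigma for sigma, u, _ in triplets]
    n_docs = len(A[0])
    return [cosine(q_hat, [v[d] for _, _, v in triplets]) for d in range(n_docs)]

def lexical_scores(A, query):
    """Cosine of the raw query against each raw term-document column."""
    return [cosine(query, list(col)) for col in zip(*A)]

# Hypothetical toy collection: rows are terms, columns are documents.
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
]
query = [1, 0, 0, 0, 0]  # the single-term query "car"

if __name__ == "__main__":
    print("lexical:", lexical_scores(A, query))
    print("rank-2 LSI:", lsi_scores(A, query, k=2))
```

In this toy collection the second document contains "automobile" and "engine" but not "car", so its lexical score for the query is zero. In the rank-2 space it scores close to 1, because "car" and "automobile" both co-occur with "engine", while the two flower documents stay near zero. This is the concept-level matching the abstract attributes to vector-space methods, shown on a deliberately tiny example.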

Degree
Master of Science
Major
Computer Science
File(s)
Name
Thesis96.L48.pdf
Size
7.85 MB
Format
Unknown
Checksum (MD5)
c6303099307f26f45f6dbc57f142eca8
