Masters Theses
Date of Award
8-1996
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Michael W. Berry
Committee Members
Brad Vander Zanden, David Straight
Abstract
As the amount of electronic information increases, traditional lexical (or Boolean) information retrieval techniques will become less useful. Large, heterogeneous collections will be difficult to search since the sheer volume of unranked documents returned in response to a query will overwhelm the user. Vector-space approaches to information retrieval, on the other hand, allow the user to search for concepts rather than specific words and rank the results of the search according to their relative similarity to the query. One vector-space approach, Latent Semantic Indexing (LSI), has achieved up to 30% better retrieval performance than lexical searching techniques by employing a reduced-rank model of the term-document space. However, the original implementation of LSI lacked the execution efficiency required to make LSI useful for large data sets. A new implementation of LSI, LSI++, seeks to make LSI efficient, extensible, portable, and maintainable. The LSI++ Application Programming Interface (API) allows applications to immediately use LSI without knowing the implementation details of the underlying system. LSI++ supports both serial and distributed searching of large data sets, providing the same programming interface regardless of the implementation actually executing. In addition, a World-Wide Web interface was created to allow simple, intuitive searching of document collections using LSI++. Timing results indicate the serial implementation of LSI++ searches up to 6 times faster than the original implementation of LSI, while the parallel implementation searches nearly 180 times faster on large document collections.
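The ranking idea behind vector-space retrieval, as described in the abstract, can be illustrated with a minimal sketch. It is not the LSI++ implementation or its API: the function names (`term_vector`, `cosine`, `rank`) and the toy collection are hypothetical, chosen only for illustration. It scores documents by cosine similarity on raw term counts and omits the reduced-rank projection (a truncated singular value decomposition of the term-document matrix) that distinguishes LSI from plain vector-space matching.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a sparse term-frequency vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, index) pairs for every document, best match first."""
    q = term_vector(query)
    scores = [(cosine(q, term_vector(doc)), i) for i, doc in enumerate(documents)]
    # Sort by descending score; ties fall back to the original document order.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical toy collection, for illustration only.
docs = [
    "parallel searching of large document collections",
    "latent semantic indexing for information retrieval",
    "boolean retrieval returns unranked documents",
]
ranking = rank("semantic indexing retrieval", docs)
```

Unlike a Boolean search, every document receives a score, so the result is an ordered list rather than an unranked set. Because this sketch matches literal terms only, a document that shares no words with the query scores zero; matching on concepts rather than specific words is what the reduced-rank model in LSI is meant to provide.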
Recommended Citation
Letsche, Todd A., "Toward large-scale information retrieval using latent semantic indexing." Master's Thesis, University of Tennessee, 1996.
https://trace.tennessee.edu/utk_gradthes/10882