Masters Theses
Date of Award
5-2001
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Michael W. Berry
Committee Members
David Straight, Peiling Wang
Abstract
Latent Semantic Indexing (LSI) is a popular information retrieval model for concept-based searching. As with many vector space IR models, LSI requires an existing term-document association structure such as a term-by-document matrix. The term-by-document matrix, constructed during document parsing, can only capture weighted vocabulary occurrence patterns in the documents. However, for many knowledge domains (e.g., medicine) there are pre-existing semantic structures that could be used to organize and categorize information. The goals of this study are to demonstrate how such semantic structures can be incorporated into the LSI vector space model and to measure their overall effect on query matching performance. The new approach, called Knowledge-Enhanced LSI (KELSI), is applied to documents in the OHSUMED medical abstracts using the semantic structures provided by the UMLS Semantic Network and MeSH. Results based on precision-recall graphs and 11-point average precision values (P) indicate that a MeSH-enhanced search index is capable of delivering a noticeable incremental performance gain over the original LSI model: a 28% improvement for P=.01 and a 100% improvement for P=.30. This performance gain is achieved by replacing the original query with the MeSH heading extracted from the query text via regular expression matches.
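The abstract reports results as 11-point average precision values. For readers unfamiliar with the measure, the following is a minimal sketch of the standard 11-point interpolated average precision for a single query. It is not taken from the thesis: the function name `eleven_point_average_precision` and the document identifiers are hypothetical, and the thesis may compute or average the measure differently.

```python
def eleven_point_average_precision(ranked_ids, relevant_ids):
    """Standard 11-point interpolated average precision for one query.

    ranked_ids   -- document ids in the order the system returned them
    relevant_ids -- ids judged relevant for the query (assumed non-empty)
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")

    # Record (recall, precision) after each position in the ranking.
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))

    # Interpolated precision at recall level r is the highest precision
    # observed at any recall >= r (0.0 if that recall is never reached).
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)


# Hypothetical ranking of four documents, two of which are relevant.
score = eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

In this illustrative ranking, the interpolated precision is 1.0 for recall levels 0.0 through 0.5 and 2/3 for levels 0.6 through 1.0, so the score is 28/33.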
Recommended Citation
Guo, David, "Knowledge-enhanced latent semantic indexing (KELSI): algorithms and applications." Master's Thesis, University of Tennessee, 2001.
https://trace.tennessee.edu/utk_gradthes/9625