Masters Theses

Author

David Guo

Date of Award

5-2001

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

David Straight, Peiling Wang

Abstract

Latent Semantic Indexing (LSI) is a popular information retrieval (IR) model for concept-based searching. As with many vector space IR models, LSI requires an existing term-document association structure such as a term-by-document matrix. The term-by-document matrix, constructed during document parsing, can only capture weighted vocabulary occurrence patterns in the documents. However, many knowledge domains (e.g., medicine) have pre-existing semantic structures that could be used to organize and categorize information. The goals of this study are to demonstrate how such semantic structures can be incorporated into the LSI vector space model and to measure their overall effect on query matching performance. The new approach, called Knowledge-Enhanced LSI (KELSI), is applied to documents in the OHSUMED collection of medical abstracts, using the semantic structures provided by the UMLS (Unified Medical Language System) Semantic Network and MeSH (Medical Subject Headings). Results based on precision-recall graphs and 11-point average precision values (P) indicate that a MeSH-enhanced search index can deliver a noticeable incremental performance gain over the original LSI model: a 28% improvement for P = .01 and a 100% improvement for P = .30. This gain is achieved by replacing the original query with the MeSH heading extracted from the query text through regular-expression matching.
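The standard LSI steps the abstract builds on can be sketched in Python. These are a rank-k SVD of the term-by-document matrix, query matching in the reduced concept space, and evaluation by 11-point average precision. The sketch also includes a regular-expression lookup of heading strings in query text. It is an illustrative sketch under stated assumptions, not the thesis's implementation. It uses raw term counts with no weighting, one common LSI cosine-scoring variant, toy data, and a hypothetical list of heading strings standing in for MeSH. The KELSI incorporation of the UMLS Semantic Network is not reproduced.

```python
import re
import numpy as np


def lsi_index(A: np.ndarray, k: int):
    """Rank-k truncated SVD of a term-by-document matrix: A ~ U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # V_k has one row per document


def lsi_scores(U_k, s_k, V_k, q):
    """Cosine between the projected query U_k^T q and each document's
    concept-space coordinates S_k V_k^T e_j (one common LSI scoring variant)."""
    q_k = U_k.T @ q                      # query in the k-dim concept space
    D = V_k * s_k                        # row j = S_k V_k^T e_j
    norms = np.linalg.norm(D, axis=1) * np.linalg.norm(q_k)
    return (D @ q_k) / np.where(norms == 0, 1.0, norms)


def query_vector(text: str, vocab: list[str]) -> np.ndarray:
    """Raw term counts over a fixed vocabulary (no term weighting: an assumption)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return np.array([tokens.count(t) for t in vocab], dtype=float)


def extract_mesh_headings(query: str, headings: list[str]) -> list[str]:
    """Headings from a hypothetical MeSH-like list that appear in the query,
    matched case-insensitively on word boundaries."""
    return [h for h in headings
            if re.search(r"\b" + re.escape(h) + r"\b", query, re.IGNORECASE)]


def eleven_point_avg_precision(ranking, relevant) -> float:
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, points = 0, []                 # (recall, precision) at each relevant hit
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / i))
    levels = [j / 10 for j in range(11)]
    # Interpolated precision at level r = max precision at any recall >= r.
    return sum(max((p for rec, p in points if rec >= r), default=0.0)
               for r in levels) / 11


if __name__ == "__main__":
    vocab = ["heart", "attack", "myocardial", "infarction", "diabetes"]
    # Toy 5-term x 4-document count matrix (illustrative data only).
    A = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    U_k, s_k, V_k = lsi_index(A, k=2)
    q = query_vector("heart attack", vocab)
    ranking = [int(j) for j in np.argsort(-lsi_scores(U_k, s_k, V_k, q))]
    print("LSI ranking:", ranking)
    print("11-pt avg precision:", eleven_point_avg_precision(ranking, {0, 1}))
    print(extract_mesh_headings("myocardial infarction after a heart attack",
                                ["Myocardial Infarction", "Diabetes Mellitus"]))
```

Choosing `k` below the rank of `A` is what allows LSI to relate a query to documents that share few of its literal terms. At full rank the scores are a constant multiple of ordinary cosine similarity against the raw document vectors, so the ranking is the same as plain vector-space matching.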
