Masters Theses

Date of Award

5-1998

Degree Type

Thesis

Degree Name

Master of Science

Major

Information Sciences

Major Professor

David W. Penniman

Committee Members

Michael W. Berry, Pat L. Fisher

Abstract

As more organizations rely on users to find information from their own desktops, good searching practices accumulate more slowly and a sense of disparity grows. Major efforts are usually required to build aid tools such as classification schemes and thesauri that organize and expedite retrieval of frequently requested information. However, vocabulary, domain knowledge, and research interests grow much faster than these aids are updated. This study proposes a tool for the collection, analysis, and exploration of the vocabulary that users employ on a day-to-day basis in information retrieval sessions. It implements a thesaurus with only one type of relationship, Related Terms, and each relationship has an associated weight. As searchers use terms, retrieval effectiveness is analyzed in order to make inferences about associations among the query terms, and an algorithm is proposed to quantify this analysis. As a result, relationships are established and weights are updated. Experiments for this research were done without human subjects. Three standard test collections were used to evaluate retrieval effectiveness. User behavior was modeled with a set of assumptions about satisfaction with the retrieval results and the resulting perceptions of term usability. The utility of a particular term in a query with other terms was interpreted as a quantifier of the association between terms and recorded in the thesaurus. It was shown that a thesaurus constructed this way can exhibit the following behavior: discover strong and frequently used associations, maintain strong but infrequently used associations, and promote newly established associations. Problems such as high execution complexity for both term retrieval and weight updates were observed, and ways to ease them were suggested.
It was also shown that this tool is feasible only for users in a relatively homogeneous environment where similar information is sought and similar queries are executed. The three standard information retrieval test collections used in the experiments were: one with long descriptive queries on a wide variety of subjects, one with descriptive queries in a relatively narrow subject field, and one with short keyword queries in a fairly narrow subject domain. Experimental results showed that use of such an aid tool can improve retrieval for the 72 queries in one collection in 493 cases, for the 63 queries in another collection in 480 cases, and for the 9 queries in the third collection in 36 cases. Each case is an opportunity to improve a query by selecting a single keyword from a list presented by the thesaurus. However, the tool was used without human subjects and thus without the added help of users' conscious selection from the suggested lists. As a result, there were many cases in which suggested terms failed to improve retrieval performance, and a few in which they damaged it, for queries from the same collections.
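The abstract does not specify the weight-update algorithm, but the data structure it describes (a thesaurus with a single Related Terms relationship, where pairwise weights are reinforced according to how well a query performed) can be sketched as follows. This is a minimal illustration under assumed rules, not the thesis's actual algorithm; the class, method names, and additive update are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

class RelatedTermsThesaurus:
    """Sketch of a thesaurus with one relationship type (Related Terms),
    each association carrying a weight (hypothetical implementation)."""

    def __init__(self):
        # weight[a][b]: strength of the association between terms a and b
        self.weight = defaultdict(lambda: defaultdict(float))

    def record_query(self, terms, effectiveness_gain):
        """After a retrieval session, adjust the pairwise associations among
        the query terms in proportion to how much the combination helped
        retrieval (assumed additive update rule)."""
        for a, b in combinations(sorted(set(terms)), 2):
            self.weight[a][b] += effectiveness_gain
            self.weight[b][a] += effectiveness_gain

    def suggest(self, term, k=5):
        """Return up to k related terms, strongest association first."""
        related = self.weight.get(term, {})
        return sorted(related, key=related.get, reverse=True)[:k]
```

Used this way, each executed query is a training signal: frequently co-occurring, effective term pairs accumulate weight, while a searcher expanding a query would pick one keyword from the ranked `suggest` list.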
