Masters Theses

Date of Award

8-2004

Degree Type

Thesis

Degree Name

Master of Science

Major

Electrical Engineering

Major Professor

J. Douglas Birdwell

Committee Members

Tsewei Wang, John Chiasson

Abstract

Generic database similarity search is one of the most challenging problems in current database research. Generic data are not simply structured data with several keys of numeric or alphabetic types. Traditional search algorithms that only check specified fields and keys are not effective. Similarity searches find the objects that are similar to a target using a specified similarity criterion. Tree-structured indexing techniques based on metric spaces are widely used to solve this problem. Existing methods can be divided into two categories: approaches based upon Voronoi partitions and approaches based upon reference points. The later one is the focus of this research. The problem of database similarity search using reference points in metric spaces is formulated, and the key issues are addressed. This research focuses upon two broad sets of open problems: Analysis of the limitations of approaches to similarity search using metric spaces, and development of criteria that can be and to evaluate the opportunities for new design methods.

The performance limitations of similarity search based on metric spaces are analyzed and proved to be imposed by statistical characteristics of the data collection. A new concept, range threshold, is defined to evaluate the feasibility of tree-structured indexing techniques based upon reference points in metric spaces. A method to estimate the range threshold is provided, which makes it possible to check the feasibility of this approach for a data set prior to implementation.

The opportunities for different approaches are evaluated by criteria based on search efficiency and utility. Comparison of different Minkowski metrics and data extraction methods using PCA (principle component analysis) are presented. Search utilities are demonstrated by examples. Several issues related to index tree structure are addressed.

Experimental results show that a taller tree yields better performance. All these results indicate that the approaches based upon reference points in metric spaces are promising.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS