Masters Theses

Date of Award

5-2001

Degree Type

Thesis

Degree Name

Master of Science

Major

Chemical Engineering

Major Professor

Tsewei Wang

Committee Members

J.D. Birdwell, J.R. Collier, Mark Rader

Abstract

A new method has been developed to create and maintain a tree-structured search index for multidimensional data. The method exploits naturally occurring patterns and clusters within the data, and thereby allows efficient search and retrieval strategies to be implemented in a database. It was applied to a DNA database developed by the FBI for forensic use. A set of 10,000 DNA/STR profiles was generated for the sixteen loci from the STR allele probability distributions for Caucasians. The resulting allele distribution was analyzed using multivariate statistical analysis; specifically, Principal Component Analysis (PCA) was employed to detect clustering patterns among the profiles. The analysis revealed that for some choices of loci pairs (such as d13s17 and d16s539), well-separated, distinct clusters were obtained. Members of each distinct cluster were studied further to determine the attributes that distinguished them from the members of all other clusters. PCA of a real DNA/STR dataset showed similar clustering patterns.
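As a rough illustration of the PCA step described above, the following minimal sketch generates synthetic two-locus profiles and projects them onto their two leading principal components. It is not the thesis's implementation: the allele values, the frequencies, the profile count of 1,000, and all function names are hypothetical placeholders chosen for the example, and power iteration with deflation is simply one convenient, dependency-free way to approximate the leading components.

```python
import random

def center(rows):
    """Subtract each column's mean so the data are mean-centered."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def leading_component(cov, iters=500):
    """Power iteration: approximate the dominant eigenpair of cov."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = matvec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
    return eigval, v

def top_two_components(rows):
    """Return PCA scores, variances, and directions for two components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    val1, vec1 = leading_component(cov)
    # Deflation: subtract the first component, then repeat the iteration.
    deflated = [[cov[i][j] - val1 * vec1[i] * vec1[j] for j in range(d)]
                for i in range(d)]
    val2, vec2 = leading_component(deflated)
    scores = [(sum(a * b for a, b in zip(r, vec1)),
               sum(a * b for a, b in zip(r, vec2))) for r in centered]
    return scores, (val1, val2), (vec1, vec2)

# Hypothetical allele values and frequencies for two illustrative loci.
# These are made-up numbers, not the FBI distributions used in the thesis.
random.seed(0)
ALLELES_A, FREQS_A = [8, 9, 10, 11, 12, 13], [0.10, 0.08, 0.05, 0.32, 0.30, 0.15]
ALLELES_B, FREQS_B = [9, 10, 11, 12, 13], [0.10, 0.06, 0.28, 0.34, 0.22]

def random_profile():
    """One profile: two alleles per locus, each pair sorted (low, high)."""
    a = sorted(random.choices(ALLELES_A, weights=FREQS_A, k=2))
    b = sorted(random.choices(ALLELES_B, weights=FREQS_B, k=2))
    return [float(x) for x in a + b]

profiles = [random_profile() for _ in range(1000)]
scores, eigvals, components = top_two_components(profiles)
```

Plotting the pairs in `scores` as a scatter plot is the usual way to inspect whether the profiles fall into visually distinct groups for a given pair of loci.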

To rank the profiles returned by a search according to their similarity to the target profile, a new Similarity Index (SI) parameter has been developed. The Similarity Index was successfully tested on a small (126) and a large (1026) dataset. In addition, a Shuffling Index was developed to study how sensitive the Similarity Index is to the weights assigned to its sub-parameters. Results show that the similarity ranking of profiles remains stable over a wide range of weights.
