
Increasing the Speed and Efficiency of Search in FBI/CODIS DNA Database Through Multivariate Statistical Clustering Approach and Development of a Similarity Ranking Scheme

Date Issued
May 1, 2001
Author(s)
Yadav, Puneet
Advisor(s)
Tsewei Wang
Additional Advisor(s)
J.D. Birdwell, J.R. Collier, Mark Rader
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/37925
Abstract

A new method has been developed to create and maintain a tree-structured search index for multidimensional data using naturally occurring patterns and clusters within the data, thereby enabling efficient search and retrieval strategies in a database. The method was applied to a DNA database developed by the FBI for forensic use. A set of 10,000 DNA/STR profiles covering sixteen loci was generated from the STR allele probability distributions for the Caucasian population. The resulting allele distribution was analyzed using multivariate statistical analysis; specifically, Principal Component Analysis (PCA) was employed to detect clustering patterns among the profiles. The analysis revealed that certain locus pairs (such as D13S317 and D16S539) yield good, distinct clusters. Members of each distinct cluster were further studied to determine the attributes that distinguish them from the members of all other clusters. PCA applied to a real DNA/STR dataset revealed similar clustering patterns.
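
The PCA step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's actual procedure: the allele frequency tables are made-up placeholders (not the FBI Caucasian STR data), the two loci have hypothetical placeholder names, the two alleles at each locus are drawn independently (a Hardy-Weinberg assumption), and each profile is encoded as the sorted pair of allele repeat numbers at each locus. The helper names `simulate_profiles` and `pca_scores` are hypothetical.

```python
import numpy as np

# ILLUSTRATIVE ONLY: placeholder loci and made-up allele frequencies,
# not real FBI/CODIS population data. Keys are allele repeat counts.
HYPOTHETICAL_FREQS = {
    "LOCUS_A": {8: 0.10, 11: 0.30, 12: 0.30, 13: 0.20, 14: 0.10},
    "LOCUS_B": {9: 0.10, 11: 0.30, 12: 0.35, 13: 0.15, 14: 0.10},
}


def simulate_profiles(freqs, n, seed=None):
    """Draw n synthetic profiles; returns an (n, 2 * n_loci) float array.

    Assumes the two alleles at a locus are drawn independently from the
    same frequency table, and stores each pair sorted (smaller first).
    """
    rng = np.random.default_rng(seed)
    columns = []
    for table in freqs.values():
        alleles = np.array(list(table.keys()))
        probs = np.array(list(table.values()), dtype=float)
        probs /= probs.sum()  # guard against tables that do not sum to 1
        pair = rng.choice(alleles, size=(n, 2), p=probs)
        pair.sort(axis=1)  # genotype (11, 12) is the same as (12, 11)
        columns.append(pair)
    return np.hstack(columns).astype(float)


def pca_scores(X, n_components=2):
    """Project mean-centred data onto its leading principal axes via SVD.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    var = S**2 / (X.shape[0] - 1)
    return scores, (var / var.sum())[:n_components]


if __name__ == "__main__":
    X = simulate_profiles(HYPOTHETICAL_FREQS, 10_000, seed=1)
    scores, ratio = pca_scores(X)
    print("explained variance ratio:", ratio)
    print("distinct genotype combinations:", len(np.unique(X, axis=0)))
```

Because allele values are discrete, many synthetic profiles share the same genotype combination and map to the same point in the projected space; plotting `scores` for a locus pair is one simple way to look for the kind of grouping the abstract describes.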


To rank the profiles returned by a search according to their similarity to the target profile, a new Similarity Index (SI) parameter was developed. The SI was successfully tested on a small (126-profile) and a large (1,026-profile) dataset. In addition, a Shuffling Index was developed to study the sensitivity of the SI to the choice of weights assigned to its sub-parameters. The results show that the similarity ranking of profiles remains stable over a wide range of weights.
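
The abstract does not define the Similarity Index or the Shuffling Index, so the following is only a hypothetical stand-in showing the general idea of a weighted per-locus similarity score used for ranking, plus a measure of how much that ranking changes when the weights change. The three sub-parameters (shared-allele fraction, exact genotype match, closeness of repeat counts), the default weights, and all function names are assumptions. The stability measure here is the normalized Kendall tau distance between two rankings, which is not necessarily how the thesis's Shuffling Index is computed.

```python
from itertools import combinations

# Hypothetical representation: a profile is a list of (allele, allele)
# repeat-count pairs, one per locus, in the same locus order for all profiles.
Profile = list[tuple[int, int]]


def locus_subscores(target, candidate):
    """Hypothetical per-locus sub-parameters, each in [0, 1]."""
    t, c = sorted(target), sorted(candidate)
    # (1) Fraction of the two target alleles found in the candidate, counted
    #     with multiplicity, so (12, 12) vs (12, 13) shares only one allele.
    remaining = list(c)
    shared = 0
    for allele in t:
        if allele in remaining:
            remaining.remove(allele)
            shared += 1
    share_frac = shared / 2
    # (2) Exact genotype match at this locus.
    exact = 1.0 if t == c else 0.0
    # (3) Closeness of repeat counts; equals 1 only when both alleles match.
    closeness = 1.0 / (1.0 + abs(t[0] - c[0]) + abs(t[1] - c[1]))
    return share_frac, exact, closeness


def similarity_index(target: Profile, candidate: Profile, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of sub-parameters over all loci, normalised to [0, 1]."""
    total = 0.0
    for t_locus, c_locus in zip(target, candidate):
        subs = locus_subscores(t_locus, c_locus)
        total += sum(w * s for w, s in zip(weights, subs))
    return total / (sum(weights) * len(target))


def rank(target: Profile, database: list[Profile], weights):
    """Database indices ordered from most to least similar (ties keep input order)."""
    scores = [similarity_index(target, p, weights) for p in database]
    return sorted(range(len(database)), key=scores.__getitem__, reverse=True)


def shuffling_index(order_a, order_b):
    """Fraction of item pairs whose relative order differs between two rankings.

    0.0 means identical rankings; 1.0 means one is the reverse of the other.
    """
    pos_b = {item: i for i, item in enumerate(order_b)}
    pairs = list(combinations(order_a, 2))  # x precedes y in order_a
    if not pairs:
        return 0.0
    flipped = sum(1 for x, y in pairs if pos_b[x] > pos_b[y])
    return flipped / len(pairs)


if __name__ == "__main__":
    target = [(11, 12), (9, 12)]
    database = [
        [(11, 12), (9, 12)],
        [(11, 13), (9, 12)],
        [(8, 14), (13, 14)],
        [(11, 11), (12, 13)],
    ]
    reference = rank(target, database, (0.5, 0.3, 0.2))
    for w in [(0.6, 0.2, 0.2), (0.4, 0.4, 0.2), (0.34, 0.33, 0.33)]:
        print(w, shuffling_index(reference, rank(target, database, w)))
```

Sweeping the weights and checking that the rank-change measure stays near zero is one way to express the kind of weight-sensitivity test the abstract describes.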

Disciplines
Chemical Engineering
Degree
Master of Science
Major
Chemical Engineering
Embargo Date
May 1, 2001
File(s)
Name
YadavPuneet.pdf
Size
10.12 MB
Format
Adobe PDF
Checksum (MD5)
1449d57abdf54bfa9c75a8467dc96367
