Repository logo
Log In(current)
  1. Home
  2. Colleges & Schools
  3. Graduate School
  4. Doctoral Dissertations
  5. Applications of Modern Statistical Methods to Analysis of Data in Physical Science
Details

Applications of Modern Statistical Methods to Analysis of Data in Physical Science

Date Issued
May 1, 2006
Author(s)
Wicker, James Eric
Advisor(s)
William E. Blass
Additional Advisor(s)
Halima Bensmail, Hamparsum Bozdogan, Marianne Breinig, Chia C. Shih
Abstract

Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970’s, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies.


The first section of this work showcases applications of modern linear regression. Since the 1960’s, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plagues this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science.

The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960’s and 1970’s respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcomes the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.

Disciplines
Physics
Degree
Doctor of Philosophy
Major
Physics
Embargo Date
May 1, 2006
File(s)
Thumbnail Image
Name

WickerJames.pdf

Size

1.32 MB

Format

Adobe PDF

Checksum (MD5)

981395738219570b11e8f345aa822e54

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science

  • Privacy policy
  • End User Agreement
  • Send Feedback
  • Contact
  • Libraries at University of Tennessee, Knoxville
Repository logo COAR Notify