Date of Award

8-2010

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Statistics

Major Professor

Hamparsum Bozdogan

Committee Members

Michael W. Berry, Mohammed Mohsin, Russell Zaretzki, Bogdan C. Bichescu

Abstract

In this dissertation, we have developed and combined several statistical techniques in Bayesian factor analysis (BAYFA) and mixture of factor analyzers (MFA) to overcome the shortcoming of these existing methods. Information Criteria are brought into the context of the BAYFA model as a decision rule for choosing the number of factors m along with the Press and Shigemasu method, Gibbs Sampling and Iterated Conditional Modes deterministic optimization. Because of sensitivity of BAYFA on the prior information of the factor pattern structure, the prior factor pattern structure is learned directly from the given sample observations data adaptively using Sparse Root algorithm.

Clustering and dimensionality reduction have long been considered two of the fundamental problems in unsupervised learning or statistical pattern recognition. In this dissertation, we shall introduce a novel statistical learning technique by focusing our attention on MFA from the perspective of a method for model-based density estimation to cluster the high-dimensional data and at the same time carry out factor analysis to reduce the curse of dimensionality simultaneously in an expert data mining system. The typical EM algorithm can get trapped in one of the many local maxima therefore, it is slow to converge and can never converge to global optima, and highly dependent upon initial values. We extend the EM algorithm proposed by \cite{Gahramani1997} for the MFA using intelligent initialization techniques, K-means and regularized Mahalabonis distance and introduce the new Genetic Expectation Algorithm (GEM) into MFA in order to overcome the shortcomings of typical EM algorithm. Another shortcoming of EM algorithm for MFA is assuming the variance of the error vector and the number of factors is the same for each mixture. We propose Two Stage GEM algorithm for MFA to relax this constraint and obtain different numbers of factors for each population. In this dissertation, our approach will integrate statistical modeling procedures based on the information criteria as a fitness function to determine the number of mixture clusters and at the same time to choose the number factors that can be extracted from the data.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS