Doctor of Philosophy
Kenneth Gilbert, Xiaobing Feng, Chanaka Edirisinghe, Russell Zaretzki
This dissertation develops a novel, computationally feasible intelligent data mining and knowledge discovery technique for selecting the best subset of predictors in multivariate regression (MR) models under the assumption that the random error terms of the model follow a general nonnormal family of distributions. The approach is an easy-to-use three-way hybrid that integrates statistical modeling procedures based on the information-theoretic measure of complexity (ICOMP) criterion with the genetic algorithm (GA) and with multivariate nonnormal regression models whose errors follow the Power Exponential (PE) distribution and the family of elliptically contoured (EC) distributions. This dissertation is composed of four major parts.
First, the information criterion ICOMP is reviewed, and GAs for model selection and for model parameter estimation are developed. The problems of using GA on GA are discussed, and a new GA operator, GA Engineering, is introduced to improve GA efficiency.
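As a rough illustration of GA-driven subset selection, the sketch below evolves binary masks (1 = predictor included) toward a lower value of a scoring function. It is a generic textbook GA with tournament selection, one-point crossover, bit-flip mutation, and elitism, not the dissertation's implementation, and it does not include the GA Engineering operator. The names `ga_subset_search`, `toy_score`, and `TARGET` are hypothetical, and `toy_score` is only a stand-in for an information criterion such as ICOMP.

```python
import random

def ga_subset_search(n_vars, score, pop_size=20, generations=40,
                     crossover_rate=0.7, mutation_rate=0.05, seed=0):
    """Search binary masks of length n_vars (n_vars >= 2) for a low score."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars))
           for _ in range(pop_size)]
    best = min(pop, key=score)

    def tournament(candidates):
        # Binary tournament: draw two masks, keep the one with the lower score.
        a, b = rng.choice(candidates), rng.choice(candidates)
        return a if score(a) <= score(b) else b

    for _ in range(generations):
        children = [best]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_vars - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation, applied independently to each position.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            children.append(child)
        pop = children
        best = min(pop, key=score)
    return best

# Hypothetical stand-in for an information criterion: the number of
# positions where the mask disagrees with an assumed "true" subset.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)

def toy_score(mask):
    return sum(m != t for m, t in zip(mask, TARGET))

best_mask = ga_subset_search(len(TARGET), toy_score)
print(best_mask, toy_score(best_mask))
```

In an actual application, `score` would fit the regression model on the predictors selected by the mask and return the criterion value for that fit.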
Second, the model selection problem is studied under the assumption that the random error terms are nonnormal. More specifically, the random error terms follow PE and EC distributions, which generalize the Normal distribution by adding a shape parameter related to kurtosis. The expression for the information complexity criterion ICOMP(IFIM) is derived in closed form. Properties of the maximum likelihood estimates (MLEs) are discussed. Both simulated and real data examples are given to test the performance of the proposed approach in subset selection of variables using ICOMP(IFIM) and a GA on GA approach.
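To make the role of the shape parameter concrete, the sketch below evaluates a univariate PE log-density. It assumes the common parametrization in which the density is proportional to exp(-|z|^(2β)/2) with z = (x - μ)/σ, so that β = 1 gives the Normal distribution and β = 1/2 gives a Laplace (double exponential) distribution. The dissertation's own parametrization may differ, and the function names are hypothetical.

```python
import math

def pe_logpdf(x, mu=0.0, sigma=1.0, beta=1.0):
    """Log-density of a univariate Power Exponential distribution.

    Assumed form: f(x) = exp(-0.5 * |z|**(2*beta)) / (sigma * c(beta)),
    where z = (x - mu) / sigma and
    c(beta) = Gamma(1 + 1/(2*beta)) * 2**(1 + 1/(2*beta)).
    """
    z = abs((x - mu) / sigma)
    a = 1.0 + 1.0 / (2.0 * beta)
    log_norm = math.log(sigma) + math.lgamma(a) + a * math.log(2.0)
    return -log_norm - 0.5 * z ** (2.0 * beta)

def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of the Normal distribution, for comparison."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * z * z

# With beta = 1 the two log-densities agree up to rounding error.
print(pe_logpdf(0.7, mu=0.2, sigma=1.5, beta=1.0))
print(normal_logpdf(0.7, mu=0.2, sigma=1.5))
```

Under this parametrization, values of β below one give heavier tails than the Normal and values above one give lighter tails, which is how a single parameter controls kurtosis.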
Third, we extend the PE regression model to two different types of multivariate PE regression models. The first type assumes that the observations are independent, and the second type assumes that the observations are dependent. The two types coincide only when the shape parameter of the multivariate PE distribution is equal to one, which corresponds to the multivariate Normal distribution. Method of moments (MOM) estimates and MLEs of the model parameters are developed. Simulated and real data examples are given for model selection in both types of multivariate PE regression model using ICOMP(IFIM) and GA.
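For readers unfamiliar with the criterion, the sketch below computes an ICOMP(IFIM)-style score from a lack-of-fit term and an estimated inverse Fisher information matrix. It assumes the form usually attributed to Bozdogan, ICOMP(IFIM) = -2 log L + 2 C1(F⁻¹), with C1(Σ) = (s/2) log(tr(Σ)/s) - (1/2) log|Σ| for an s × s matrix Σ. The closed-form expressions derived in the dissertation for PE errors are not reproduced here, and the function names are hypothetical.

```python
import math

def log_det_spd(a):
    """Log-determinant of a symmetric positive definite matrix (Cholesky)."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det

def c1_complexity(cov):
    """Assumed C1 measure: (s/2) * log(trace/s) - (1/2) * log(det)."""
    s = len(cov)
    trace = sum(cov[i][i] for i in range(s))
    return 0.5 * s * math.log(trace / s) - 0.5 * log_det_spd(cov)

def icomp_ifim(neg2_loglik, inv_fisher):
    """Lack of fit plus twice the complexity of the inverse Fisher matrix."""
    return neg2_loglik + 2.0 * c1_complexity(inv_fisher)

# Equal variances with no correlation give zero complexity; unequal
# variances or correlated estimates are penalized.
print(c1_complexity([[1.0, 0.0], [0.0, 1.0]]))
print(c1_complexity([[2.0, 1.0], [1.0, 2.0]]))
```

Because C1 compares the arithmetic and geometric means of the eigenvalues of its argument, it is nonnegative and equals zero only when all eigenvalues are equal.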
Lastly, model selection problems under multivariate EC error assumptions are investigated, and the ICOMP(IFIM) expressions for several important EC distributions, including the Pearson Type II, Pearson Type VII, and Kotz’s Type distributions, are derived. Simulated and real data examples demonstrate model selection using ICOMP and GA under different EC assumptions. Multivariate skewed PE regression models, which can handle skewness and kurtosis simultaneously, and the associated model selection problems are also discussed as directions for future research in intelligent data mining when the data do not satisfy the usual normality assumption.
Liu, Minhui, "Multivariate Nonnormal Regression Models, Information Complexity, and Genetic Algorithms: A Three Way Hybrid for Intelligent Data Mining." PhD diss., University of Tennessee, 2006.