Date of Award
Doctor of Philosophy
In this dissertation, we develop novel, computationally efficient model subset selection methods for multiple and multivariate linear regression models that are both robust and misspecification resistant. Our approach is a three-way hybrid method that employs the information-theoretic measure of complexity (ICOMP), computed on robust M-estimators, as the model subset selection criterion, integrated with genetic algorithms (GA) as the subset model search engine.
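To make the complexity term concrete, the following is a minimal sketch of the first-order maximal information complexity, C1(S) = (s/2) log(tr(S)/s) - (1/2) log det(S), of a covariance matrix S of dimension s, which is the quantity ICOMP-type criteria use to penalize a model. The function name `c1_complexity` and the list-of-lists input are assumptions made for illustration; the robust, misspecification resistant form of the criterion developed in the dissertation is not reproduced here.

```python
import math


def c1_complexity(cov):
    """C1 complexity of a symmetric positive definite matrix.

    C1(S) = (s/2) * log(trace(S) / s) - (1/2) * log(det(S)), with s = dim(S).
    `cov` is assumed to be a full-rank covariance matrix given as a list of lists.
    """
    s = len(cov)
    trace = sum(cov[i][i] for i in range(s))

    # Log-determinant by Gaussian elimination; for a positive definite
    # matrix every pivot is positive, so no row exchanges are needed.
    a = [list(row) for row in cov]
    log_det = 0.0
    for k in range(s):
        pivot = a[k][k]
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        log_det += math.log(pivot)
        for i in range(k + 1, s):
            factor = a[i][k] / pivot
            for j in range(k, s):
                a[i][j] -= factor * a[k][j]

    return 0.5 * s * math.log(trace / s) - 0.5 * log_det


# Equal variances and no correlation give zero complexity;
# unequal variances or correlation increase it.
print(c1_complexity([[1.0, 0.0], [0.0, 1.0]]))
print(c1_complexity([[1.0, 0.9], [0.9, 1.0]]))
```

C1 compares the arithmetic and geometric means of the eigenvalues of S, so it grows as the parameter estimates become more correlated or more unequal in variance.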
Despite the rich literature on robust estimation techniques, bridging the theoretical and applied aspects of robust model subset selection has been somewhat neglected. A few information criteria in the multiple regression literature are robust. However, none of them is resistant to model misspecification, and none can be generalized to misspecified multivariate regression. In this dissertation, we introduce, for the first time, an information complexity (ICOMP) criterion that is both robust and misspecification resistant, filling this gap in the literature.
More specifically, in multiple linear regression, we combine robust M-estimators with the misspecification resistant ICOMP and use the new information criterion as the fitness function in the GA to carry out model subset selection. For multivariate linear regression, we derive the two-stage robust Mahalanobis distance (RMD) estimator and use it in the computation of the information criteria. The new information criteria are then used as the fitness function in the GA to perform model subset selection.
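As an illustration of the robust estimation step in the multiple regression case, the sketch below fits a Huber M-estimator by iteratively reweighted least squares (IRLS). It is a minimal example under stated assumptions, not the dissertation's estimator: the Huber weight function, the tuning constant c = 1.345, the MAD-based scale, and the helper names `solve` and `huber_irls` are all choices made here for illustration.

```python
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimate of regression coefficients via IRLS.

    Returns (beta, scale). With n_iter=1 the result is the ordinary
    least squares fit, because all weights start at 1.
    """
    n, p = len(X), len(X[0])
    w = [1.0] * n
    for _ in range(n_iter):
        # Weighted normal equations: (X' W X) beta = X' W y
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwy = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
        beta = solve(xtwx, xtwy)
        res = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Robust scale: normalized median absolute deviation of the residuals
        med = statistics.median(res)
        scale = max(statistics.median([abs(r - med) for r in res]) / 0.6745, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking like 1/|r| outside it
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in res]
    return beta, scale


# Toy data: y = 1 + 2x plus small noise, with one gross outlier at x = 5.
xs = list(range(10))
noise = [0.1, -0.2, 0.15, -0.1, 0.05, 0.0, 0.2, -0.05, 0.1, -0.15]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]
ys[5] += 50.0
X = [[1.0, float(x)] for x in xs]

beta, scale = huber_irls(X, ys)
print("Huber fit:", beta, "scale:", scale)
print("OLS fit:  ", huber_irls(X, ys, n_iter=1)[0])
```

In a subset selection setting, the residuals and scale from a fit like this would feed the information criterion that scores each candidate subset.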
Comparative studies on simulated data for both multiple and multivariate regression show that the robust and misspecification resistant ICOMP outperforms the other robust information criteria, as well as the non-robust ICOMP computed using OLS (or MLE), when the data contain outliers and the error terms in the model deviate from a normal distribution. Compared with all possible subset selection, the GA combined with the robust and misspecification resistant information criteria proves to be an effective method that can quickly find a near-best subset, if not the best, without having to search the whole subset model space.
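The contrast between GA search and all possible subset selection can be sketched as follows. This is a simple GA with tournament selection, one-point crossover, bit-flip mutation, and elitism; these operators, the parameter values, and the names `ga_subset_search`, `TRUE_MASK`, and `toy_criterion` are assumptions made for illustration. The toy criterion is a hypothetical stand-in that only counts wrongly included or excluded predictors; in the setting described above, the score of a subset would instead be the robust information criterion computed from the fitted model.

```python
import itertools
import random


def ga_subset_search(criterion, n_vars, pop_size=20, n_gen=15, p_mut=0.05, seed=0):
    """Minimize criterion(mask) over 0/1 inclusion masks of length n_vars.

    Returns (best_mask, best_score, number_of_distinct_subsets_evaluated).
    """
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        # Each distinct subset is scored only once.
        if mask not in cache:
            cache[mask] = criterion(mask)
        return cache[mask]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars)) for _ in range(pop_size)]
    for _ in range(n_gen):
        children = [min(pop, key=score)]  # elitism: keep the best subset so far
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n_vars - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each bit flips with probability p_mut
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = children
    best = min(pop, key=score)
    return best, score(best), len(cache)


# Hypothetical "true" model over 12 candidate predictors.
TRUE_MASK = (1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0)


def toy_criterion(mask):
    # Stand-in score: number of predictors wrongly included or excluded.
    return sum(m != t for m, t in zip(mask, TRUE_MASK))


best, best_score, n_evaluated = ga_subset_search(toy_criterion, len(TRUE_MASK))
exhaustive_best = min(itertools.product((0, 1), repeat=len(TRUE_MASK)), key=toy_criterion)

print("GA best subset:", best, "score:", best_score, "subsets scored:", n_evaluated)
print("Exhaustive best:", exhaustive_best, "subsets scored:", 2 ** len(TRUE_MASK))
```

With these settings the GA scores at most 305 distinct subsets, against 4096 for the exhaustive search over 12 predictors. The GA does not guarantee the best subset, which matches the "near-best, if not the best" behavior described above.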
Liu, Yan, "Robust and Misspecification Resistant Model Selection in Regression Models with Information Complexity and Genetic Algorithms." PhD diss., University of Tennessee, 2007.