Doctor of Philosophy
Ken Gilbert, Bogdan Bichescu, Mohammed Mohsin
Kernel density estimation is a data-smoothing technique whose results depend heavily on the choice of bandwidth. The existing literature has focused mainly on optimal, primarily data-driven bandwidth selectors for the univariate case. Plug-in and cross-validation selectors have recently been extended to the general multivariate case.
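To illustrate what a data-driven, cross-validation bandwidth selector does in the univariate case, here is a minimal sketch of least-squares cross-validation (LSCV) for a Gaussian kernel density estimate, with the normal-reference rule of thumb alongside for comparison. It is not the information-complexity selector developed in this dissertation. The function names and the grid search over candidate bandwidths are illustrative assumptions.

```python
import math
import statistics


def normal_pdf(u: float, sd: float) -> float:
    """Density of N(0, sd^2) evaluated at u."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def lscv_score(data: list[float], h: float) -> float:
    """Least-squares cross-validation criterion for a Gaussian KDE.

    LSCV(h) = integral(f_hat^2) - (2/n) * sum_i f_hat_{-i}(x_i), where f_hat_{-i}
    is the leave-one-out estimate. With a Gaussian kernel the integral has the
    closed form (1/n^2) * sum_{i,j} N(x_i - x_j; 0, 2h^2).
    """
    n = len(data)
    integral, loo = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            d = data[i] - data[j]
            integral += normal_pdf(d, h * math.sqrt(2.0))
            if i != j:
                loo += normal_pdf(d, h)
    # sum_i f_hat_{-i}(x_i) = loo / (n - 1), then scaled by 2/n
    return integral / n**2 - 2.0 * loo / (n * (n - 1))


def select_bandwidth(data: list[float], grid: list[float]) -> float:
    """Grid search: return the candidate bandwidth with the smallest LSCV score."""
    return min(grid, key=lambda h: lscv_score(data, h))


def normal_reference_bandwidth(data: list[float]) -> float:
    """Rule of thumb h = 1.06 * sigma * n^(-1/5), AMISE-optimal for normal data."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)
```

For example, `select_bandwidth(sample, [0.05 * k for k in range(1, 41)])` searches bandwidths from 0.05 to 2.0. Each candidate costs O(n^2) kernel evaluations, so this brute-force form suits only small samples.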
This dissertation introduces and develops novel data-mining techniques for multivariate kernel regression that use information complexity, with the genetic algorithm as a heuristic optimizer, to choose the optimal bandwidth and the best predictors in kernel regression models. Simulated and real data are used to cross-validate the optimal bandwidth selectors based on information complexity. The genetic algorithm is combined with information complexity to determine kernel density estimates for variable selection from high-dimensional multivariate data sets.
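The following is a minimal sketch of how a genetic algorithm can search over predictor subsets of a kernel regression model, where each chromosome is a bit string and bit k marks whether predictor k is included. It does not implement information complexity: as a loudly labeled stand-in it scores a Nadaraya-Watson fit with the AIC-style criterion n log(RSS/n) + 2 tr(H), where tr(H) is the trace of the smoother matrix. The single shared bandwidth `h`, the GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), and all function names are assumptions for illustration.

```python
import math
import random


def nw_fit(X, y, mask, h):
    """In-sample Nadaraya-Watson fit on the predictors flagged by `mask`.

    Uses a product Gaussian kernel with one shared bandwidth h (an assumption).
    Returns the fitted values and the trace of the smoother (hat) matrix.
    """
    cols = [k for k, bit in enumerate(mask) if bit]
    fitted, trace = [], 0.0
    for i in range(len(y)):
        w = [math.exp(-0.5 * sum((X[i][k] - X[j][k]) ** 2 for k in cols) / h ** 2)
             for j in range(len(y))]
        total = sum(w)
        fitted.append(sum(wj * yj for wj, yj in zip(w, y)) / total)
        trace += w[i] / total  # diagonal element H_ii of the smoother matrix
    return fitted, trace


def criterion(X, y, mask, h):
    """AIC-style stand-in for information complexity. Smaller is better."""
    fitted, trace = nw_fit(X, y, mask, h)
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return n * math.log(max(rss / n, 1e-12)) + 2.0 * trace


def genetic_select(X, y, h, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Binary-chromosome GA: bit k = 1 means predictor k is in the model."""
    rng = random.Random(seed)
    p = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = criterion(X, y, key, h)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        children = [list(best)]  # elitism: carry the best model forward
        while len(children) < pop_size:
            # binary tournament selection for each parent
            a = min(rng.sample(pop, 2), key=fitness)
            b = min(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, p - 1) if p > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = min(pop + [best], key=fitness)
    return tuple(best), fitness(best)
```

Caching fitness values by chromosome avoids refitting the same subset model, which matters because each kernel fit here costs O(n^2) operations.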
Kernel regression is also hybridized with the implicit enumeration algorithm, using information criteria as the objective function, to determine the set of independent variables in the globally optimal solution. The results from the genetic algorithm are compared with the optimal solution from the implicit enumeration algorithm and with the known global optimum obtained by explicitly enumerating all possible subset models.
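Explicit enumeration of all possible subset models is simple to state: with p candidate predictors there are 2^p - 1 non-empty subsets, and each one is scored by the chosen criterion. A minimal sketch follows, kept generic over the scoring function so that any information criterion could be plugged in; the function name is hypothetical, and implicit enumeration (which prunes subsets instead of visiting them all) is not shown. For p = 20 the loop would already visit 2^20 - 1 = 1,048,575 models, which is why a heuristic such as the genetic algorithm is attractive for high-dimensional data.

```python
import math
from itertools import combinations
from typing import Callable, Optional


def explicit_enumeration(
    p: int, score: Callable[[tuple[int, ...]], float]
) -> tuple[Optional[tuple[int, ...]], float, int]:
    """Score every non-empty subset of predictors 0..p-1; smaller score is better.

    Returns (best subset, its score, number of models evaluated).
    The number evaluated is always 2**p - 1.
    """
    best_subset, best_score, evaluated = None, math.inf, 0
    for size in range(1, p + 1):
        for subset in combinations(range(p), size):
            s = score(subset)
            evaluated += 1
            if s < best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score, evaluated
```

Because every model is visited, the result is the known global optimum for the given criterion, which makes it a natural benchmark for checking heuristic searches on small problems.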
Beal, Dennis Jack, "Data Mining with Multivariate Kernel Regression Using Information Complexity and the Genetic Algorithm." PhD diss., University of Tennessee, 2009.