Doctoral Dissertations

Variable selection via penalized regression and the genetic algorithm using information complexity, with applications for high-dimensional -omics data

Tyler J. Massaro, University of Tennessee, Knoxville

Date of Award

8-2016

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Mathematics

Major Professor

Hamparsum Bozdogan

Committee Members

Vasileios Maroulas, Xiaobeng Feng, Haileab Hilafu, Michael Vose

Abstract

This dissertation is a collection of examples, algorithms, and techniques for researchers interested in selecting influential variables from statistical regression models. Chapters 1, 2, and 3 provide background information used throughout the remaining chapters on topics including, but not limited to, information complexity, model selection, covariance estimation, stepwise variable selection, penalized regression, and especially the genetic algorithm (GA) approach to variable subsetting.

In chapter 4, we fully develop the framework for performing GA subset selection in logistic regression models. We present the advantages of this approach over stepwise selection and elastic net regularized regression in selecting variables from a classical ICU data set. We further compare these results to an entirely new variable selection procedure developed explicitly for this dissertation, called the post hoc adjustment of measured effects (PHAME). In chapter 5, we reproduce many of the results from chapter 4, for the first time in a multinomial logistic regression setting. The utility and convenience of the PHAME procedure are demonstrated on a set of cancer genomic data.
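As a rough illustration of GA subset selection scored by information complexity, the sketch below evolves binary inclusion masks for a logistic regression and ranks each mask by one common form of Bozdogan's criterion, ICOMP(IFIM) = -2 log L + 2 C1(F^-1), where F is the estimated Fisher information and C1(S) = (s/2) log(tr(S)/s) - (1/2) log|S| is the C1 covariance complexity. This is a minimal pure-Python sketch under stated assumptions, not the dissertation's implementation: the GA operators (tournament selection, uniform crossover, bit-flip mutation, two-member elitism), the default settings, the synthetic demo data, and the function names `ga_select`, `icomp`, and `c1_from_fisher` are all hypothetical.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def score_terms(X, y, beta):
    """Log-likelihood, gradient X^T (y - mu), and Fisher information X^T W X at beta."""
    p = len(beta)
    ll, grad = 0.0, [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))
        mu = sigmoid(eta)
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))  # stable log(1 + e^eta)
        for j in range(p):
            grad[j] += xi[j] * (yi - mu)
            for k in range(p):
                info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
    return ll, grad, info

def fit_logistic(X, y, iters=50, tol=1e-8):
    """Newton-Raphson MLE; returns (log-likelihood, Fisher information) at the fit."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        _, grad, info = score_terms(X, y, beta)
        step = chol_solve(cholesky(info), grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll, _, info = score_terms(X, y, beta)
    return ll, info

def c1_from_fisher(info):
    """C1(S) = (s/2) log(tr(S)/s) - (1/2) log|S| with S = info^{-1}."""
    s = len(info)
    L = cholesky(info)
    log_det_info = 2.0 * sum(math.log(L[i][i]) for i in range(s))  # log|F| = -log|S|
    tr_cov = sum(chol_solve(L, [float(i == c) for i in range(s)])[c] for c in range(s))
    return 0.5 * s * math.log(tr_cov / s) + 0.5 * log_det_info

def icomp(X, y, mask):
    """ICOMP(IFIM) = -2 log L + 2 C1(F^{-1}) for an intercept plus the masked columns."""
    Xs = [[1.0] + [v for v, keep in zip(row, mask) if keep] for row in X]
    ll, info = fit_logistic(Xs, y)
    return -2.0 * ll + 2.0 * c1_from_fisher(info)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve inclusion masks; lower ICOMP is fitter. Returns (best mask, its ICOMP)."""
    rng = random.Random(seed)
    cache = {}
    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # refit only subsets not seen before
            cache[key] = icomp(X, y, key)
        return cache[key]
    pop = [[rng.random() < 0.5 for _ in range(len(X[0]))] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]  # elitism: carry the two best subsets forward unchanged
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = min(rng.sample(pop, 3), key=fitness)
            child = [u if rng.random() < 0.5 else v for u, v in zip(a, b)]  # uniform crossover
            nxt.append([not g if rng.random() < p_mut else g for g in child])  # bit-flip mutation
        pop = nxt
    best = min(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
    # Hypothetical ground truth: only features 0 and 3 affect the outcome.
    y = [1 if rng.random() < sigmoid(1.5 * r[0] - 2.0 * r[3]) else 0 for r in X]
    mask, score = ga_select(X, y)
    print("selected:", [j for j, keep in enumerate(mask) if keep], "ICOMP:", round(score, 2))
```

Fitness values are cached by mask because every evaluation is a full Newton-Raphson fit, and a GA revisits the same subsets often. Note that C1 is zero when the estimated covariance is a multiple of the identity and grows as the parameter estimates become more unequal or correlated, which is what lets ICOMP penalize unstable subsets rather than simply counting parameters.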

Chapter 6 marks a departure from supervised learning problems as we shift our focus to unsupervised problems involving mixture distributions of count data from epidemiologic fields. We begin by reintroducing Minimum Hellinger Distance estimation, alongside model selection techniques, as a worthy alternative to the EM algorithm for generating mixtures of Poisson distributions. We also create, for the first time, a GA that derives mixtures of negative binomial distributions.
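Minimum Hellinger distance estimation picks the mixture parameters whose model pmf p is closest, in squared Hellinger distance, to the empirical pmf f_n of the counts. Because the squared distance summed over all counts equals 2 - 2 * sum_x sqrt(f_n(x) p(x)), only the observed counts are needed, and minimizing the distance is the same as maximizing that affinity sum. The sketch below is a minimal illustration under loud assumptions: it fixes exactly two Poisson components instead of choosing the number by a model selection criterion, it replaces a proper numerical optimizer with a coarse grid search, and the grids, function names, and demo data are hypothetical.

```python
import math
import random
from collections import Counter

def poisson_pmf(x, lam):
    """Poisson probability mass at count x (lam > 0)."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def empirical_pmf(data):
    n = len(data)
    return {x: c / n for x, c in Counter(data).items()}

def squared_hellinger(data, pi, lam1, lam2):
    """sum_x (sqrt f_n(x) - sqrt p(x))^2 = 2 - 2 * sum over observed x of sqrt(f_n(x) p(x))."""
    f_n = empirical_pmf(data)
    aff = sum(math.sqrt(f * (pi * poisson_pmf(x, lam1) + (1 - pi) * poisson_pmf(x, lam2)))
              for x, f in f_n.items())
    return 2.0 - 2.0 * aff

def mhd_poisson_mixture(data, pi_grid=None, lam_grid=None):
    """Grid-search minimum Hellinger distance fit of a two-component Poisson mixture."""
    pi_grid = pi_grid or [k / 20 for k in range(1, 20)]
    lam_grid = lam_grid or [k / 2 for k in range(1, 41)]  # assumed sorted ascending
    f_n = empirical_pmf(data)
    table = {lam: {x: poisson_pmf(x, lam) for x in f_n} for lam in lam_grid}  # cached pmfs
    best, best_aff = None, -1.0
    for i, lam1 in enumerate(lam_grid):
        for lam2 in lam_grid[i + 1:]:  # lam1 < lam2 avoids label switching
            for pi in pi_grid:
                aff = sum(math.sqrt(f * (pi * table[lam1][x] + (1 - pi) * table[lam2][x]))
                          for x, f in f_n.items())
                if aff > best_aff:  # max affinity = min squared Hellinger distance
                    best, best_aff = (pi, lam1, lam2), aff
    return best, 2.0 - 2.0 * best_aff

def draw_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: about 30% of counts from Poisson(2), the rest from Poisson(9).
    data = [draw_poisson(rng, 2.0 if rng.random() < 0.3 else 9.0) for _ in range(500)]
    (pi, lam1, lam2), h2 = mhd_poisson_mixture(data)
    print(f"pi={pi:.2f} lam1={lam1} lam2={lam2} H^2={h2:.4f}")
```

The grid only shows which quantity is being minimized; a practical fit would optimize the affinity numerically and compare candidate numbers of components with an information criterion.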

The work from chapter 6 is incorporated into chapters 7 and 8, where we conclude the dissertation with a novel analysis of mixtures of count data regression models. We provide algorithms based on single- and multi-target genetic algorithms that solve the problem of fitting mixtures of penalized count data regression models, and we demonstrate the usefulness of this technique on HIV count data used in a previous study by Gray, Massaro, et al. (2015), as well as on time-to-event data taken from the cancer genomic data sets analyzed earlier.

Recommended Citation

Massaro, Tyler J., "Variable selection via penalized regression and the genetic algorithm using information complexity, with applications for high-dimensional -omics data." PhD diss., University of Tennessee, 2016.
https://trace.tennessee.edu/utk_graddiss/3944

Included in

Applied Statistics Commons, Biostatistics Commons, Multivariate Analysis Commons
