Masters Theses

A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites

Yan Zeng

Date of Award

8-2011

Degree Type

Thesis

Degree Name

Master of Science

Major

Statistics

Major Professor

Timothy M. Young

Committee Members

Frank M. Guess, Russell L. Zaretzki

Abstract

Problem: Real-time process data and destructive test data were collected from a U.S. wood composite manufacturer to develop real-time predictive models of two key strength properties of its manufacturing process: Modulus of Rupture (MOR) and Internal Bond (IB). Sensor malfunctions and data “send/retrieval” problems led to null fields in the company’s data warehouse, which resulted in information loss. Many manufacturers attempt to build accurate predictive models either by excluding entire records that contain null fields or by substituting a summary statistic, such as the mean or median, for each null field. However, prediction errors in validation may be higher when information is lost in this way. In addition, choosing a predictive modeling method poses another challenge for many wood composite manufacturers.
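The two common workarounds named above, dropping incomplete records (listwise deletion) and filling each null with a column summary statistic, can be sketched in a few lines of plain Python. The function names and toy rows below are hypothetical illustrations, not the company's data or the thesis's code; `None` stands in for a null field in the data warehouse.

```python
from statistics import mean, median


def listwise_delete(rows):
    """Drop every record that contains at least one null field (None)."""
    return [row for row in rows if all(v is not None for v in row)]


def summary_impute(rows, stat=mean):
    """Replace each null with a column statistic (mean or median) of the observed values.

    Returns new lists; the input rows are left unchanged.
    """
    n_cols = len(rows[0])
    fills = [stat([row[j] for row in rows if row[j] is not None]) for j in range(n_cols)]
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical sensor records: two columns, two null fields out of eight values.
rows = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [4.0, 40.0],
]
print(listwise_delete(rows))         # keeps only the two complete records
print(summary_impute(rows, median))  # fills nulls with column medians 2.0 and 30.0
```

Listwise deletion here discards half of the records even though only two of eight values are missing, which illustrates the information loss described above; summary imputation keeps every record but ignores relationships between variables.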

Approach: This thesis consists of two parts addressing the above issues: 1) how to improve data quality using missing data imputation; and 2) which predictive modeling method performs better in terms of prediction precision, measured by root mean square error (RMSE). The first part summarizes an application of missing data imputation methods to predictive modeling. After variable selection, six candidate imputation methods were compared and two were selected. Predictive models of the imputed data were developed using partial least squares regression (PLSR) and compared with models of the non-imputed data using ten-fold cross-validation. The root mean square error of prediction (RMSEP) and normalized RMSEP (NRMSEP) were calculated. The second part presents a series of comparisons among four predictive modeling methods using imputed data without variable selection.
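A minimal sketch of how RMSEP and NRMSEP can be computed under ten-fold cross-validation follows. Two assumptions should be flagged: NRMSEP is normalized here by the range of the observed response (the thesis may normalize differently, for example by the mean), and out-of-fold predictions are pooled before scoring rather than averaged fold by fold. The `fit_mean` model is a hypothetical placeholder standing in for PLSR or any other method, and all function names are illustrative.

```python
import math
import random


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmsep(y_true, y_pred):
    """RMSEP as a percentage of the observed response range (normalization is an assumption)."""
    return 100.0 * rmsep(y_true, y_pred) / (max(y_true) - min(y_true))


def kfold_indices(n, k=10, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, k=10, seed=0):
    """Collect out-of-fold predictions from k folds, then score them once (pooled)."""
    preds = [None] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = model(X[i])
    return rmsep(y, preds), nrmsep(y, preds)


def fit_mean(X_train, y_train):
    """Hypothetical placeholder model: always predicts the training mean (not PLSR)."""
    m = sum(y_train) / len(y_train)
    return lambda x: m


# Usage on toy data: one predictor, response y = 2x + 1.
X = [[float(i)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
print(cross_validate(X, y, fit_mean))
```

Passing the model as a `fit` function keeps the fold logic separate from the modeling method, so imputed and non-imputed datasets (or PLSR and its competitors) can be scored with identical folds by reusing the same `seed`.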

Results: The first part concludes that the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov chain Monte Carlo (MCMC) simulation achieved more precise results. Predictive models based on imputed datasets produced more precise predictions (average NRMSEP of 5.8% for the MOR model and 7.2% for the IB model) than models based on non-imputed datasets (average NRMSEP of 6.3% for the MOR model and 8.1% for the IB model). The second part finds that Bayesian Additive Regression Trees (BART) produced more precise predictions (average NRMSEP of 7.7% for the MOR model and 8.6% for the IB model) than the other three methods: PLSR, LASSO, and Adaptive LASSO.
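To illustrate the EM idea behind the first part's results, here is a minimal sketch for the simplest case: two jointly normal variables in which only `y` has nulls. The E-step replaces the missing values' sufficient statistics (y and y²) with their conditional expectations given `x`, and the M-step re-estimates the mean and covariance from them. The function name and toy data are hypothetical; the thesis worked with many process variables and presumably used a statistical package, so this is not its implementation. MI via MCMC differs in that it draws several random completions from the conditional distribution instead of plugging in a single expectation.

```python
def em_impute_bivariate(x, y, n_iter=200):
    """EM for a bivariate normal (x, y) where only y contains nulls (None).

    Returns (mu_y, s_xy, s_yy) and a copy of y with each null replaced by
    E[y | x] under the final maximum-likelihood estimates.
    """
    n = len(x)
    obs = [v for v in y if v is not None]
    # x is fully observed, so its mean and variance stay fixed.
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Start y-parameters from the observed values, assuming zero covariance.
    mu_y = sum(obs) / len(obs)
    s_yy = sum((v - mu_y) ** 2 for v in obs) / len(obs)
    s_xy = 0.0

    for _ in range(n_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy  # Var(y | x) = s_yy - s_xy**2 / s_xx
        # E-step: expected sufficient statistics for every record.
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + beta * (xi - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi * yi)
        # M-step: maximum-likelihood updates from the expected statistics.
        mu_y = sum(ey) / n
        s_xy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * mu_y
        s_yy = sum(ey2) / n - mu_y ** 2

    beta = s_xy / s_xx
    filled = [yi if yi is not None else mu_y + beta * (xi - mu_x) for xi, yi in zip(x, y)]
    return (mu_y, s_xy, s_yy), filled


# Toy data: observed points lie on y = 2x + 1; y is missing at x = 2 and x = 5.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0, 3.0, None, 7.0, 9.0, None, 13.0, 15.0]
params, filled = em_impute_bivariate(x, y)
print([round(v, 6) for v in filled])
```

Because the observed points lie exactly on y = 2x + 1, the imputed values converge to that line (5.0 at x = 2 and 11.0 at x = 5), whereas mean substitution would fill both nulls with the same value regardless of `x`.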

Recommended Citation

Zeng, Yan, "A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites." Master's Thesis, University of Tennessee, 2011.
https://trace.tennessee.edu/utk_gradthes/1041

Included in

Applied Statistics Commons, Statistical Methodology Commons, Statistical Models Commons
