Doctoral Dissertations

Orcid ID

https://orcid.org/0000-0002-8709-7823

Date of Award

8-2021

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Educational Psychology and Research

Major Professor

Louis M. Rocconi

Committee Members

Louis M. Rocconi, Jennifer A. Morrow, R. Steve McCallum, William Nugent

Abstract

Construct validity is necessary to confirm that psychometric instruments measure their intended construct and to limit measurement error. Differential item functioning (DIF) analysis helps to identify bias and error within measurement instruments and supports construct validity. DIF detection procedures have been thoroughly scrutinized under several item response theory models (e.g., Bulut & Suh, 2017; Cohen et al., 1996; Suh & Cho, 2014; Elosua & Wells, 2013; Wang & Shih, 2010; Woods & Grimm, 2011), but no study has explored DIF detection procedures for data fitted to polytomous multidimensional item response theory models. This study aims to identify the optimal DIF detection procedures for data that fit the multidimensional graded response model.

Three DIF detection procedures were compared: the multidimensional IRT likelihood ratio (MIRT-LR) test, the multidimensional extension of the logistic discriminant function analysis (MLDFA) method, and the multidimensional multiple indicators, multiple causes interaction (MIMIC-interaction) model. Multidimensional graded response data were generated for twenty items with five response options through a Monte Carlo simulation that varied sample size, DIF type, percentage of DIF, correlations between latent traits, and latent mean differences between groups, in order to determine how these conditions affected the type I error and rejection rates of the three DIF detection methods.
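As a rough illustration of the data-generation step described above, the following Python sketch draws responses from a multidimensional graded response model for twenty items with five response options. It is not the dissertation's simulation code: the function name `simulate_mgrm`, the two-dimensional simple structure, the parameter distributions, and the default correlation are all assumptions chosen for illustration, and no DIF is introduced here (in a full study, DIF would be added by altering the slopes or intercepts of selected items for one group).

```python
import numpy as np

def simulate_mgrm(n_persons, n_items=20, n_cats=5, n_dims=2,
                  trait_corr=0.3, mean_shift=0.0, seed=0):
    """Draw graded responses (0..n_cats-1) from a hypothetical MGRM setup."""
    rng = np.random.default_rng(seed)

    # Latent traits: multivariate normal with a common correlation (assumed).
    cov = np.full((n_dims, n_dims), trait_corr)
    np.fill_diagonal(cov, 1.0)
    theta = rng.multivariate_normal(np.full(n_dims, mean_shift), cov,
                                    size=n_persons)

    # Simple structure (assumed): each item loads on exactly one dimension.
    a = np.zeros((n_items, n_dims))
    a[np.arange(n_items), np.arange(n_items) % n_dims] = rng.uniform(
        1.0, 2.0, n_items)

    # Intercepts strictly decreasing across categories, so that the
    # cumulative probabilities P(X >= k) decrease with k.
    d = -np.sort(rng.normal(0.0, 1.0, size=(n_items, n_cats - 1)), axis=1)

    # Cumulative logits: a_i' theta_p + d_ik for k = 1..n_cats-1.
    z = (theta @ a.T)[:, :, None] + d[None, :, :]
    p_star = 1.0 / (1.0 + np.exp(-z))

    # One uniform draw per person-item pair; the response is the number of
    # cumulative thresholds the draw falls below.
    u = rng.uniform(size=(n_persons, n_items, 1))
    responses = (u < p_star).sum(axis=2)
    return responses, theta, a, d

# Reference group and a focal group with a latent mean difference (assumed values).
ref, _, _, _ = simulate_mgrm(500, seed=1)
foc, _, _, _ = simulate_mgrm(500, mean_shift=-0.5, seed=2)
```

In a Monte Carlo design like the one described, a generator of this kind would be called repeatedly across the crossed conditions (sample size, trait correlation, latent mean difference), with each DIF detection method applied to every replicate.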

Results indicated that type I error rates were inflated (i.e., greater than .05) for all three DIF detection procedures. The MLDFA method produced the highest rejection rates, but also displayed type I error rates between .06 and .15. The MIRT-LR test showed poor ability to detect nonuniform DIF and greatly inflated type I error rates when latent mean differences were unbalanced. The MIMIC-interaction model exhibited the lowest type I error rates and adequate rejection rates, indicating strong statistical power under most conditions. General method selection recommendations are provided for psychometric and assessment professionals. Future research on DIF detection procedures for polytomous multidimensional item response theory models is also discussed.
