Masters Theses

Date of Award

6-1979

Degree Type

Thesis

Degree Name

Master of Science

Major

Food Science and Technology

Major Professor

Betty L. Beach

Committee Members

Charles A. Chance, Louis A. Ehrcke, Marjorie P. Penfield

Abstract

A model for establishing content validity and interrater reliability for performance evaluation instruments (Fiedler et al., 1979) was examined for applicability in another situation. A Class Presentation Evaluation Instrument was developed to test the model. The model comprised eight steps: 1. examining the evaluation instrument for content validity; 2. revising the instrument to establish content validity; 3. viewing and evaluating a standardized situation to establish interrater reliability; 4. calculating item variance and item rateability; 5. calculating intraclass correlation scores; 6. revising the instrument to bring item variance and intraclass correlation to predetermined levels; 7. implementing the instrument; and 8. reviewing the instrument periodically.

Nine dietetic educators with a combined 56.6 years of experience teaching dietetic students, interns, and trainees were selected for the panel of experts. The panel had a total of 32.1 years of teaching in the Coordinated Undergraduate Program (CUP) in Dietetics at The University of Tennessee, Knoxville.

The Class Presentation Evaluation Instrument was developed after the panel selected a format and distinguished between essential and non-essential evaluation criteria. The format selected was similar to an instrument currently used by the program. A prioritized list of 37 behavior statements and the frequency of written comments on past presentation evaluations indicated the essential evaluation criteria. The instrument had sixteen evaluation items in nine categories. The categories were: planning and organization, introduction, body of presentation, summary, overall presentation, instructional aids, nonverbal communication, and verbal communication. The first seven categories listed behavior indicators and were rated with four graduated narrative descriptors, with columns for checking "not applicable" and "not observable" and for writing comments. The last two categories had four and five behavior indicators, respectively, which were rated on a dichotomous scale and had the same columns.

The panel determined content validity by examining each evaluation category and descriptor for clarity, word choice, implied meanings, and consistency with identified competencies. The panel confirmed that the scale extremes were realistic and attainable by all students. Final content validity was established concurrently with the achievement of interrater reliability.

The procedure Fiedler et al. (1979) used for calculating item variance and intraclass correlation, an estimate of interrater reliability, was followed for each trial. Further comparisons of intraclass correlation scores were made by separating the rating scales and by omitting "not applicable" and "not observable" responses. The Statistical Analysis System (Barr et al., 1976) was used to determine the mean squares.
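
As an illustration of this kind of calculation (a minimal sketch only; the abstract does not reproduce the exact formulas of Fiedler et al., 1979, and the thesis obtained its mean squares with SAS), the following Python code computes per-item variance across raters and a one-way random-effects intraclass correlation from a hypothetical ratings matrix with rows as items and columns as raters.

import numpy as np

def item_variances(ratings):
    # Sample variance of each item's scores across raters (lower = closer rater agreement)
    return ratings.var(axis=1, ddof=1)

def icc_oneway(ratings):
    # One-way random-effects ICC(1): (MSB - MSW) / (MSB + (k - 1) * MSW),
    # where MSB and MSW are the between- and within-item mean squares
    # and k is the number of raters.
    n_items, k = ratings.shape
    grand_mean = ratings.mean()
    item_means = ratings.mean(axis=1)
    ss_between = k * ((item_means - grand_mean) ** 2).sum()
    ss_within = ((ratings - item_means[:, None]) ** 2).sum()
    ms_between = ss_between / (n_items - 1)
    ms_within = ss_within / (n_items * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical example: 4 items rated by 3 raters on a four-point scale
ratings = np.array([[3, 4, 3],
                    [2, 2, 3],
                    [4, 4, 4],
                    [1, 2, 2]], dtype=float)
print(item_variances(ratings))   # per-item variance across raters
print(icc_oneway(ratings))       # intraclass correlation estimate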

Interrater reliability was established in three trials using the same videotaped standardized situation. Item variance of 0.30 or lower was obtained for 14 of the 16 items, and the intraclass correlation score was 0.44. A fourth trial tested the stability of the interrater reliability achieved, using a different standardized situation for the panel to view and evaluate. For the total instrument, an intraclass correlation score of 0.69 was obtained, and 10 of 16 items had variances of 0.30 or less. Intraclass correlation scores improved with each trial for the evaluation categories rated on the four-point scale; the categories and behavior indicators rated on the dichotomous scale did not improve with each trial. After evaluating each standardized situation, the panel discussed the items with high variances in order to revise or clarify the evaluation instrument and to reach agreement on rating student performance.

The model provided a systematic process for establishing content validity and interrater reliability for the Class Presentation Evaluation Instrument. Interrater reliability was influenced by the use of more than one rating scale and by the availability of the "not applicable" and "not observable" columns. The model can serve as an effective training tool for acquainting new CUP faculty with expected student performance levels and performance evaluation instruments. Other disciplines concerned with evaluating student performance in clinical experiences may benefit from the use of the model.

Included in

Food Science Commons
