Reliability & Validity

The statistical adage that reliability is a necessary precursor to validity is as true for course evaluations as it is for other measurement applications. IASystem™ carries out custom analyses for each institution to confirm item reliability and maximize the validity of student ratings of instruction for decision making.

Reliability

Item reliability refers to the stability of ratings across students, courses, or instructors. The higher an item's reliability, the more confident we can be that average ratings reflect student opinions about a class rather than random error, and the smaller the difference between item ratings needed to reach statistical significance.
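
The link between reliability and the difference required for significance can be sketched with the classical standard error of measurement, SEM = SD·√(1 − r). This is a standard psychometric result, not a computation specified by IASystem™, and the function names and numbers below are illustrative:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement of a rating with a given reliability."""
    return sd * math.sqrt(1.0 - reliability)

def min_significant_difference(sd: float, reliability: float, z: float = 1.96) -> float:
    """Smallest difference between two independent ratings that reaches
    significance at the two-tailed 5% level (z = 1.96)."""
    return z * math.sqrt(2.0) * sem(sd, reliability)

# A more reliable item needs a smaller rating difference to be significant.
low_rel = min_significant_difference(sd=0.8, reliability=0.70)
high_rel = min_significant_difference(sd=0.8, reliability=0.90)
assert high_rel < low_rel
```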

Statistical reliability of course evaluation data is particularly important when results are used in decision making, and reliability estimates must be computed in a way that is consistent with the level at which the data are aggregated.

IASystem™ computes two different estimates of item reliability to support different types of decisions. Because reliability estimates depend on context, we compute them separately for each institution and translate the results into decision-making “rules of thumb.”

Pedagogical Decision Making

Instructors often make changes to their courses based on ratings of individual classes. For this purpose, IASystem™ computes inter-rater correlation coefficients, using the class as the unit of analysis. For each item, the reliability coefficient represents the level of agreement among students within a class relative to the mean difference in ratings across classes. Inter-rater correlation coefficients for IASystem™ instructional improvement items typically range from .71 to .82 for class sizes of ten students.
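
The agreement-within-classes-relative-to-differences-between-classes idea is commonly formalized as a one-way intraclass correlation. The sketch below uses the standard ICC(1) formula with equal class sizes; it is an illustration of the general technique, not IASystem™'s actual computation:

```python
from statistics import mean

def icc_oneway(classes: list[list[float]]) -> float:
    """One-way random-effects intraclass correlation, ICC(1), for equal-size
    groups: agreement among raters (students) within a class relative to
    differences between class means."""
    n = len(classes)     # number of classes
    k = len(classes[0])  # students rating each class
    grand = mean(x for c in classes for x in c)
    # Between-class mean square
    msb = k * sum((mean(c) - grand) ** 2 for c in classes) / (n - 1)
    # Pooled within-class mean square
    msw = sum((x - mean(c)) ** 2 for c in classes for x in c) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Perfect within-class agreement with between-class differences gives 1.0.
assert icc_oneway([[5.0, 5.0, 5.0], [3.0, 3.0, 3.0]]) == 1.0
```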

Programmatic Decision Making

High-stakes decisions, such as those relating to faculty merit pay increases, promotion, and tenure, should be based on data aggregated across multiple classes. For this purpose, IASystem™ computes inter-class correlation coefficients, with the instructor as the unit of analysis. For each item, the reliability coefficient represents the level of agreement among all classes taught by a particular instructor relative to the mean difference across instructors. Inter-class correlation coefficients for IASystem™ “global” items typically range from .69 to .75 for seven combined classes.
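
Why aggregation helps: the reliability of an average grows with the number of classes combined. The standard Spearman-Brown prophecy formula projects the reliability of a k-class average from a single-class reliability. The formula and the single-class value below are illustrative, not figures reported by IASystem™:

```python
def spearman_brown(single_rel: float, k: int) -> float:
    """Projected reliability of the mean of k parallel measurements
    (here, classes), given the reliability of a single measurement
    (Spearman-Brown prophecy formula)."""
    return k * single_rel / (1 + (k - 1) * single_rel)

# A modest hypothetical single-class reliability of .30 grows substantially
# when seven classes are averaged.
print(round(spearman_brown(0.30, 7), 2))  # prints 0.75
```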

Validity

The question “Do student ratings actually reflect instructional quality?” has been studied extensively with respect to several types of validity: construct, convergent, discriminant, and consequential. Some of the most useful references are listed below.[1] Ratings have been found to be influenced, to varying degrees, by factors such as expected grade in the course, class size, and reason for enrollment. To maximize the validity of ratings at each institution, IASystem™ identifies institution-specific correlates (possible biases) using regression analyses and provides both adjusted and unadjusted ratings in standard reports.
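
The adjustment idea can be sketched with a single covariate: regress ratings on the covariate by least squares, then report each rating with the fitted linear effect removed, re-centered on the original mean. This is a simplified one-predictor sketch of regression adjustment in general; IASystem™'s actual models and covariates are institution-specific and not described here:

```python
from statistics import mean

def adjusted_ratings(ratings: list[float], covariate: list[float]) -> list[float]:
    """Remove the linear effect of one covariate (e.g., expected grade)
    from ratings: fit ratings ~ covariate by ordinary least squares, then
    return residual + grand mean so adjusted values keep the original scale."""
    my, mx = mean(ratings), mean(covariate)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, ratings))
    slope = sxy / sxx
    # y - slope*(x - mx) equals the regression residual plus the grand mean.
    return [y - slope * (x - mx) for x, y in zip(covariate, ratings)]
```

Because only the fitted covariate effect is subtracted, the mean of the adjusted ratings equals the mean of the raw ratings; adjustment reshuffles classes relative to one another rather than inflating or deflating everyone.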


[1]     Abrami, P. C., d’Apollonia, S., & Cohen, P. A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology, 82, 219-231.

Aleamoni, L. M. (1999). Student rating myths versus research facts from 1924 to 1998. Journal of Personnel Evaluation in Education, 13(2), 153-166.

Bonitz, V. S. (2011). Student Evaluation of Teaching: Individual Differences and Bias Effects. (Doctoral dissertation). Available from ProQuest Dissertations & Theses database. (UMI No. 3472997)

Kulik, J. A. (2001). Student ratings: Validity, utility, and controversy. In M. Theall, P. C. Abrami, & L. A. Mets (Eds.), The Student Ratings Debate: Are They Valid? How Can We Best Use Them? (pp. 9-26). New Directions for Institutional Research, 109. San Francisco, CA: Jossey-Bass.

Marsh, H. W. (1984). Students’ evaluation of university teaching: Dimensionality, reliability, validity, potential biases, and utility. Journal of Educational Psychology, 76(5), 707-754.