Student Ratings of Teaching

 

Wilbert J. McKeachie
University of Michigan
 
In 1946, when I began teaching at the University of Michigan, the faculty had already voted that student ratings of teaching should be collected in all classes, and I used student ratings along with data on achievement in my first research study on effective teaching in the winter of 1947.
In 1949 Dean Heyward Keniston gave me a graduate assistant to collect data and review research on student ratings. We found that student ratings were collected at a number of universities and colleges, including Harvard, the University of Washington, and Purdue.
In 1951 the program for evaluating teaching came up for review. I remember the heated debate in the College of Literature, Science and the Arts about the recommendation of the College Executive Committee that the college continue to require collection of student ratings in all courses. Some faculty members felt strongly about the impropriety of students presuming to express opinions about a professor's teaching. Encouraging students to think that they were qualified to make judgments about teaching, they argued, would destroy the proper respect students should have for the faculty. Others asserted with great fervor that teaching is an art and therefore impossible to evaluate by any form of measurement.
The result of the debate was the adoption of an open-ended form to be administered in all courses. There were five questions:
  1. What do you think are the objectives of the course?
  2. What is your judgment of the value of this course in your education? Please point out both its contributions and its deficiencies.
  3. To what extent did you learn to think critically in the area covered by this course?
  4. Assuming you have the time and opportunity either in college or later, do you think you would be inclined to pursue interests in this area? Explain.
  5. Keeping in mind that the returns from this questionnaire will be used by the instructor in improving his teaching, please mention any other aspects of the course or instructor (such as, for example, clarity of presentation) not covered in previous questions which you consider to be especially good or poor, and offer any suggestions you have for the improvement of the course.
As I look at this almost half a century later, I'm impressed. I don't think we do any better today.
During the period of student activism in the 1960s and 1970s, there was a great increase in the use of student ratings. As Pat Hutchings indicates, they are now used in most colleges and universities.
Nonetheless, student ratings are still controversial. Most of us are sensitive about being evaluated, and any time the results are negative it is natural to question the validity of the evaluation. And there are some negative evaluations in almost every class. That's not surprising; teaching that is effective for some students is not equally effective for everyone.
Some years ago I was a member of a committee administering grants to senior faculty members who proposed to construct or modify their courses to emphasize thinking. At an end-of-the-year dinner for the participants, the discussion turned to student ratings, and the usual criticisms were raised.
"Students don't really appreciate a good course until they are out of college."
"Students can't really judge how well they are learning."
"Students only give high ratings to courses with low standards."
It happened that Herb Marsh, a professor at the University of Western Sydney, was visiting me at the time, and I had invited him to be my guest at the dinner. He is probably the world's leading researcher on student ratings of teaching, and as a guest he kept quiet as long as he could. But finally he could stand it no longer and said, "You know, there's a good deal of research evidence on the issues you've raised."
A prominent historian immediately retorted, "We don't care about research evidence; we have our own experience."
So much for teaching critical thinking!
In any case, I have heard Kenneth Feldman (the preeminent reviewer of research on student ratings) say that there have been over 2,000 articles published on student ratings--well over 1,000 of which present research evidence. In fact, we probably have more good research on student ratings than on any other aspect of higher education.
There are three major uses of student ratings:
  1. Student guidance in choice of courses.
  2. Improvement of teaching.
  3. Evaluating teaching for use in personnel decisions; e.g., tenure or merit salary increases.
How well do student ratings achieve these purposes?
1. Student Guidance
This is first by right of seniority. Student ratings were first collected, I believe, at Harvard University and published to provide guidance for students in choosing courses. Faculty members usually presume that students are thus likely to choose the easiest courses, but in a study we ran several years ago, we found that, as compared with an uninformed control group, students given student ratings of two alternative introductory courses chose the more highly rated course, even though it was rated as requiring more work (Coleman and McKeachie).
2. Improvement of Teaching
Harvard was not alone in using student ratings. In the mid-1920s Herman Remmers of Purdue University began a program of research on student ratings that made substantial contributions for over four decades. His studies are still among the best that have ever been done.
Remmers and his students found:
  1. In multi-section courses, the teachers of those sections achieving higher scores on classroom examinations are rated higher than those teachers whose students have not learned as much. Moreover, if a teacher aims a course at the top students, those students give higher ratings than the rest of the class. However, if a teacher is particularly effective with the poorer students those students rate the teacher higher (Elliott).
  2. Ratings of teachers by alumni 10 years after graduation correlate well with ratings of the same teachers at the end of a course (Drucker and Remmers).
  3. Student characteristics such as age, sex, class standing, and grade in the course have little effect on ratings of teaching (Remmers and Brandenburg).
The research of Remmers and those who have followed him also strongly indicates that:
  1. Student ratings returned to faculty members result in some improvement of teaching, but not very much.
  2. There is more improvement if behavioral items are used rather than more abstract, general terms. For example, instead of asking about clarity, ask "Uses concrete examples" or "Fails to define new terms" (negative), or instead of an item on organization, use "Reviews topics from previous lecture" or "Puts outline on the blackboard" (Murray).
  3. There is substantial improvement when the ratings are discussed with another teacher (McKeachie et al. 1980).
3. Personnel Decisions
If student ratings are part of the data used in personnel decisions, one must have convincing evidence that they add valid evidence of teaching effectiveness. I have already reviewed Remmers's extensive validity studies. They have been replicated at other universities. In general, better teachers (as measured by student learning) are rated higher by students. In addition, there is evidence that students of faculty members who are rated highly are more likely to be motivated to further learning as indicated by election of advanced courses in the same field. Highly rated teachers also produce more change in attitude sophistication (McKeachie, Lin, and Mann). The instructor's own judgment also correlates well with student ratings. Marsh found that if you asked instructors which of two classes went better, their judgments agreed well with the student ratings of the classes.
Finally, for this assembly, perhaps the most interesting evidence of validity is that humanities teachers are rated as being more effective than teachers of science, math, and engineering (Feldman). There is also fairly persuasive evidence that humanities teachers are actually better teachers. Humanities teachers:
  1. Are more expressive--move around more, use more gestures.
  2. Know students' names, encourage questions, ask questions.
  3. Show an interest in student ideas, show concern for student progress.
  4. Ask more questions requiring analysis and synthesis on exams; science and technology teachers ask more rote memory questions (Murray & Renaud).
These are characteristics that lead to longer-term retention and greater gains in thinking and motivation.
But aren't there biases or contextual factors that can invalidate student ratings? Probably the most common criticism by uninformed faculty members is that you get good ratings by "dumbing down" your course. Cutting down the amount of work will, they feel, inevitably result in higher student ratings. The facts, however, indicate that this is not generally true. Student ratings of teaching are higher for courses that are rated as requiring more work or that are more difficult. Undoubtedly, there is a limit. If a course is pitched above the students' heads, or if the course requires more work than most students can do, so that less learning results, student ratings will be lower than for courses that result in better learning.
Generally, small classes are rated higher than large classes, but this is not evidence of bias: research shows that small classes are in fact more effective than large classes in producing changes in thinking, motivation, and attitudes (McKeachie). Similarly, there are often small differences in ratings between required and elective classes and between lower-level and higher-level classes.
But the great concern about bias is based on the idea that we should be able to compare teachers to one another, that the number 3.1 should signify better teaching than 3.0, that we should be able to compare two teachers teaching in different departments at different class levels with different students. I argue that this is neither necessary nor desirable. In fact, for promotion and salary decisions we do not need to make such comparisons. For the decision about promotion we really only need two categories--good enough to promote or not promotable. Even for salary increases we need only to determine whether the teacher is excellent, good, adequate, or in need of help. We can determine these categories simply by looking at the distribution of the student responses. What proportion of the students give favorable ratings?
We don't need to figure averages to a decimal point. Comparing teachers with averages such as 4.3 and 4.1 is like comparing apples to oranges. We can tell a good apple or a good orange, but judging whether a good apple is better than a good orange is a much more difficult task.
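The idea of sorting teachers into a few broad categories by the proportion of favorable responses, rather than by decimal-point averages, can be sketched in a few lines of Python. This is only an illustration: the 1-5 scale, the rule that a 4 or 5 counts as "favorable," and the category cut-offs are all assumptions introduced here, since no numeric thresholds are given above. The two sample classes are invented data chosen to have means of 4.1 and 4.3.

```python
from collections import Counter

# Hypothetical cut-offs: the four categories come from the text, but the
# numeric thresholds below are assumptions made purely for illustration.
FAVORABLE_MIN = 4          # on an assumed 1-5 scale, 4 and 5 count as favorable
CATEGORY_CUTOFFS = [       # (minimum share of favorable ratings, category)
    (0.75, "excellent"),
    (0.50, "good"),
    (0.25, "adequate"),
    (0.00, "in need of help"),
]


def favorable_share(ratings):
    """Return the fraction of ratings at or above FAVORABLE_MIN."""
    if not ratings:
        raise ValueError("no ratings supplied")
    counts = Counter(ratings)
    favorable = sum(n for score, n in counts.items() if score >= FAVORABLE_MIN)
    return favorable / len(ratings)


def categorize(ratings):
    """Map a class's ratings to one of the four broad categories."""
    share = favorable_share(ratings)
    for minimum, label in CATEGORY_CUTOFFS:
        if share >= minimum:
            return label
    return CATEGORY_CUTOFFS[-1][1]


# Invented sample data: two classes whose means differ (4.1 vs. 4.3).
class_a = [5, 5, 4, 4, 4, 3, 5, 4, 2, 5]
class_b = [5, 5, 5, 4, 4, 4, 4, 5, 3, 4]

for name, ratings in (("A", class_a), ("B", class_b)):
    print(name, favorable_share(ratings), categorize(ratings))
```

Under these assumed cut-offs both classes fall in the same category, even though their means differ by two tenths of a point. A committee would of course choose its own thresholds and would still need to weigh the context of each course.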
Conclusion
It is clear that student ratings have the potential to contribute positively both to improvement of teaching and to the quality of personnel decisions about teaching. The problem is not in the ratings but in their use.
Students. Student time is used to fill out the ratings, but the students get little benefit from the time they invest. They are not encouraged to think about their own learning and their own responsibility for learning. Answering the questions should be an educational experience, not a mindless appraisal of the teacher.
Forms. The forms used in many colleges and universities are not as useful as they could be. Often a college or department requires that a set of standard items be used. Typically such items are not as applicable to the specific course as would be the case if the teacher developed or chose items specifically about aspects of the course. Moreover, the very fact that the items are mandated is likely to lead to resentment and resistance.
Norms. In order to conclude that a teacher is reasonably good, what percentage of his or her students would you expect to rate the teacher as excellent? Ten percent? Twenty percent? Fifty percent? Certainly, if at least half of your students think you are excellent, you can't be too bad. At the University of Michigan, over 90 percent of the faculty are rated as excellent by the majority of their students; yet when the faculty members look at their results, almost half of those rated as excellent by the majority of their students find that they are below average. This is discouraging and is more likely to result in a loss of motivation than in increased enthusiasm for teaching.
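The arithmetic behind this discouraging result is simple, and a small Python sketch may make it concrete. The six instructors and their figures below are invented for illustration and are not Michigan data; the point is only that when results are reported relative to the group average, roughly half of any group must fall below it, however good the whole group is.

```python
from statistics import mean

# Hypothetical data: share of each instructor's students who rated the
# instructor "excellent". Every value is above one half.
excellent_share = {
    "A": 0.62, "B": 0.70, "C": 0.75,
    "D": 0.81, "E": 0.88, "F": 0.93,
}

# Instructors rated excellent by a majority of their students.
majority_excellent = [name for name, share in excellent_share.items()
                      if share > 0.5]

# Instructors who nonetheless fall below the group average.
average = mean(list(excellent_share.values()))
below_average = [name for name, share in excellent_share.items()
                 if share < average]

print(len(majority_excellent), "of", len(excellent_share),
      "rated excellent by a majority")
print("below the group average:", below_average)
```

In this invented group every instructor is rated excellent by a majority of students, yet half of them would be told they are "below average."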
Evaluators. Whatever the source of data--student ratings, peer evaluation, gossip--some committee or administrator has to make an evaluative judgment. Students are not the evaluators; they simply provide data to the evaluators. In most universities the initial evaluation is made by peers--faculty members elected or appointed to a committee that reviews the evidence for promotion or merit increase in salary.
A key element is the good sense of the evaluators. Unfortunately, many evaluators have stereotypes about what constitutes good teaching, despite the fact that there are many ways to be effective. Thus, they may undervalue a teacher whose students judge that they have learned a great deal but whose ratings on organization, enthusiasm, or other characteristics usually associated with effective teaching do not fit the expected pattern.
Often the evaluators give less weight to the student ratings than to less dependable evidence, such as peer observations of teaching, testimonials, or general impressions of the teacher's personality.
In an effort to be objective, the evaluators may substitute arbitrary criteria for reasoned judgment. Thus, they may set as a criterion for promotion that the teacher must exceed a certain numerical mean on student ratings, without consideration of what the teacher is trying to accomplish, the circumstances under which the teacher has to work, the kind of course being taught, the nature of the students, and the many contextual factors that should temper their judgment.
References
Coleman, J., and W.J. McKeachie. "Effects of Instructor/Course Evaluations on Student Course Selection." Journal of Educational Psychology 73 (1981): 224-26.
Drucker, A.J., and H.H. Remmers. "Do Alumni and Students Differ in Their Attitudes Toward Instructors?" Journal of Educational Psychology 42 (1951): 129-43.
Elliott, D.N. "Characteristics and Relationships of Various Criteria of Teaching." Diss. Purdue University, 1949.
Feldman, K.A. "Course Characteristics and College Students' Ratings of Their Teachers: What We Know and What We Don't." Research in Higher Education 9 (1978): 199-242.
Marsh, H.W. Students' Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research. Elmsford, NY: Pergamon, 1987.
McKeachie, W.J. Teaching Tips: Strategies, Research, and Theory for College and University Teachers. 9th ed. Lexington, MA: D.C. Heath, 1994.
McKeachie, W.J., Y-G Lin, M. Daugherty, M.M. Moffett, C. Neigler, J. Nork, M. Walz, and R. Baldwin. "Using Student Ratings and Consultation to Improve Instruction." British Journal of Educational Psychology 50 (1980): 168-74.
McKeachie, W.J., Y-G Lin, and W. Mann. "Student Ratings of Teaching Effectiveness: Validity Studies." American Educational Research Journal 8 (1971): 435-45.
Murray, H.G. "Low-Inference Classroom Teaching Behaviors and Student Ratings of Teaching Effectiveness." Journal of Educational Psychology 75 (1983): 138-49.
Murray, H.G., and R.D. Renaud. "Disciplinary Differences in Classroom Teaching Behaviors." Disciplinary Differences in Teaching and Learning: Implications for Practice. New Directions for Teaching and Learning. No. 64. Ed. N. Hativa and M. Marincovich. San Francisco: Jossey-Bass, 1995.
Remmers, H.H., and G.C. Brandenburg. "Experimental Data on the Purdue Rating Scale for Instructors." Educational Administration and Supervision 13 (1927): 519-27.