Can We Fix Professor Evaluations?


This blog is going to be meaty. I know it’s 8 days before Christmas and, for the 1% of people combing through edu-blogs, this will not be a fun year-in-review, prediction, or otherwise mental-gymnastics kind of blog. Instead, I am going to talk through a jumble of problematic, controversial, in-need-of-systems-thinking issues for all of higher education. From confirmation bias to end-of-course evaluation problems to politics on campus to a serious lack of understanding regarding learning, this blog will cover a lot. Buckle up. (And come back next week for my ‘Twas the Night Before Christmas remix…yes, seriously.)

I was doing my daily intake of higher education reading when I got to Nancy Bunge’s article, published a few weeks ago, titled “Students Evaluating Teachers Doesn’t Just Hurt Teachers. It Hurts Students.”

As we head into 2019, it frustrates and annoys me that we still haven’t figured out how to evaluate instructors, instruction, learning, and more. I don’t think it is controversial to point out that, on the whole, higher education stinks at course and instructor evaluation.

I’ve blogged about this on and off every year for a decade or more. I hearken back to the Fall of my Senior year in college. We had a professor who was profoundly bad, in every sense of the word. Her grading appeared quite subjective, she could not relate to students in meaningful ways, and she was so scattered that nobody was sure what was due, when it was due, or what was expected, and on and on. She was one of the worst professors I have ever had in my K-20 experience.

Near the end of the class, she left the room as the Department’s grad assistant administered the end-of-course surveys. But the class of communication majors did something I had never seen done before. (Neither had the grad student, who allowed it.) We all huddled together and formed a web of argumentation, collective anecdotes, and frustration. We gave pertinent examples of bad instruction. We gave relevant examples of inconsistent and poor grading. After all was said and done, 27 out of 27 students finished their reviews by pleading with the administration to never let this professor back in front of students.

The following Spring, guess what happened? Nothing. That same professor taught all of the same classes. In fact, as I attended that same University for grad school, I can report that she taught the same classes throughout. As a Grad Assistant in the department, I know from dozens of personal conversations with her students over those two years that nothing changed. Yet she continued teaching, even being promoted before I left.

The Bunge article, however, paints a very different picture. She offhandedly remarks that “many institutions of higher learning use these surveys to determine whether faculty keep their jobs or get raises.” Is that true? It certainly did not feel true as a student. Nor was it my experience as an administrator at 3 different institutions. In fact, as the LMS administrator at one institution, I saw instructor surveys dating back 7 years which showed that one professor on campus was loathed by students, with example after example supporting the 1 out of 5 rating he accumulated. Yet he was the longest-tenured professor in the department (and also a union representative).

It has applied to my teaching experiences a few times, but only in an Adjunct capacity. In fact, the only place I ever saw evaluations tied to the employment of Full/Part-time faculty was within a For-Profit university context. Otherwise I am not sure that evaluations do indeed impact faculty tenure, promotion, or employability. Certainly not at every (or even most) schools.

Here, we run into one of our first problems. This is very difficult to verify or review in the literature. The first roadblock is the sheer number of journals that are unavailable to most people, which may (although likely do not) shed light on these issues. (But the lack of transparency in research is another blog topic.) So I use my educator credential through the library only to find a lot of abstracts with very few full articles. Sigh.

The second problem in this vein is the poor quality of research. My good friend Dr. Gordon Sanson - the world’s preeminent expert on teeth - was both the head of Biology at Monash University and eventually the Director of a Teaching and Learning Excellence unit. So, he was amazed (and disappointed) when he turned his focus from science research to education research, because the models, studies, analysis, and conclusions were often so flawed. “The majority of papers that get published, even in serious journals, are pretty sloppy,” said John Ioannidis, professor of medicine at Stanford University, who specializes in the study of scientific studies and published the now infamous article, “Why Most Published Research Findings Are False.” So, as you look at the studies surrounding End-Of-Course-Evaluations, particularly by students, while you do find some consistency with attractiveness-bias (people almost always evaluate others higher if they find them attractive), there is little else that seems to be meaningful or corroborated. There are simply too many variables from institution to institution, with very little done to address the issues that would cause problematic findings.

Another reason I am doubtful that many institutions use student surveys to rate, promote, or fire instructors is the myriad of arguments against it. While I have no doubt that some institutions might use student evaluations to get rid of a professor they wanted gone anyway, there are a number of schools where policies prohibit such actions. At the same time, whether petty and defensive or appropriate and thoughtful, arguments abound, with a large majority of professors suspicious of, if not berating, student evaluations in general.

One solid argument is that people (not just students) are poor judges of learning, as one does not know what one does not know. However, this argument also assumes that student evaluations are solely about learning, not about the experience, relatability, collaboration, respect, etc.

On the petty side, the argument is made that teachers give higher grades so as to receive better evaluations. Going back to the research again, this is definitely not supported conclusively. A few studies have suggested this is true (although their research methods and/or their sample sizes are highly questionable) while a few studies have suggested this is not true at all. There is a statistical stand-off demanding better, bigger, more longitudinal research.

In a not-statistical-whatsoever way (an N=1), I will use a personal example here. I have aggregated my teacher evaluations over my lifetime. At the same time, I track the grades I give every term. I can report that my evaluations are in the top 3% (based on comparison to colleagues at the universities where I have had access to see that information) while my grade distribution is a full letter grade lower than the mean. Again, I never correlated grades with end-of-course surveys, so perhaps there is instructor bias at work more than student bias, but I can report grades do not have to correlate to evaluation results. (Note - if you know me, then you know my evaluations over time have nothing to do with attractiveness bias, etc. I employ the most effective techniques from learning science as often as possible, and I am convinced that makes all the difference…)
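For anyone who does want to check this against their own records, a minimal sketch follows. It assumes you have one average grade and one average evaluation score per term; the function name `pearson` and the sample numbers are hypothetical placeholders, not real data from any instructor or institution. A single instructor's handful of terms is still a tiny sample, so treat the result as a conversation starter rather than evidence.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values: mean grade points awarded per term,
# and mean evaluation score (1 to 5) for the same terms.
mean_grade_by_term = [2.6, 2.9, 2.4, 2.8, 2.5]
mean_eval_by_term = [4.7, 4.6, 4.8, 4.7, 4.6]

r = pearson(mean_grade_by_term, mean_eval_by_term)
print(f"correlation between grades and evaluations: {r:.2f}")
```

A value near +1 would be consistent with the "easy grades buy good reviews" claim for that instructor, a value near 0 or below would not, and neither settles the question for higher education as a whole.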

So Then What?

If student surveys are that problematic, perhaps we should seek another way? While it feels wrong to tell consumers of a product or service that their opinion doesn’t matter, there are other ways to view teaching effectiveness. The second most common method of evaluating instruction across higher education is the peer review. A Chair and/or peer instructor can evaluate a professor to determine efficacy. But to take this lightly would be a huge mistake.

Having taught at or worked as an administrator for multiple schools, I can tell you firsthand how political and sensitive evaluation conversations can be. I have seen Faculty Senates and Teachers’ Unions alike create rules, clauses, processes, and exceptions around evaluations which render the intended evaluation useless. After all, this is a collective group that has never had to face scrutiny until the last few decades! At the same time, I have heard passionate speeches and read well-crafted responses detailing how impossible it is to “judge” another teacher. “Teaching is about style and style is different from person to person, therefore good teaching cannot be standardized, nor can it be effectively evaluated by someone else,” is the kind of argument I have heard.

Add to all of that the “buddy system” that I have personally seen with peer reviews. Friend 1 evaluates Friend 2 and vice-versa, ensuring that both get a great review, with only the time it takes to write the report as a drain.

At the same time, don’t forget to ask the instructor for the best possible class, date, and time to evaluate them. Allowing them to pick from their ‘best’ students, with plenty of lead-time to pick the most active instruction (whether the norm or an anomaly) will push the odds in favor of a great review. You can see the logistical issues faced from the start.

But maybe your institution has blind peer review. Nice start! Of course, there are other issues at play too. What if the faculty observers have only ever experienced lecture? That is not uncommon, yet we know from decades of replicated research that lecture is rarely the best approach to classroom instruction and is typically not a good way to ensure learning for students. But if this is the only method known by the observer, then any lecture will likely be deemed as fine or appropriate. Likewise, any method that is active in nature may be so foreign that the observer does not know how to evaluate that. (Student evaluations encounter the exact same issue.)

Similarly, if there is overlap in discipline, having one expert evaluate another expert is quite difficult. Remember, the class is full of novices, not experts. The evaluator must recognize the struggles a novice will have as part of the instruction measurement.

So, the evaluation system now requires three important things to be meaningful. First, the evaluation must include objective, observable, learning-centric teaching attributes. Obviously these should be valid measures from literature (like “How People Learn” and/or “Make It Stick,” etc.). Second, this requires a healthy amount of time for observers to fully grasp what to look for, how to measure it, how to extrapolate the tangential concepts which support it, etc. Training is required and it should likely be viewed as Professional Development and Enrichment. Finally, this requires that learning outcomes be a part of the measure. Did students learn? (Note - this should not be grade based, but must be learning outcome based. I might get 10 points removed from a paper because I submitted it late, but that does not mean I did not demonstrate learning.) So it is important to note whether students learn in a professor’s classroom or not.

Other Options

There are other ways by which to evaluate teaching performance. The most obvious is likely the portfolio. The problem there is similar to the problem with accreditation. It can easily become a joke.

You know what I mean. When was the last time an Accreditor “secret shopped” one of their schools? When was the last time you heard of an Accreditor pulling a “surprise inspection” of a college or university? No, that isn’t how it works. Instead, they give you copious amounts of paperwork to fill out years in advance of the next milestone. During that time a college or university cherry-picks the best and brightest examples of student work, teacher work, initiative findings, etc. They also create assets to “prove” how impressive their offerings are. They provide test and grade statistics (which we know to be extremely poor proxies for learning), and then the Accreditors warn, well in advance, of the specific sites or areas they plan to visit during the episodic nuisance.

Portfolios can absolutely have that feel, no? Unless the rubric for the portfolio encourages assets other than the best of the best, then every portfolio will be an incredible showcase of talent. It is also important to note how much time a teaching portfolio takes.

Data is another interesting addition to the mix. Personally, I am very much for “big data” as part of teacher evaluation. But of course the data has to be clearly identified, readily curated, difficult to massage, and holistic. Should an institution see a red flag in a professor that gives out an F every term? Of course not. But I had access to data at an institution where I saw a professor who, over several years, never gave a B or a D. The instructor gave a lot of A’s, a handful of F’s, and the rest as C’s. How is that possible? That is a piece of data that should have elicited some conversations with his/her Chair, no?
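That kind of red flag is simple to look for once the grades are in one place. The sketch below is only an illustration under stated assumptions: grades arrive as plain letter strings for one instructor across several terms, and the function name `missing_grades` and the `min_count` threshold of 100 are hypothetical choices, not anything taken from a real institutional system. A flag like this should start a conversation with a Chair, not replace one.

```python
from collections import Counter

LETTERS = ("A", "B", "C", "D", "F")


def missing_grades(grades, min_count=100):
    """Return the letter grades an instructor has never awarded.

    `grades` is a list of letter-grade strings (e.g. "A", "B+", "c-").
    Plus/minus modifiers are ignored. If fewer than `min_count` letter
    grades are on record, return an empty list, because a gap in a small
    sample is not surprising. (The threshold is an assumed value.)
    """
    # Keep only the first character of each non-empty entry, uppercased.
    counts = Counter(g.strip().upper()[:1] for g in grades if g.strip())
    total = sum(counts[letter] for letter in LETTERS)
    if total < min_count:
        return []
    return [letter for letter in LETTERS if counts[letter] == 0]


# Hypothetical record shaped like the one described above:
# lots of A's, a handful of F's, the rest C's, and never a B or a D.
record = ["A"] * 60 + ["C"] * 30 + ["F"] * 10
print(missing_grades(record))
```

The small-sample guard matters: a professor with one seminar of eight students who all earned A's and B's is not an anomaly, while several years of classes without a single B or D probably is.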

Systems Thinking

This entire blog has been a setup of sorts. I hope you found some help with evaluations as the problem should not remain persistent. In fact, a leader in this space I value a great deal is Carl Wieman. “Why Not Try a Scientific Approach to Science Education?” by Carl Wieman, a Nobel laureate, asked science professors around the globe to start using active learning strategies, staying away from “passive absorption” tactics. Ten years later, he noted, “a small fraction of classes” had moved away from lecture as the primary teaching methodology. This led to more work and support in the teaching and learning space. Wieman would ask instructors, “How can you justify the use of lectures in light of solid research showing that this isn’t a very effective way to even get students to retain information, let alone understand concepts?” But specifically, Wieman has a full report on this that is chock full of good ideas: A Better Way To Evaluate Undergraduate Teaching.

Moreover, I want to challenge higher education leaders here. I’ve blogged before about Peter Senge’s book, “The Fifth Discipline” and its convincing argument for Systems Thinking. If you really want to see a lack of systems thinking across the sector of higher education, look no further than the evaluation of instruction, teaching, and/or learning.

  • Teacher evaluation requires objective, evidence-based, rigorously tested measures. That means having read, curated, and analyzed dozens (if not hundreds) of books and articles on the subject.

  • Teacher evaluation likely requires a healthy amount of development time and dollars for observers. At the same time, think about deficiencies which are identified in the observations. What are the professional development opportunities for instructors after the evaluations are performed? If left to the discretion of the individual professor, how are they juggling subject matter expertise and enrichment while also developing the craft of teaching?

  • Teacher evaluation requires a deft political hand, working across stakeholders to ensure the betterment of both students and teachers, versus a heavy-handed, battering-ram approach.

  • Teacher evaluation requires a curriculum map of learning outcomes as well as authentic assessment for those outcomes so as to determine whether learning actually occurred or not.

  • Teacher evaluation requires a strategic and intentional division between research and instruction. One should not blur the other as weights should be appropriate for the institutional context, all with student learning in mind.

  • Teacher evaluation requires that institutional data be monitored, leveraged, and shared while being kept safe.

  • Teacher evaluation requires a consistent cadence of measurement. (Observing a teacher once per year OR LESS is hugely inappropriate.)

  • Teacher evaluation requires a distinct separation from course evaluation, regardless of modality.

  • Teacher evaluation requires fairness across all being evaluated, including if/how those measures are used for incentives, appraisal, or reprimand.

There are other considerations, but you get the idea. If you look at who “owns” all of the things above, you will be lucky to find fewer than 5 different stakeholders, likely at very different levels of the organization, and likely people who never really discuss this with the others. That is not systems thinking (nor is it even effective). Yet that is likely how 99% of institutions operate.

At the end of the day, if you want to “fix” something in higher ed, whether teacher evaluations or something else, you need to employ systems thinking. Sure, a new rubric or a sub-committee working on some aspect or a new Director with a new initiative may be able to quell a symptom or two. But “fixing” the problems of higher education takes more than symptom considerations.

I warned you from the start that this blog would be “meaty.” But I also hope it gave you some real food for thought.

Good luck and good learning.

Jeff Borden