On Teaching, Yet Again (Part 1)

I cannot wait for this semester to end, mostly because a couple of major service obligations will end with it, and I will no longer have to deal with some very difficult people. There are people who, once they’ve grabbed some power, develop — or perhaps just give themselves permission to manifest? — a disrespect toward their colleagues that is both staggering and frightening. And how easily some other people will roll over in the face of bullying by someone they perceive as higher in the hierarchy is just nauseating. One reason we have tenure is so we don’t have to tolerate being bullied by the administration, FFS.

But this was me having my tiny ranty vent. Or is it a venty rant?


I mostly wanted to talk about teaching, and what it means to teach well.

I am a very good teacher. How do I know? I get great student evaluations, I have very high attendance in all of my lectures, and even though I really challenge the students — trust me, I make them work really, really hard — they rise to the challenge. I have nearly twice the enrollment in my courses as the colleagues who teach the same class. I have also heard from multiple sources that students wait until I teach a class to take it, and a number of students take 2-3 classes with me. (/brag over)

A few weeks ago I heard, yet again, the annoying assertion that high student evaluations don’t mean you are a good teacher — that they merely mean you entertain your students and grade easily. I resent this implication, and honestly, it sounds like sour grapes: if it were that easy to get high student evaluations, everyone would get them. But it’s not easy, and students are not stupid. Maybe the following depends on the school, but I teach at a public school and the students in my classes are for the most part not spoiled, lazy, or entitled. Most are here to learn, and they appreciate being taught well. They also appreciate a professor who takes the time to get to know them, who has a clear schedule of assignments and exams, returns graded exams promptly, holds enough contact hours, and generally shows that he/she cares about student success.

There is a lot of research showing that student evaluations of teaching aren’t a very good predictor of teaching effectiveness. Student evaluations also tend to show bias against female instructors. (I believe these studies exist, but I don’t have links. If anyone has links, please leave them in the comments.) However, the last few times this came up, whenever I asked the person advocating for abolishing teaching evaluations how we should measure teaching effectiveness instead, there was no definite answer. People suggested exit surveys, evaluations after follow-on classes, etc., but nothing that would really produce a quantitative metric. Student evaluations are not the only thing we submit for tenure here; there are also reviews of teaching by senior colleagues, and other documents in the tenure dossier that can put a candidate’s performance in context (e.g., comparison to others teaching the same type of course). At least here, it’s not as if the evaluations are the only piece of information we look at.

In the language of mathematical logic, we seem to want an equivalence between teaching effectiveness and some quantitative metric, when we really just have an implication: A -> B can be true while its converse B -> A (the same as its contrapositive !A -> !B) is not necessarily true, and thus A <-> B does not follow.
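The one-way nature of that implication can be checked by brute force over truth values; here is a throwaway sketch (the `implies` helper and the propositions are just illustrative):

```python
# Enumerate all truth assignments to show that A -> B does not entail B -> A.
def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

rows = [(a, b) for a in (False, True) for b in (False, True)]

# Sanity check: B -> A really is the same as its contrapositive !A -> !B.
assert all(implies(b, a) == implies(not a, not b) for a, b in rows)

# Counterexample to equivalence: A false, B true satisfies A -> B but not B -> A.
counterexamples = [(a, b) for a, b in rows if implies(a, b) and not implies(b, a)]
print(counterexamples)  # [(False, True)]
```

In eval terms: "good teacher with low scores" is exactly the assignment that makes the implication true but the equivalence false.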

The relationship between evaluations and teaching is similar to the relationship between the h-index and research excellence. A person with a high h-index is probably making an impact on his or her research field; that doesn’t mean that the person with a lower h-index isn’t. Similarly, a person with high teaching evaluations is likely a good teacher; that doesn’t mean that one with lower evaluations isn’t. Also, there is such a thing as an h-index that is too low (for a given field and candidate seniority) and there is such a thing as teaching evaluations that are too low.

I don’t think quantitative metrics are evil. They don’t mean everything, but they do mean something.


There is a junior faculty member who is struggling with teaching some lower-level, large-enrollment courses. His teaching evaluations are quite low. I visited his class a few times, as we require for tenure, and I am not surprised by his evaluations at all. I could have predicted his scores for last semester based on just sitting in one of his classes. I gave him feedback after that class, but I don’t think I was blunt enough.

We all wish we could teach only the students who are highly motivated and interested in the subject; this is your typical upper-level elective or graduate-course demographic. However, the students who come in already interested are easy to teach: you just have to know the material, and even if all you do is transmit the information passably, they will learn, they will feel great about learning, and your evaluations will be great, too.

However, that’s not how it works. You get whom you get. In large-enrollment, lower-level required courses, many students don’t want to be there. Many are unprepared. It is very easy to lose and never recover swaths of your audience. That’s where you see a difference between really good teachers and everyone else.

You don’t get to choose the students you get; you have to find a way to teach the students you actually have in your class.

In order to teach, you have to be able to connect with your students. This is paramount in getting them to come to class. And, for some faculty, at least among my colleagues, it is hard to connect with students because they cannot get over what really boils down to a level of disdain — that the people in the class are not bright enough or worthy enough, or else they would understand the teacher’s awesomeness or the supposedly inherent awesomeness of the course material.

Teaching well requires a level of empathy: to be able to put yourself in the students’ shoes, to try to see the material and yourself from their perspective. And their perspective may not be the perspective that you ever had yourself, because most students are neither as talented for nor as interested in the field in which you got your advanced degree as you are. The teachers who make jokes in class or bring props and demos are all trying to do that — connect with a novice learner who might be quite different from them.

You need to figure out what it is that they need from you. And the more abstract the concepts are, the more important it is to come up with good examples that hopefully translate to the real world. And you don’t have to give them the full mathematical artillery the first time around. At first exposure, lead with intuition and follow with the formalism.

*** to be continued (blogger got too sleepy) ***


  1. Student evaluations are biased, not always accurate or precise, and a highly incomplete picture of what’s going on. That’s different from saying that they are worthless.

    I maintain that all of these problems are addressable.

    Accuracy and precision: Don’t over-use these scores, but that’s different from ignoring the scores. If somebody is consistently high up in the department’s rankings, that probably means something good, and if somebody is consistently on the low end that probably means something bad. If they are in the broad middle, they’re probably OK. If you want to go beyond sorting into a few basic tiers, bring in additional info.

    Incomplete information: Don’t make them your only information source. Nobody makes a hiring decision based solely on rec letters, because we also have CVs/resumes, interviews, etc. Nobody makes an admissions decision based solely on a transcript, because we also have essays, rec letters, etc. Nobody buys a home based solely on price and exterior appearance, because we also see the interior, get an inspection, check out the neighborhood, etc. But nobody completely ignores available info either.

    Bias: Simply apply a correction. Do men tend to get higher evaluations than women even when all other available information shows similar performance? Then apply a correction factor, an offset or calibration, that takes that into account. If a woman’s scores are in the gray area, give the benefit of the doubt. If they’re solidly and consistently in the good (or bad) category, then that probably means something.
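    A minimal sketch of what such an offset calibration might look like, with entirely made-up scores and group labels:

```python
# Toy offset calibration: remove each group's systematic deviation from the
# overall mean score. All numbers and group labels are made up for illustration.
from statistics import mean

scores = {"men": [4.2, 4.0, 4.4], "women": [3.9, 3.7, 4.1]}

baseline = mean(s for group in scores.values() for s in group)
offsets = {g: mean(v) - baseline for g, v in scores.items()}   # systematic gap
adjusted = {g: [round(s - offsets[g], 2) for s in v] for g, v in scores.items()}

print(adjusted)  # per-group profiles coincide once the systematic gap is removed
```

    A real calibration would, of course, estimate the offsets from a large sample while controlling for course type and level, not from three scores.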

  2. The questions on the evals are important too. “Would you recommend this class?” is not as helpful (for a required core that nobody wants to take) as “Did the professor come to class prepared?”

    We’re currently in a bit of a quandary because one of our likable male junior profs, who only gives As, isn’t teaching much content, only gives group grades, spends 30 minutes at the beginning of each class checking homework (assigning homework at all is new this year), and gets extremely high teaching evals. (A small number of students complain about the lack of meat in his classes, but that doesn’t show up in the evals, and students who want to learn select out of his sections.) We also have teaching observations.

  3. I think Alex’s comment is eminently sane, and I want to pin it to the office doors of all the senior faculty / administration at my university. Our university relies essentially exclusively on student evaluations to evaluate teaching effectiveness and, I believe at least partly as a result, has not had an uncontroversial tenure case for a woman in the “hard” sciences (excluding biology and psych, where women are not underrepresented at the PhD level) in about a decade. I am not self-interested in this — I have very good teaching evaluations (the highest in my department, according to my chair) — but I am just livid when I see junior women in other departments (often the “hard” science departments) get simply attacked by their large intro classes of uninterested premeds. There are some explicitly sexist comments, but I don’t believe that simply discounting those is enough — this study, for example, gender-swapped a male and a female teacher in a series of online courses and showed that ratings based on measurable data, like how quickly a professor returned graded assignments, were lower when the students believed the instructor was a woman (whether or not she was):
    There’s also this one:

    As for other metrics: yes, clearly peer evaluations. How about learning gains for science courses? There are standardized metrics used by science education researchers! Also letters from former students (down the road, students often have a different perception of a course). Multiple metrics seem like a no-brainer to me, but alas not to my university.

  4. Keep in mind, though, that my “eminently sane” comments are not just aimed at those who want to over-use student evals; they’re also aimed at those who want to toss them out completely. Middle ground is something that Americans struggle with.

  5. High evaluation scores don’t necessarily mean you’re an easy grader. But being an easy grader is certainly one way to get high evaluations. (Generally speaking here — I’m sure you’re great, xyk!). As I recall, those studies that show bias against female/nonwhite/foreign/nontenured instructors, and large/intro/nonmajor courses, also show that expectation of grade is a good predictor for evaluation scores, and the most controllable.

    I know excellent teaching exists, much as you describe it. But all the metrics are flawed and can be gamed. My department has some brilliant instructors, and some absolute turkeys, and we all look wonderful on paper.

    So what’s the alternative? The comments on student evaluations can be very helpful. You can usually tell if students like a course because it’s easy. Peer evaluations, in my experience, are often political and thus fraught. We’ve toyed with exit exams, but there are logistical problems. How do we get students to take it? Who decides on the material covered? Who grades it? Jfc, do we really want another scantron exam? And then the bigger questions: is there a will within the department to acknowledge and deal with bad instructors, or will it just cause problems with the dean? Are we secure enough to acknowledge the brilliant instructors, or do we worry that they make the rest of us look lame?

    Full disclosure: I’d say I’m above average, but hardly brilliant.

  6. Alex, understood and agreed — I’m not arguing for tossing out teaching evaluations. I like the idea of applying offsets — I’ve heard Yale has started doing that. I just get angry when there’s clear evidence of gender bias in the evaluations, and then my university evaluates teaching solely on the basis of student evaluations, and my minoritized friends get the boot. I’m simply arguing for multiple metrics — they all have bias of one sort or another, but at least the biases might be of different sorts and at least you get more data (I’m basically always in favor of more data).

    Another metric I forgot to mention above is collecting data on grades in upper-level courses as a way of evaluating intro-level course teaching. While there might be all sorts of factors affecting an individual student’s performance in upper-level courses, I believe there are data out there showing that learning gains at least somewhat track with trends in grades between intro- and upper-level courses.

  7. PS – I’m ignoring the dig at how Americans aren’t good at middle ground. There are lots of people in the world who aren’t good at middle ground, including some Americans and also lots of Europeans and people of other nationalities I could name (a few specific Chileans come to mind…). 🙂

  8. I’m an American, and I’m amazed at how hard it is to get my highly-educated colleagues to not think in dichotomies. As far as I’m concerned, our anti-intellectualism broke the brains of far too many people. 🙂

  9. The purpose of teaching evaluations SHOULD be to improve teaching and that means that the audience SHOULD be the instructor. The generic teaching evaluation is useless for this. You need to build an evaluation that asks such things as “Should more/less time be used for doing problems”, etc. There should be lots of room for extended responses.

    For example, a few questions from my teaching evaluation, each with a five-point Likert scale:

    The lectures each have a clear theme
    I could ask questions during the lecture when I needed to
    More of the lecture should be done on the blackboard and less from the Powerpoint slides
    The videos shown were useful for learning.
    There were enough examples worked out during the lecture for me to understand how to do the problems.

    Any other comments you may have about the lecture

  10. I found this paper while looking through the ones linked by lyra42 above – http://faculty.econ.ucdavis.edu/faculty/scarrell/profqual2.pdf

    “Student evaluations are positively correlated with contemporaneous professor value-added and negatively correlated with follow-on student achievement. That is, students appear to reward higher grades in the introductory course but punish professors who increase deep learning (introductory course professor value-added in follow-on courses).”

    I don’t know how true this is generally, but if it is general it’s sort of terrifying…

  11. I read this interesting paper a while ago on a large scale study of student evaluations in STEM fields, which uses cool natural language processing methods to do the analysis: https://aclweb.org/anthology/C/C16/C16-1083.pdf

    The main finding was that “while the gender of the evaluated instructor does not seem to affect students’ expressed level of overall satisfaction with their instruction, it does strongly influence the language that they use to describe their instructors and their experience in class.”

    — while I’m commenting, I want to take this opportunity to say thank you xyk for this awesome blog!
