Two of the courses I teach, AP Calculus AB and IB Mathematics SL year two, have clear curricula to follow, which is both a blessing and a curse. While I primarily report standards-based grades in these courses, I have also included a unit exam component that measures comprehensive performance. These are old-fashioned summative assessments that I haven't felt comfortable expelling from these particular courses. Both courses end with a comprehensive exam in May, and the scores on these exams are scaled either to a 1 to 5 scale (AP) or a 1 to 7 scale (IB). The longer I have taught, the more I have grown to like the idea of reporting grades as one of a limited set of discrete scores.
Over my entire teaching career, I have worked within systems that report grades as a percentage, usually rounded to a whole number. Sometimes these percentages are mapped to an A-F scale, but students and parents tend not to pay attention to the letters. One downside of reporting percentages is that it implies we have measured learning to within a single percentage point. Let's set aside, for the moment, the question of whether we should be measuring learning numerically at all, and talk about why discrete grades are a better choice.
As a teacher, I need to make sure that I grade assignments consistently across a course, or at least across a section. When you consider the number of my students multiplied by the number of assessment items I give them, I'm not sure I can be consistent to within a percentage point. I'm likely consistent to within five percentage points, and very likely to within ten. Because of the standards-based component of my grading system, I am also confident that I can have a conversation with any student about what he or she can do to improve.
One big problem I see with grading scales that map to letter grades is the arbitrary mapping between multiples of ten and the letter grades themselves. As I mentioned before, many people don't pay attention to the letter at all when the number is next to it. A student who sees a score of 79 wonders what one thing he or she should have done on the assessment to gain the single percentage point needed for an 80, and with it a B. That one point becomes far more consequential than the single point that raises a 75 to a 76, even though both represent the same difference in performance.
Another issue comes from the imprecise definition of the points for each question. Is a single-point increase the result of fixing a sign error, or of resolving a more significant conceptual issue? Reporting to the nearest percentage point suggests that we can make distinctions this fine, but assessments are rarely planned so that these differences are clearly identified. I know I don't have psychometricians on staff.
For all of these reasons and more, I’ve been experimenting with grading exams in a way that acknowledges this imprecision and attempts to deal with it appropriately.
The simplest version of this was for the final exams in my Precalculus course last year. All scores were reported after being rounded to the nearest multiple of three percentage points. This meant that scores fell roughly into the plus, regular, and minus divisions of each letter grade (e.g. B-/B/B+).
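That rounding can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet I actually used, and the function name `round_to_step` is hypothetical. The post doesn't specify how exact ties (such as 79.5) were handled, so this sketch assumes they round up; Python's built-in `round()` would instead send a tie to the nearest even integer.

```python
import math

def round_to_step(percent: float, step: int = 3) -> int:
    """Round a percentage to the nearest multiple of `step`, with ties rounding up.

    Assumption: ties round up. Python's round() uses round-half-to-even,
    which would send 79.5 / 3 = 26.5 down to 26 (i.e. 78), so
    floor(x + 0.5) is used instead to send it up to 27 (i.e. 81).
    """
    return step * math.floor(percent / step + 0.5)

for raw in [79, 80, 85, 86, 89]:
    print(raw, "->", round_to_step(raw))
```

With this rule, raw scores of 79 and 80 become 78 and 81, and 85 and 86 become 84 and 87.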
In the AP and IB courses, this process was more involved. I decided that the only possible exam scores would be 97, 93, 85, 75, and 65, which correspond to 5-4-3-2-1 for AP and 7-6-5-4-3 for IB. I entered student performance on each question into a spreadsheet. Sometimes before entering scores and sometimes after, I would go through each question and decide what sort of representative mistakes I would expect a 5 student to make, a 4 student, and so on. I also worked through a couple of different scoring scenarios at each level to see how much the point total could vary for students earning a given score. This either determined the cut scores for that particular exam or at least suggested what they might be. Here is an example of what this looks like:
At this point I would also look at individual papers again, identify holistically which score I thought each student should earn, and then compare their raw scores to the scores of the representative papers. Any clear discrepancy led me to change the cut scores. Once I was satisfied that most students were graded appropriately, I entered the scores into a Google script that scaled all of the raw scores to the discrete scores.
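The scaling and comparison steps can be sketched in Python. This is not my Google script, just a hypothetical version of the same idea: it uses the five discrete scores and their AP/IB equivalents from above, and assumes the cut scores are given as the minimum raw totals for AP scores of 2, 3, 4, and 5. The function names, cut scores, and student data are all made up for illustration. It also flags students whose holistic score disagrees with the scaled one, which is the kind of discrepancy that would prompt a change in the cut scores.

```python
from bisect import bisect_right

# The five discrete scores, lowest level first, with their AP and IB equivalents.
REPORTED = [65, 75, 85, 93, 97]
AP_SCORE = [1, 2, 3, 4, 5]
IB_SCORE = [3, 4, 5, 6, 7]

def level_for(raw: float, cuts: list[float]) -> int:
    """Return the level index (0-4) that a raw total earns.

    `cuts` lists the minimum raw totals for AP 2, 3, 4, and 5 in ascending
    order; bisect_right counts how many cuts the total meets or exceeds.
    """
    if len(cuts) != 4 or cuts != sorted(cuts):
        raise ValueError("expected four ascending cut scores")
    return bisect_right(cuts, raw)

def scale(raw: float, cuts: list[float]) -> tuple[int, int, int]:
    """Map a raw total to (reported percentage, AP score, IB score)."""
    level = level_for(raw, cuts)
    return REPORTED[level], AP_SCORE[level], IB_SCORE[level]

def discrepancies(papers: dict[str, tuple[float, int]], cuts: list[float]) -> list[str]:
    """List students whose holistic AP score differs from their scaled AP score.

    `papers` maps a (hypothetical) student name to (raw total, holistic AP score).
    """
    return [name for name, (raw, holistic) in papers.items()
            if scale(raw, cuts)[1] != holistic]

# Made-up cut scores for a 40-point exam and three made-up students.
cuts = [14, 24, 31, 37]
papers = {"Student A": (38, 5), "Student B": (30, 4), "Student C": (20, 2)}
print(discrepancies(papers, cuts))
```

Here Student B's raw total of 30 falls just below the hypothetical cut of 31 for a 4, while the holistic read says 4. Lowering that cut to 30 clears the flag, and then every raw total maps to its discrete score with `scale`.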
This process of norming the papers took time, but it always felt worth it in the end. I felt comfortable talking to students about their scores and the work that qualified them for those scores. Because the scores were independent of the standard 90/80/70/60 mapping between percentages and letter grades, they were appropriate indicators of how students actually did, regardless of the percentage of points they earned. Students weren't thrilled that they couldn't calculate their point percentage and immediately know their score, but this was not a major issue for them. Going through this process felt much more appropriate than applying a 10*sqrt(score) type of curve to the raw scores.
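For comparison, here is what the 10*sqrt(score) curve mentioned above does, as a short Python sketch (the function name is hypothetical). It raises every score, so a raw 64 becomes 80 and a raw 81 becomes 90, but it is continuous: students who differ by a single raw point still receive different curved scores, reported with the same implied precision as before.

```python
import math

def sqrt_curve(percent: float) -> float:
    """Apply the '10 times the square root' curve to a raw percentage."""
    return 10 * math.sqrt(percent)

for raw in [49, 64, 81, 90, 91]:
    print(raw, "->", round(sqrt_curve(raw), 1))
```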
In my end-of-semester feedback, some students reported their frustration that they would receive the same score as other students who earned fewer points. I understand this frustration in principle, but not in practice. Under the standard system, 92.44% and 91.56% both round to 92% and receive the same grade. In the big picture, I think the grades students received were fair, and students have also reported that they find the grades I give them fair.
I'm in favor of eliminating the plus and minus designations from letter grades. They exist only to communicate, and I would rather communicate those distinctions through written comments or in person than through a symbol. These marks are more a numerical consequence of the percentage grade scale than an intentional comment on student learning, and they do more harm than good.