A Proposal: A Scoring System For Roast Quality

Coffee professionals almost universally use a 100-point scoring system to evaluate green coffee. The system is silly in some ways; for instance, in specialty coffee it’s effectively a 20-point system, and realistically, readers of this blog probably drink coffees in the 82-90 point range 90% of the time. So if you taste a cup of specialty coffee and it’s “great,” you could probably assume it’s in the 88-90 range; “good” puts you in the 85-87 range; and “meh” puts you in the 82-84 range.

Many of us have had the experience of filling out a standard SCA scoring sheet while cupping and finding that our cumulative scores are much lower than what most pros would rate the coffee (and always lower than what the broker selling the green claims it is, hehe). That’s because, in a sense, the sheet is designed to nudge you toward conformity, usually in the mid-80s. It’s also because the quality of a coffee is not merely the sum of its parts, however the form happens to weight those parts.

I’m not an expert on scoring green, but my friend Ryan Brown, author of the forthcoming book Dear Coffee Buyer: A Guide to Sourcing Green Coffee, is.  In the book, Ryan writes:


The first time I scored a bunch of coffees on a form, I was a judge in the 2008 Best of Panama, which used a version of the SCA form. I had a working mental model of how scores like 82, 86, and 90 tasted, but I had never before needed to derive the total scores by adding the scores of individual attributes. I began by evaluating the fragrance, aroma, and sweetness, penalized points for defects, and ended up with a score that didn’t at all sound right. This was a nice, clean, and sweet coffee: 79. This coffee was all right; it seemed a little inconsistent and the acidity fell apart as it cooled: 87. I furiously found places to push the numbers up or down to more accurately land on what seemed to be a fitting score. It involved a lot of erasing and arithmetic.

The second time I scored a bunch of coffees on a form, I was a judge in the 2008 El Salvador CoE, where I did the opposite. The CoE form has eight scoring categories of 8 points each. (A coffee begins with 36 points, provided there are no defective cups in the sample.) Starting with the final score, I then frantically figured out what attributes added up to that number. It was better, but my scoring still involved some unpleasant fudging.

I eventually hit my stride on forms. I found what the average score for each component would be to yield an 86, which, throughout my coffee buying time, was a sort of benchmark. I then asked myself, for each attribute, is this an 86-like sweetness? Higher? Lower? Is this an 86-like acidity? An 86-like body? Et cetera. This worked surprisingly well, and I only ever found my final score to be at most about a half-point different from my initial gut reaction.
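To make Ryan's benchmark approach concrete, here is a small Python sketch of the arithmetic. It assumes a form shaped like the CoE one he describes (36 base points plus eight categories of up to 8 points each); the function name and parameters are hypothetical, and the excerpt does not say which form he applied the benchmark to.

```python
def benchmark_per_category(target_total, base_points=36, categories=8, max_per_category=8):
    """Average category score needed to reach target_total.

    Defaults follow the CoE form structure described in the excerpt;
    applying the 86 benchmark to this form is an assumption.
    """
    per_category = (target_total - base_points) / categories
    if not 0 <= per_category <= max_per_category:
        raise ValueError("target_total is not reachable on this form")
    return per_category


# An 86 on such a form means each category averages 6.25 points.
print(benchmark_per_category(86))
```

With that average in hand, each attribute can be judged as above, below, or right at the benchmark level, which is the comparison Ryan describes.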


 Available soon at www.scottrao.com



I don’t want to dwell on the quirks or limitations of the industry’s standard scoring forms; they serve their purpose well enough, and coffee professionals seem, as a group, to have figured out how to agree on green-coffee scores.  However, roasters and baristas often use these sheets to evaluate production roasts, a purpose for which the forms are not intended or well designed.

Around the same time Ryan was judging the Best of Panama coffees, I had my first significant revelation about cupping forms and evaluating roasts. I was, coincidentally, at a “Best of Panama” cupping hosted by a roastery in Seattle, and I was having difficulty rating some cups because the sample roasts were all over the map. Some roasts were spot on, some terribly baked, and some underdeveloped. As someone more attuned to roast quality than green quality, I was thrown by these roasts. Others in the room were doing a better job of “seeing through” the roasts, but I was also pretty sure some of their scores were being influenced by roast quality in ways they did not realize.

I have no doubt that great, and terrible, roasts sway professionals by 1-2 points here and there. I don’t mean to muddy a point that Ryan made to me after he read an earlier draft of this post: when it comes to buying green, a coffee’s “true” cupping score should be the best one you can achieve with it at a given point in time. I would argue, and I think Ryan would agree, that the influence of extraction, water, and roasting can only deduct from a coffee’s “true” score. That said, there’s no way to know with certainty when the extraction, water, roasting, etc., have all been optimized to help illuminate a coffee’s true score.

For example, that massively sweet fruit bomb on the table may get a score of 86 instead of 88 if the sample roast is very baked, because baking kills sweetness. The sample that retained its acidity and sweetness as the cup cooled may not be better than its neighbor that “fell apart” after cooling; the second cup may have simply been baked. “Brothy” underdeveloped roasts often don’t bring out the plump, pleasing, and juicy notes that some coffees potentially offer.  Finally, I’ve had brews that tasted minerally, metallic, or gamey when made with water unsuitable for coffee extraction.

Regardless of one’s views on how much roast quality sways professionals in green evaluation (and to be clear, we’re all swayed to different degrees), it’s clear we’d be better off with a separate system for evaluating roasts if we want to get better at roasting and also want to avoid allowing roast quality to interfere with green evaluation.

Let’s say you production-roast a 90-point Kenya and the ROR crashes hard and then flicks. The coffee may still have some pleasant fruit since, after all, the coffee’s raw material was so stellar that you’d have to roast the coffee on a BBQ to destroy all of its goodness. You may score a cup from that roast as an 88.  Now let’s say you execute a flawless roast of an 83-point blend component; you will likely still score that coffee an 83. As a production roaster, there isn’t much to learn from the Kenya scoring five points higher than the blend component. After all, you were handed two radically different types of raw material, did a masterful job with one of them, and a poor job with the other, yet the better job scored lower.

If you’ve read my two previous posts about seeing through roasts, part 1 and part 2, you’re familiar with my opinion that some cup traits are primarily due to roast, some are due to green quality, and most are a combination of the two. “Seeing through roasts” is more art than science. I propose that an effective roast-scoring system focus only on traits that are predominantly indicative of roast quality.

To that end, here’s a first draft of a proposed roast-scoring system.


Development (as in “how well developed are the bean cores,” not “how dark is the roast”): -2 to 0

-2: objectively underdeveloped

-1: debatably underdeveloped

0: adequately developed (according to your standards)


Success in highlighting terroir or other roast goals: -2 to 0

-2: failure to highlight desired flavors

-1: partial success

0: complete success in highlighting terroir, etc.


Baked: -2 to 0

-2: definitely baked, hollow, lost most sweetness

-1: a little baked but still kinda sweet

0: not baked at all


Roasty (not a real word): -2 to 0

-2: reminds you of your last cup of Starbucks

-1: hint of char, smoke, or unintended roastyness (also not a word)

0: would please George Howell
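For roasters who log their cuppings in a script or spreadsheet, the draft rubric above can be sketched in a few lines of Python. This is only a minimal sketch, not part of the proposal itself: the class name `RoastScore` and its field names are hypothetical labels for the four categories, and the total is assumed to be a plain sum of the four scores.

```python
from dataclasses import dataclass, fields

ALLOWED = (-2, -1, 0)  # every category runs from -2 (worst) to 0 (best)


@dataclass(frozen=True)
class RoastScore:
    """One cup scored on the four draft categories (hypothetical field names)."""
    development: int = 0
    roast_goals: int = 0  # success in highlighting terroir or other roast goals
    baked: int = 0
    roasty: int = 0

    def __post_init__(self):
        # Reject anything outside the rubric's -2 to 0 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in ALLOWED:
                raise ValueError(f"{f.name} must be -2, -1, or 0, got {value!r}")

    @property
    def total(self) -> int:
        # Assumed here to be the plain sum of the categories, so -8 to 0.
        return sum(getattr(self, f.name) for f in fields(self))


# A roast with no faults in any category keeps the best possible score.
flawless = RoastScore()
print(flawless.total)
```

Defaulting every field to 0 mirrors the idea that a roast starts at the coffee's full potential and can only lose points from there.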


Admittedly, you may find it awkward that scores will usually be negative, or at best zero. I’m personally okay with that, since roasting isn’t capable of improving green coffee. A roast will either show a coffee’s full potential (the score would be zero) or cause the coffee to fall short of its potential (the score would be negative). As Ryan wrote in DCB: “No matter what wizardry your roasting team is capable of, there's no way to improve a coffee's potential through roasting. Green coffee either has it or it doesn't.”


Please note that it would not be a good idea to combine a coffee’s roast score with its green score (for example, by deducting the roast penalty from the green score), as that would cause confusion and miss the point of keeping the green and roast evaluations separate.


Here are some examples of applying this system:


Coffee #1: fast roast, no ROR crash or flick, tastes green

Very underdeveloped (brothy): -2

Some varietal character shines through: -1

Not baked at all: 0

Not roasty at all: 0


Total Score: -3


Coffee #2: developed, but the ROR crashed hard and flicked

Development is great: 0

Terroir shines through a little: -1

Very baked: -2

Very roasty: -2


Total Score: -5


Coffee #3: almost perfect, but development was borderline

Development: -1

Terroir: 0

Baked: 0

Roasty: 0


Total Score: -1
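The three totals above are plain sums of the four category scores, which makes them easy to check. Here is a minimal Python sketch that tallies each example and also lists which categories cost points, since that is the part a roaster can act on. The dictionary keys and function names are shorthand invented for this illustration.

```python
# Category scores copied from the three examples; key names are shorthand.
examples = {
    "Coffee #1": {"development": -2, "terroir": -1, "baked": 0, "roasty": 0},
    "Coffee #2": {"development": 0, "terroir": -1, "baked": -2, "roasty": -2},
    "Coffee #3": {"development": -1, "terroir": 0, "baked": 0, "roasty": 0},
}


def total(scores):
    """Sum the category scores; 0 is the best possible roast score."""
    return sum(scores.values())


def weak_spots(scores):
    """Return the categories that lost points, worst first."""
    lost = [(name, value) for name, value in scores.items() if value < 0]
    return [name for name, value in sorted(lost, key=lambda item: item[1])]


for coffee, scores in examples.items():
    print(coffee, total(scores), weak_spots(scores))
```

Running it reproduces the totals of -3, -5, and -1, and shows at a glance that Coffee #2 lost most of its points to baking and roastiness rather than to development.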


I’d like to be clear that this proposal is a work in progress. I hope readers will criticize this post constructively and offer their own suggestions for improving the system.

 photo credit: Liz Clayton
