Tuesday, June 10, 2008

Great News for Chess Grades?

-- Or --

Putting the Grrrrrr into Grading?


You know how your grade is calculated, right? If you beat a player, then into your average for the next year goes their grade plus fifty; if you draw, just their grade; if you lose, their grade minus fifty: and then all your results are averaged. Of course there's also the forty-point rule - this deals with results between players separated by more than forty points, in order to make illogicalities like losing grading points from victories impossible. And of course if you don't play thirty games in your season, then some of your previous results go into your average as well, until it tots up to thirty, if possible. And that's it, right? How it always has been, how it always will be, grade after grade, list after list, season after season, decade after decade? Right?
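For anyone who prefers to see it written down, here is a rough sketch of that calculation in Python. The function names are made up for illustration. The reading of the forty-point rule used here (an opponent more than forty points away is treated as exactly forty points away) is an assumption rather than something spelled out above, and the topping-up to thirty games from earlier seasons is left out.

```python
def game_points(own_grade, opp_grade, result):
    """Grading points from one game; result is 1 (win), 0.5 (draw) or 0 (loss)."""
    # Assumed forty-point rule: treat the opponent as no more than
    # 40 points above or below your own grade.
    capped = max(own_grade - 40, min(own_grade + 40, opp_grade))
    # Win: opponent's grade + 50, draw: opponent's grade, loss: opponent's grade - 50.
    return capped + (result - 0.5) * 100


def season_grade(own_grade, games):
    """Average the points from a list of (opponent_grade, result) pairs."""
    points = [game_points(own_grade, opp, res) for opp, res in games]
    return sum(points) / len(points)


if __name__ == "__main__":
    # A 150 who beats a 160, draws with a 140 and beats an 80.
    print(season_grade(150, [(160, 1), (140, 0.5), (80, 1)]))
```

Under this reading, the win against the 80 counts as 110 + 50 = 160 rather than 80 + 50 = 130, which is how the rule stops a win from dragging the average below the player's own grade.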

WRONG! Because next year, you'll have two grades. The first will be your normal grade, calculated as above. That will be your official grade for the season. The second will be your corrected grade, which is (approximately) your normal grade multiplied by 0.8, with 50 added on to it. But as of 2009-10, this corrected grade will form the basis of all new gradings.

Confused?

Wondering why?

Then read on. All will be revealed, as best I can . . .

So, here's the story. A little while back the English Chess Federation (ECF) commissioned research into whether or not their grading system suffered inflation or deflation. The statisticians who worked on this concluded that the lower the grade, the more deflated it was. After queries, double-checking, and much discussion, the ECF and their statistics team then worked out a formula to correct the deflation. That's the formula I gave above, approximately. As to the reason why the corrected grades are visible this year - but will only be operationalized next year - that's simple. This way tournament controllers and league secretaries and the like can work out what their new grading boundaries will have to be next year - with a whole year to study the new corrected gradings and their implications. In other words, we'll all have a whole year to get used to this new lay of the land.

So, what does it mean? Well, if your grade next year is 100, your corrected grade will be 130 - a thirty point jump. But if your grade next year is 200, your corrected will be 210 - only a ten point jump. That's because the lower the grade, the more deflated. In fact if your grade next year is 250, then your corrected grade will still be exactly 250.
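Those numbers are easy to check for yourself. Here is a minimal sketch using the approximate formula from this post (grade times 0.8, plus 50); the function name is just for illustration, and the real recalculation may differ from this approximation.

```python
def corrected_grade(grade):
    # Approximate correction quoted in this post: multiply by 0.8, add 50.
    return 0.8 * grade + 50


if __name__ == "__main__":
    for grade in (100, 200, 250):
        new = round(corrected_grade(grade))
        print(f"{grade} -> {new} (jump of {new - grade})")
```

The jump shrinks as the grade rises because the formula leaves exactly one grade unchanged: solving g = 0.8g + 50 gives g = 250.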

Now personally speaking, I'm no statistician, and I'm not going to argue with the experts on this, nor pretend I can refute all their research from my desk. In fact, my personal experience tends to confirm this kind of thing. My average grade this year against players rated lower than me is 22 points lower than my average grade against players rated higher than me. This confirms what the ECF says. And I remember more than a decade ago watching a game between two players rated in the 120s that started 1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Bd3?? Nxd4, after which white went on to win in about twenty more moves - without having batted an eyelid at his knight that had dropped off the board on move 6, let alone it seemed having contemplated resignation over it. But nowadays, I can't remember the last time I saw a 100 graded player make so crass a blunder. I've played several players around that grading level this year, and I consider them all decent club players. In fact two of them drew against me, and a third should have. Finally, on a personal note, I've heard several experienced and strong players - players whose opinion I respect far more than my own - say they think grades are increasingly deflated.

But not everybody is so convinced. Tim Spanton is arguing on the ECF forum, in fact, that a lot of people he knows believe the ECF are doing this for a different reason. "The suspicion," he writes, "is that this is a gimmick to raise grades for no reason other than stroking the fragile egos of people who can't stand seeing their grades going down (and aren't prepared to put in the hard work necessary to reverse such a process)." Well, what do you think? That this is a great move for grading? Or unnecessary meddling - just the ECF putting the Grrrrrrr into Grading?

(And, by the way, if you feel like joining in the debate over at the ECF, then their grading forum can be found by clicking here.)

32 comments:

Anonymous said...

OK. I understand that, in general, higher-graded players have performed less well against lower-graded players than the 50-point formula expects... and therefore that adjustments are desirable, with the lower-graded players gaining relative to higher-rated players. BUT, to have confidence in the adjustments (and, I suppose, because I like to understand detail), I'd like to know more about how they'll be calculated. Has this information been published anywhere?

Also, what has caused grades to get out-of-kilter? Can this be addressed in tandem with the adjustments (or will further adjustments be necessary in future years)?

Angus

Tom Chivers said...

There is I am told a two page article about all this in the latest ECF Yearbook. Unfortunately, I don't know how to get hold of the Yearbook, and my email to Chris Majer (current ECF Chief Executive) asking if an electronic copy is available has gone unanswered. There is not one on the web I can find.

However, the matter has been debated at length in the ECF Forum. I don't understand it all, but my impression is as follows.

1. That the correction-formula is based on comparing statistically-expected results with actual results. Eg, over all games between players graded 150 and players graded 125, the players graded 150 should score 0.75. Under the corrected system they (on average) will. Under the old, they didn't. I am not wholly sure about this, but if you go through the threads on the ECF forum you can find out more. There are graphs and so forth from the statisticians who find this stuff out, for instance.

2. What has caused the grades to get out of kilter is, I think, players whose grades are rising rapidly - juniors especially. Say your true strength is 200, but at the start of the season your grade is 100. You will beat nearly everyone you play, who should get a grade per game of 150 based on your true strength. Instead they will get a grade per game of 50.

3. As for further adjustments, I'm not sure. Possibilities of preventing/limiting this have been discussed on the forum, but I am not sure which if any will be implemented.
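The numbers in points 1 and 2 can be sketched as follows. The expected-score function is not quoted from the ECF; it is what the win/draw/loss rule implies, since a player only holds their grade if opponent + 100 x score - 50 equals their own grade, i.e. score = 0.5 + difference/100. The function names are illustrative, and the forty-point rule is ignored here.

```python
def expected_score(grade, opp_grade):
    # Break-even score implied by the +50 / 0 / -50 rule:
    # one grading point of difference is worth one percentage point.
    return min(1.0, max(0.0, 0.5 + (grade - opp_grade) / 100))


def points_for_loss(opp_grade):
    # A loss counts as the opponent's grade minus fifty.
    return opp_grade - 50


if __name__ == "__main__":
    # Point 1: a 150 against a 125.
    print(expected_score(150, 125))
    # Point 2: losing to someone whose true strength is 200 but whose grade is 100.
    print(points_for_loss(200), "deserved vs", points_for_loss(100), "received")
```

Each opponent of the under-graded player is short-changed by 100 points for that game, which is the deflation mechanism described in point 2.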

The main threads seem to be:

here and here. You will have to wade through a lot of discussion to pick out the key bits I suspect . . .

Hopefully when the new list is published, the ECF will also publish a full report on the change so people can understand it all, and then look further into the detail if they wish. It's a bit odd that the news is sort of seeping out, I think.

Anonymous said...

Here's my take on what happened. The ECF graders got a new toy about 5 years ago. This was a mechanism which appeared to calculate grades from first principles given a set of results. Initially this was used to calculate the starting grade for new players. New player in this context includes GMs turning up at Hastings / 4NCL as well as a 10 year old. This replaced the old system of the grader's guess (if you saw a team with an ungraded player sitting between players of grades 120 and 100, you "estimated" them a grade of 110). Empirical evidence suggests that the back calculation gives lower results than the previous grader's estimates. This can have a knock-on deflation effect as players improve.

Moving on, they had the bright idea of running the estimation calculation against the whole database. I've not seen it stated as to how they took into account players whose strength changed over the year, players with less than 30 games, 10 point additions for juniors or the 40 point rule - the model seems to have been trusted without rigorous evaluation. This isn't in itself about statistics - rather more about the validation of a computer model.

Having got a set of results - a set of graphs was drawn which showed a relatively poor fit between the modelled grades and the actual ones. A conclusion was then leaped to that this meant that the actual grading system had been deflating. This ignores the point that you could set the modelled grades to anything you wanted (the no-change point was the 225-250 level - but you could have used the 125-150 level with as much justification). It also ignores the evidence from people who've been playing a while like Tim who hadn't noticed their grades dropping for no good reason.

As to whether Tim's point about egos is justified, I've noticed that the ECF has relaunched its "master points" scheme. This gives you a "Regional Master" title if you're above 180 for two seasons running and will have got easier if the grading corrections go through.

In one of the comments on the ECF site, Howard Grist - he's the man doing a lot of the work - observes that grading systems suffer from attenuation. What he means is that the gap between top and bottom widens over time. He uses the example of the top of the international list where grades of 2700+ are now commonplace. He could also have used the example of ICC blitz ratings where the top players are now around the 3200 mark whereas down at the 2000 mark, the ICC ratings still resemble the FIDE ones. So there's both inflation and deflation - top players go up - beginners go down.

Whether the ECF is right to damage the structure of rated restricted competitions in the supposed goal of statistical purity is an open question. We've probably got to wait for the actual results to be published and whatever back up justification comes with it. The actual level of the new grades is an arbitrary decision - it seems to have been to treat the top as a fixed point. It would not come as a complete shock for the October ECF council meeting to be dominated by this issue with motions on the table to either disregard the new grades entirely or to rebase them at a lower level.

At a philosophical level - are we asking too much of a grading system?
In the limit, all that it does is measure past performance. In the investment world there is a standard phrase that past performance is no guide to the future. In the chess world, we rather assume that a player with a grade of 150 will struggle against one with a grade of 175 and there's plenty of evidence to justify this. Is the same necessarily as true between a player of 50 and one of 75? Let's assume the player of 50 has taken a short break and read some books, done some tactical training or whatever. The 50 rated player may now be stronger than the 75 and the rating system's value as a predictor is devalued.


RdC

Tom Chivers said...

Thanks for your long comment RdC.

From the ECF Forum, I got the impression the conclusion about deflation wasn't leapt to at all, but reached tentatively and almost with regret. Why do you say otherwise?

Also, why do you say these changes will *damage* the structure of rated-restricted competitions? It will change them for sure, but I don't see any obvious downside. There are, however, obvious plus-sides. For instance, leagues where each team must be no more than a certain average grade will become more fair.

Btw, I've never been to an ECF Council meeting, but if they are representative of chess players I know then I predict very little resistance to this change. I guess that might well be a big 'if'.

Anonymous said...

Can't say I'm thrilled with the change, but

a) I do feel 170s (for example) are stronger than they used to be, so it does make some sense

b) Hooray I'm back in the 200s!

PG

Anonymous said...

From the ECF Forum, I got the impression the conclusion about deflation wasn't leapt to at all, but reached tentatively and almost with regret. Why do you say otherwise?

An analysis of one year's set of results was used to "conclude" that deflation had been in the system for at least 15 years or even as far back as the 1972 match. If deflation had been present for that period of time you would be able to find many examples of players who were 175ish in 1994 who are 160ish now. Looking up the histories of people with A grades who you see at nearly every NCL, British or Hastings doesn't really throw up any examples. Tim S correctly makes the point that you need to do the CPD to maintain your relative position against IMs and that aging or relatively inactive players are the ones that lose their grades.
I don't really see any deflation at the 150 plus level - below that I think it probably dates from the introduction of the new estimation software.

Also, why do you say these changes will *damage* the structure of rated-restricted competitions? It will change them for sure, but I don't see any obvious downside.

Well every competition has to consider whether to change its rating structure and what to. For example locally we have a County under 180 competition between Hants/Bucks/Berks/Oxon. This would either lose players or have to change to u190/195/200.
Will the ECF continue with a U175 championship at the British or change it to U190?
Damage might be an exaggeration, but upheaval will be caused and competitions not changing their rating limits risk losing their long-standing supporters.

RdC

Tom Chivers said...

People who compete at the 4NCL, Hastings or the British are unlikely to be a representative sample for all chess players. They are likely to be highly competitive players who work to improve their play all the time - effectively people who are "running to stand still", so to speak. Anyway. Impressionistic glosses, individual examples - both prove nothing, and the latter are easy to find (eg an old team-mate of mine.)

As for your second point, yes this change implies other changes, basically that boundaries are multiplied by .8 and have 50 added on to them. Why as a result would long-time supporters of tournaments then boycott their favourite/regular events, as you say?
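To put numbers on that, here is a small sketch that applies the approximate formula (times 0.8, plus 50) to the two limits mentioned in this thread. The function name is invented for the example, and organisers would presumably round to a convenient figure rather than use the raw result.

```python
def corrected_boundary(limit):
    # Same approximate correction as for individual grades.
    return round(0.8 * limit + 50)


if __name__ == "__main__":
    for limit in (175, 180):
        print(f"U{limit} -> U{corrected_boundary(limit)}")
```

So a U175 section maps to roughly U190, and a U180 one to about U194, which in practice would likely be set at U195.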

Anonymous said...

That's certainly a collapse

http://grading.bcfservices.org.uk/getref.php?ref=111200E

but compare it to his more active team mate.

http://grading.bcfservices.org.uk/getref.php?ref=121446K

That's my point about the jumping to conclusions criticism of the ECF's investigations. If there had been deflation you would expect to see it in both grades. It's a natural function of the Clarke grading system that it gives an analysis of results over a period. If your results are poor because you don't play much, then so is your grade. Elo based systems tend to have more of a memory - so they take a while to react.

Tom Chivers said...

But it's not true that if there is deflation in the system, deflation will be visible in every single player's grade. That was the point I was making. Individual examples are easy to find for both cynics and supporters, but do not matter one jot.

Anonymous said...

I reckon the real reason is the disappearance of large open tournaments. For some reason.

Richard

David said...

Heh - the Spanton wikipedia page is tremendous. Scunthorpe Minor 1980, indeed... I can only assume that he hasn't seen it; else I'd humbly suggest that a man who allows such things to stay on the internet ought not to talk too much about the stroking of fragile egos...

As for the grades, I'd suggest that:
- by far their most useful purpose is as predictors of future results
- it would be nice if today's grades had the same value, in some sense, as yesterday's grades. That is, a grade of XXX today should, if possible, be somehow equivalent to a grade of XXX at any other time.

As I understand it, the current system is simply failing on the first point. The symptoms appear to be that a rating difference of however many points is no longer worth as much as it was; that is, higher rated players are not scoring as heavily as would be expected against lower rated players. That being so, it makes sense to multiply grades by some appropriate factor - which apparently turns out to be about 0.8 - to restore their predictive power.

After that, the adding of some constant value (50 points) is just an attempt to meet the second criterion and make new grades look roughly like old grades.
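A quick sketch of why the two steps do different jobs, using the approximate figures (0.8 and 50) and an illustrative function name: the multiplication shrinks the gap between any two grades, while the constant cancels out of the gap entirely and only shifts the overall level.

```python
def corrected(grade):
    # Approximate correction: scale by 0.8, then shift by 50.
    return 0.8 * grade + 50


if __name__ == "__main__":
    high, low = 150, 125
    gap_before = high - low
    gap_after = corrected(high) - corrected(low)
    # The +50 drops out of the difference; only the 0.8 factor remains.
    print(f"gap before: {gap_before}, gap after: {gap_after:.0f}")
```

A 25-point gap becomes a 20-point gap, so the higher-graded player is expected to score less heavily than before.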

In short, this all sounds perfectly reasonable to me. Except, of course, that if we're going to change everything then it would surely make more sense to take the opportunity to join the rest of the world and use an Elo-style system...

ejh said...

Isn't it specious to imagine that gradings can act as a predictor anyway?

ejh said...

It was Spanton to whom I referred here. A Sun journalist asking for integrity in any field is asking for an ad hominem response.

Anonymous said...

Hands up who feels ejh is making them feel rather inadequate in their command of the English language, using words such as specious and ad hominem? They should bring back Call My Bluff and ejh should go up against Nosher.
I wonder if the (supposed?) grade deflation started when the +10 points on a junior's grade (when you played them) was removed. I think it was removed as it was causing inflation. I suspect the adjustment will line up ECF grades much closer with FIDE ratings.
Personally I feel that there are fewer 200 graded players than in the past, that 180 players are pretty useful (yes you as well next year Tom!) and so on. Not very scientific these hunches though.
Andrew

David said...

"Isn't it specious to imagine that gradings can act as a predictor anyway?"

You think? It seems to me that they're extremely successful at this. (And this isn't surprising: experience surely tells us that in chess past performances are a pretty good indication of future performances.)

I really do think that almost the whole point of a grading system should be to give some meaningful indication of expected results. What do you think grades are for?

Tom Chivers said...

"Isn't it specious to imagine that gradings can act as a predictor anyway?" - imo only for individual results, not for aggregated results. Presumably next season the fit will be a lot closer between expected and actual results. It will be easy to see if this is the case by slicing the data anyhow.

Tom Chivers said...

I believe that amongst their peers of recent years Sun journalists have had the *most* integrity, in the sense that, of all newspaper journalists they've had the least number of complaints against them upheld by the regulatory authorities. Of course this doesn't imply there exist any journalists with any integrity, but if there are then they are *officially* more likely to be found at The Sun than anywhere else.

ejh said...

not for aggregated results.

Well, what I mean is, I suspect that it is specious to imagine that you will score (say) 40% against players graded ten points better than you and 65% against players graded fifteen points worse, because it doesn't really work that way. There may be many reasons why, but one is that it may be psychologically more difficult to play against people a class below than a class above.

Indeed, knowing an opponent's grade will in actual practice affect the way you play, which is problematic for gradings in itself.

(I'm sure there must be statistical information that would tell us whether I'm likely to be right here.)

Tom Chivers said...

Yes, there will be these sorts of effects for some players (posing the kind of problem that the double hermeneutic does for the social sciences?) I blogged ages ago how I'd gone two seasons without beating anyone in the 160s, and that my average grade against fellow 160s was through the floor compared to both weaker and stronger players, a particularly clear and extreme example. (Link.) The question I suppose is whether or not these things iron themselves out overall. I don't have a strong hunch either way.

Anonymous said...

Gradings are only a guide and surely the most important thing is to enjoy playing chess.

Personally I am not bothered whether my grade is 20 or 30 points higher or lower (last season I was 165), although I can understand if somebody is graded in the 190s, 140s or 90s it might be good to get the extra few points to reach 200, 150 or 100.

Alan

Tom Chivers said...

Quite so Alan.

Returning to the subject of Tim Spanton's wikipedia entry, does anyone else suspect that there might not actually be a "Bigot Of The Year" competition, of which Mr Spanton is claimed to be a finalist?

David said...

'does anyone else suspect that there might not actually be a "Bigot Of The Year" competition...?'

That did occur to me. But google is one's friend in such matters, and it seems that said competition did indeed exist, until Mind decided that the 'winners' were all too happy about their accomplishment. See here, for example.

It's not something I would be boasting about myself; but then I get the impression that Spanton and I probably wouldn't agree about much...

Jonathan B said...

Well, if one is proud of starting a newspaper column the idea of which is to invite the poor and desperate to write in to beg for money - Wanking for Coins as Charlie Brooker would have it - then I imagine one can be proud of anything ... even winning the Sspantonthorpe minor of 1980.

Tom Chivers said...

Btw, the ECF have now disassociated themselves from their (!) forum. They still link to it from their frontpage, but with the smallprint: "The views expressed on this forum are not necessarily the official view of the English Chess Federation." It has also been renamed as the "English Chess Forum".

Jonathan B said...

And I've disassociated myself from this blog - especially the posts I write.

Tom Chivers said...

Just read the wikipedia entry again, and I'm pretty sure Tim would have been 13yo when he won the Scunthorpe Minor in 1980. That's pretty impressive for most 13-year-old chess players, I reckon. I wish I could remember my tournament results from that age to compare.

Anonymous said...

Thanks to Angus I've now seen my provisional 2008/09, including the "NewGrd" which is supposedly the revised grade.

According to the list my grade for 2008/9 of 178 (which is wrong for reasons irrelevant to this issue) has been adjusted to - 179!! Wow!

Using the formula (grade * 0.8 + 50) I'd have expected it to be around 192 plus or minus a bit. In fact, as my opponents last season only averaged 164, I'd expect my adjustment to be larger than most.

Looking at comparable S&BCC bloggers (hope you don't mind guys), I find:

Name   OLD   NewGrd   Grd*0.8+50
Tom    181   181      195
EJH    178   179      192
Jon    138   147      160

Doesn't really add up does it?

For lower graded players the adjustments are more substantial. Here are a couple of random anonymous examples:

 92   107   124
 72    98   108
107   123   136

So, although more substantial, still well below what was announced would be the typical adjustment.
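The comparison can be reproduced with a few lines of Python. The old and new grades are the ones listed above; the labels for the anonymous players and the function name are invented for the example, and the prediction uses the approximate formula rounded to the nearest point.

```python
# (label, old grade, NewGrd from the provisional list)
ROWS = [
    ("Tom", 181, 181),
    ("EJH", 178, 179),
    ("Jon", 138, 147),
    ("anon1", 92, 107),
    ("anon2", 72, 98),
    ("anon3", 107, 123),
]


def shortfalls(rows):
    """For each row, return (label, formula prediction, prediction - NewGrd)."""
    result = []
    for label, old, new in rows:
        predicted = round(0.8 * old + 50)
        result.append((label, predicted, predicted - new))
    return result


if __name__ == "__main__":
    for label, predicted, gap in shortfalls(ROWS):
        print(f"{label:6} predicted {predicted:3}  short by {gap}")
```

Every listed NewGrd falls short of the formula, by between 10 and 17 points.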

Any idea what's going on, anyone?

Carsten

Tom Chivers said...

None! That doesn't look right indeed. I reckon your best bet would be to enquire at the ECF forum under "Grading Debate" . . .

ejh said...

Sorry, what's the last figure (or indeed the formula) all about?

Anonymous said...

Justin

You'll be sorry you asked but here it is.


Somewhere in the thread on the ECF forum there is a link to a paper by Sean Hewitt where he calculates the errors in the current grades and proposes a conversion of "grade * 0.77 + 47" (or thereabouts, I'm quoting from memory) as a "quick & dirty" solution, to both fix the grade compression problem and correct the deflation which has happened over time.

In the end the ECF decided to completely recalculate all grades and Tom was told on the ECF forum that the formula "Grade * 0.8 + 50" is a fairly accurate approximation of what most new grades will be, and that therefore, he could expect a NewGrd of about 198!

That was based on an expected 2008/09 grade of 185, which I presume he'll be once his grade has been corrected but even with 181 he should rise to 195, as I listed originally.

Tom

I assume once the list is out there will be lots of activity on the ECF forum and all will be revealed. I can wait, I think.
Angus actually linked to here, so I thought I'd mention what I'd noticed.

Carsten

Tom Chivers said...

Yes thanks for pointing out the discrepancies Carsten. I just wish I could help explain!

ejh said...

So I get to be 192 next time but one?

That can't be right, I'm no bloody good.