Friday, February 28, 2014

Elo DWP?

I played at Hampstead a couple of weeks ago. Although my score of +2 =0 -2 might not sound that bad, in truth I played extremely badly and quite deservedly failed to match better outcomes I’d achieved at similar events in January, November and October. Frankly, it was not a successful weekend for me at all.

It wasn’t as bad as FIDE would have you believe, mind.

There’s a standard narrative that you get when you raise the question of whether there’s systemic deflation in the Elo rating system. It’s the kids. They improve too fast and the system can’t keep up. Especially now Elo ratings go down to zero.

You play a junior, especially one of the very young ones, and since their rating doesn’t reflect their true playing strength you can lose a tonne of Elo points to somebody who, objectively speaking, is about your strength. And even if you do happen to win you don’t get your just rewards.

So underrated juniors lead to underrated adults. Listen carefully and you can hear the low hiss of rating points leaking out of the system as the spiral continues ever downwards.

That, anyhoo, is the argument. And it’s not an unattractive thesis either, given that it’s not in the least bit difficult to think of examples of absurd grade-rating differentials - the kid who has an ECF in the 180s whilst retaining a FIDE in the 1880s, say.

It’s all down to the youth of today, then? It’s a reasonable argument, for sure, and yet I’m not convinced that’s all that’s going on.

I won two games at Hampstead and lost two. That’s according to the tournament. According to the rating system I only won one. My first-round opponent didn’t have a rating, so the game didn't trouble the raters.

Them’s the breaks? Perhaps, but the same thing happened in January. And November. And October. And two of my games at the Hampstead Open back in August didn’t count either. Which, as it happens, is the same number of games that didn’t count for rating from the nine I actually played in Penarth.

So in the past six months or so, I’ve played 31 games in rated tournaments of which eight didn’t count for the ratings calculations. And my score from those games? +7 =1 -0.

Little wonder that my Elo has been in freefall while my ECF has been on the up?

Maybe this would help

Is it possible that I’ve simply had an unlucky run over the past couple of months? Perhaps, but I don’t think so. In fact my experiences at the Hampstead tournaments match those of the preceding couple of years. Since my first event in Sunningdale (May 2011) I’ve played a total of 92 games, but only 72 have counted for rating. My score from the missing games has now reached +14 =5 -1.

Secondly, there’s the fact that many of these games are against very inexperienced players who wouldn’t earn me many points even if the victories were counted. That’s true enough, but not invariably so. Amongst the missing I find games against players with long-established ECF grades in the 140s, 150s, 160s and even 170s.  And these missing points add up over time.

Finally, it could just be me, couldn’t it? It could, by some statistical quirk, just be an outlier. That it’s happening to me doesn’t mean that it’s happening to everybody else. Missing games might be knackering my rating, in other words, but aren’t necessarily having an impact on the system as a whole. Well maybe not, and I’m open to somebody demonstrating that I’m wrong, but as things stand I remain suspicious.

Yes, I know. Ultimately, we play because we love the game and, when it comes down to it, ratings don’t really matter that much.

But if we’re going to have a rating system we might as well have one that kind of sort of reflects the playing strength of the chessers involved. So this latest missing game at Hampstead might not have cost me much in itself, but that’s not really the point.

How, I wonder, can the system work effectively if games routinely go missing? When this happens at every tournament, the answer, as far as I can see, is that it can’t.

Sunningdale Major, May 2011
4 of 7 games rated
+3 =0 -0

Gatwick Open, June 2011
4 of 5 games rated
+0 =1 -0

Benasque, July 2011
9 of 9 games rated

Twyford Challengers A, August 2011
6 of 6 games rated

Sunningdale Open, September 2011
3 of 5 games rated
+0 =1 -1

Imperial College Open, November 2011
3 of 4 games rated
+0 =1 -0

London Chess Classic Open, December 2011
6 of 8 games rated
+2 =0 -0

4NCL 2012
2 of 2 games rated

Penarth, July 2012
6 of 9 games rated
+2 =1 -0

Twyford, August 2012
6 of 6 games rated

Penarth, July 2013
7 of 9 games rated
+2 =0 -0

Hampstead Open, August 2013
4 of 6 games rated
+1 =1 -0

Hampstead u2200, October 2013
3 of 4 games rated
+1 =0 -0

Hampstead u2200, November 2013
3 of 4 games rated
+1 =0 -0

Golders Green u2200, January 2014
3 of 4 games rated
+1 =0 -0

Hampstead u2200, February 2014
3 of 4 games rated
+1 =0 -0

Games in elo rated tournaments: 92
Number of rated games: 72
Missing games: 20 (21.7%)

Score from missing games: +14 =5 -1 (82.5%)
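The totals above can be cross-checked mechanically. The small Python sketch below re-adds the per-event figures from the list and confirms that each event's unrated W/D/L figures account for its missing games. The `EVENTS` layout and the `tally` helper are illustrative only. They are not taken from any rating body's software.

```python
# Per-event figures transcribed from the list above:
# (event, games played, games rated, then W/D/L in the games that went unrated)
EVENTS = [
    ("Sunningdale Major, May 2011", 7, 4, 3, 0, 0),
    ("Gatwick Open, June 2011", 5, 4, 0, 1, 0),
    ("Benasque, July 2011", 9, 9, 0, 0, 0),
    ("Twyford Challengers A, August 2011", 6, 6, 0, 0, 0),
    ("Sunningdale Open, September 2011", 5, 3, 0, 1, 1),
    ("Imperial College Open, November 2011", 4, 3, 0, 1, 0),
    ("London Chess Classic Open, December 2011", 8, 6, 2, 0, 0),
    ("4NCL 2012", 2, 2, 0, 0, 0),
    ("Penarth, July 2012", 9, 6, 2, 1, 0),
    ("Twyford, August 2012", 6, 6, 0, 0, 0),
    ("Penarth, July 2013", 9, 7, 2, 0, 0),
    ("Hampstead Open, August 2013", 6, 4, 1, 1, 0),
    ("Hampstead u2200, October 2013", 4, 3, 1, 0, 0),
    ("Hampstead u2200, November 2013", 4, 3, 1, 0, 0),
    ("Golders Green u2200, January 2014", 4, 3, 1, 0, 0),
    ("Hampstead u2200, February 2014", 4, 3, 1, 0, 0),
]


def tally(events):
    """Sum the given events, checking each row as it goes."""
    played = rated = wins = draws = losses = 0
    for name, p, r, won, drawn, lost in events:
        # Every unrated game should appear in that event's W/D/L figures.
        assert won + drawn + lost == p - r, f"row does not add up: {name}"
        played, rated = played + p, rated + r
        wins, draws, losses = wins + won, draws + drawn, losses + lost
    missing = played - rated
    missing_pct = 100 * missing / played
    score_pct = 100 * (wins + 0.5 * draws) / missing  # a draw counts half
    return played, rated, missing, missing_pct, (wins, draws, losses), score_pct


played, rated, missing, missing_pct, wdl, score_pct = tally(EVENTS)
print(f"Games: {played}  rated: {rated}  missing: {missing} ({missing_pct:.1f}%)")
print(f"Score from missing games: +{wdl[0]} ={wdl[1]} -{wdl[2]} ({score_pct:.1f}%)")
```

Running it reproduces the totals above: 92 games, 72 rated, 20 missing (21.7%) and +14 =5 -1 (82.5%). `tally(EVENTS[-6:])` gives the six-month figures quoted earlier: 31 games, 8 unrated, +7 =1 -0.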


Anonymous said...

The question to ask, which those with access to the detailed results could answer, is: "Is there any bias in results submitted for both ECF grading and FIDE rating where a FIDE-rated player plays one who isn't?" The answer might be different for juniors and adults.

I would suspect the disparity for the blog author is a fluke, but it might be that rated players did better than 50% against unrated ones to an extent not explained by their respective ECF grades. Possible explanations abound, but rated players are likely to have more experience of playing with 30-second increments.

With International titles being rating dependent, a set of robust rules is necessary. Whilst you could treat national ratings as an initial rating, there would always be the question as to whether these were universally trustworthy.

Extending the International scale all the way down to 1000 (not zero) has created a whole series of new problems, which were not present when only the elite or near elite could get International ratings.


John Cox said...

Games not getting rated is not a problem with the rating system itself, of course. And you might equally well have lost all those games. If you want to show the Elo system doesn't work, complaining that not all national games are rated under it isn't going to do that. No system will work well if a player's best results aren't fed into it.

Jonathan B said...

And you might equally well have lost all those games.

Well, I might, but I didn’t. And I suspect the same is true for most people in my situation.

It might be different for different situations - e.g. for those who play mainly in the 4NCL or perhaps tournaments in other parts of the country - but in London I’d be really surprised if results of rated v unrated were anything like 50:50.

No system will work well if a player's best results aren't fed into it.


Jonathan B said...

Extending the International scale all the way down to 1000 (not zero)

Thanks for the correction, Roger.

John Cox said...

>but in London I’d be really surprised if results of rated v unrated were anything like 50:50.

Well, of course they're not. Rated players tend to be more experienced and thus tend to be stronger.

But if those unrated players were somehow miraculously granted a rating corresponding to their strength, you might have won points or you might have lost them.

I'm not understanding your point, to be honest. You seem to be suggesting that you're somehow suffering because some of your games aren't rated, and that for some systemic reason you would otherwise have gained points in those games. That can't be true, but otherwise I don't see what it is you're trying to say.

Jack Rudd said...

I don't think anybody expects the scores in rated v unrated games to be 50-50. What might be more interesting to track is whether the rated players tend to do better in those games than the grading differential would suggest.

Anonymous said...

The key problem which you have identified is that in Swiss tournaments (unlike all-play-alls), games between rated and unrated players don't count for the former. Two or three years back FIDE proposed to address this, but the plan was dropped. Clearly this is a suggestion which needs to be revisited.

David Sedgwick

Anonymous said...

Most regular players in Opens and the 4NCL have had International ratings for many years. But below that level, perhaps particularly amongst London League players or SCCU county players, I suspect distribution of FIDE ratings is patchy. If you took a pool of adult players in the 120 to 170 range, entered in Majors and Intermediates, I wouldn't be sure that the average ECF grade of those with a FIDE rating would exceed that of those without. Hence the centralising assumption of a 50% score. Above that, there are relatively few players who don't have FIDE ratings; the British Championship usually has no more than two, and in some years none at all.


John Cox said...

I'm still not understanding why, David. Indeed, I don't see, conceptually, how you could possibly rate a game between a rated and an unrated player (at least not until the unrated player has played a few more games).

Anonymous said...

Garry Kasparov has been given Croatian citizenship!

DJY said...

Hi Jonathan,

I'm interested in this as someone who got a rating 3 years ago, only to gain 30 ECF and lose 15 Elo since then. However, looking through my FIDE games I don't have your problem - 49/56 have been against rated opponents. Of the other seven I scored +2 =5 -0 at a 164 performance: a little above my grade but not much. Most of my FIDE-rated games are in South London, or the odd e2e4/4NCL, so I'm not overly surprised most people are rated.

Instead I just seem to consistently underperform by 15-20 points in weekend events, where most of the games are. Not sure why - me trying too hard to correct the rating, congress opposition being more dedicated, or them making an extra effort when playing a 160 with a 1750 rating. Who knows? Two people isn't much of a sample, but one of us must have had some unusual luck...

Daniel Young

Jonathan B said...

I don't see what it is you're trying to say

Well, I wanted to write about the Elo system. Specifically, that - if only for me - it’s busted. I really don’t see me ever getting back to 2050. More likely I’ll get over 200 ECF.

Perhaps I didn’t do it very well, so I’ll try to give a bullet point summary of what I’m trying to get at here.

1. My opinion is that the Elo system is irretrievably broken.

2. People usually say that this is because of under-rated juniors, but my feeling is that games against unrated players are also a deflationary factor.

3. I wanted to show that I’d missed a lot of games being rated and my score from these games was enormous.
(You might think this is obviously going to be the case - and I would agree - but I have had somebody argue this point.)

4. Yes, it’s true that I only lost a point or two from my game in the first round, and that’s hardly a massive difference. What I’m raising is the issue of this happening at every tournament. And not just to me, but to every rated adult (or at least to many).

So in effect we’re ALL losing about 20 points every six months, just by playing.

5. I wanted to write this all out partly because I wanted to see if anybody would say, ‘well I don’t think so because ...’ or ‘well I’ve played 2000 games since Christmas and every one has counted’. To see if other people’s experience matched mine in other words.

I do see that rating a game against an unrated opponent is problematic, but simply ignoring them is causing problems. At least in areas where there are a lot of unrated opponents to play.

It’s all very well for Roger to say “there are relatively few players who don't have FIDE ratings”, but that’s just not true where I play.

Jonathan B said...


One of my reasons for writing this series was to test the theory that things are different depending on where you play.

So one possibility is that we’ve both had average luck - and what’s happening to me is normal where I play and what’s happened to you is normal where you play.

I’m guessing we don’t play many of the same events otherwise I’d recognise your name from the pairing lists/scoring tables - and I don’t think I do.

Anonymous said...

If you look at the subset of Fide-rated players in English chess and their games among themselves, it is obvious that they cannot all be underperforming.
Precisely because no rating points "leave" this subgroup, on average people should perform at their FIDE rating.
Of course there can be other issues, like different K-factors for different players and fast-improving juniors and whatever.
But as I see it, games against unrated players not being rated cannot have any influence on inflation or deflation.


Jonathan B said...

it is obvious that they cannot all be underperforming

'Obvious', perhaps, but demonstrably an incorrect assumption. Come back next week.

As for unrated games not counting, isn’t it ‘obvious’ that a system that counts a player's losses but only a fraction of their wins (or none at all, as has been the case with me for many, many months) will end up under-rating them? When this happens on a systematic basis it leads to deflation.

DJY said...

It's true that I've never played at Penarth or Hampstead, where over half of your missing games come from. My local events are at CCF, where almost all the players are rated (only 2 unrated games from 25); take those out and I have 16.1% unrated.

As to why it could vary so much by pool, I'm not sure - maybe because the CCF tournaments are mostly the same people every time, so mostly got a rating long ago? Or maybe it's just in the right place geographically.

Alternatively, other events may be more successful in attracting people who don't play many congresses; in general are your unrated opponents inexperienced, mostly league-only, or something else?

Daniel Y

Anonymous said...

Those losses of yours that were counted by the system were somebody else's wins. The Elo you lost was won by your opponents.
If the unrated players are ignored by the rating system, then their games don't influence the rating system.
If you look at games between rated players, there will still be an equal number of wins and losses.
That's just mathematics. Maybe it isn't obvious, but unless there is some important fact you left out about how the unrated players do affect the rating system, you definitely have to look somewhere else for the reason for deflation.
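The zero-sum argument can be checked with a toy simulation. This Python sketch rests on several assumptions:

- It uses the logistic form of the Elo expected-score formula. FIDE's regulations work from a lookup table instead.
- The four-player pool is hypothetical.
- Two hypothetical unrated entrants play too, but their games are simply skipped.
- The K-factors of 20 and 40 are illustrative.

With one K-factor for everybody, each rated game moves points from one rated player to another and the pool total stays put. Give one player a different K-factor, the caveat raised earlier in the thread, and the total can drift.

```python
import random


def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation for A against B (FIDE itself uses a lookup table)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rated_game(r_a, r_b, score_a, k_a, k_b):
    """Rating changes (delta_a, delta_b) for one game; score_a is 1, 0.5 or 0."""
    e_a = expected_score(r_a, r_b)
    delta_a = k_a * (score_a - e_a)
    delta_b = k_b * ((1.0 - score_a) - (1.0 - e_a))
    return delta_a, delta_b


def simulate(start_ratings, n_games, k_for, seed=1):
    """Play random pairings; any game involving an unrated player is skipped."""
    rng = random.Random(seed)
    players = list(start_ratings) + ["unrated-1", "unrated-2"]  # hypothetical
    ratings = dict(start_ratings)
    for _ in range(n_games):
        a, b = rng.sample(players, 2)
        score_a = rng.choice([0.0, 0.5, 1.0])
        if a not in ratings or b not in ratings:
            continue  # not rated, so it cannot move anybody's rating
        da, db = rated_game(ratings[a], ratings[b], score_a, k_for(a), k_for(b))
        ratings[a] += da
        ratings[b] += db
    return ratings


start = {"P1": 2050, "P2": 1900, "P3": 1750, "P4": 1600}  # hypothetical pool
equal_k = simulate(start, 1000, lambda name: 20)
mixed_k = simulate(start, 1000, lambda name: 40 if name == "P4" else 20)
drift_equal = sum(equal_k.values()) - sum(start.values())
drift_mixed = sum(mixed_k.values()) - sum(start.values())
print(abs(drift_equal) < 1e-6)  # True: equal-K games only move points around
print(drift_mixed)              # generally non-zero once K-factors differ
```

The sketch speaks only to the total held by the rated pool. It says nothing about whether any one player's rating within that pool is accurate.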


Laurent S said...

Isn't the lowering of the FIDE rating floor to 1000 the explanation for general deflation (i.e. the lowering of the average FIDE rating, which has a knock-on effect on stronger players)?

Anonymous said...

If you reduce the average FIDE rating by bringing in more weak players, that shouldn't affect the top players, because they don't play them anyway. Indirect effects would arise only if the low ratings were incorrect. That is the allegation of course. In the days when the ratings were cut off at 2000, a player of a 2100 standard couldn't play a rated game against a player of 1800 standard. Once you extend that range, rated games become possible, but the 2100 player maintains their rating provided they make the necessary expected score.
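The "necessary expected score" can be made concrete. This Python sketch assumes the logistic Elo formula, which only approximates the table FIDE actually uses, and an illustrative K-factor of 20. The ten-game scenario is hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation for A against B (an approximation of FIDE's table)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rating_change(rating, opponents, points, k=20):
    """Net change over a set of games: K times (actual minus expected score).

    k=20 is an illustrative assumption, not a statement of FIDE's K-factor rules.
    """
    expected = sum(expected_score(rating, opp) for opp in opponents)
    return k * (points - expected), expected


# Hypothetical: a 2100 player meets ten opponents rated 1800.
change, expected = rating_change(2100, [1800] * 10, points=8.5)
print(f"expected {expected:.2f}/10, change {change:+.1f}")
```

A 300-point gap gives an expected score of roughly 0.85 per game, or about 8.49 from ten games. A 2100 player who scores 8.5 out of 10 against 1800-rated opposition therefore leaves their rating essentially where it was.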