Comments on The Streatham & Brixton Chess Blog: BORP? XXVII: Doesn’t work properly

My friend Mr Hogg, unaccustomedly gruntled, made se...

2014-02-28T05:12:31.999+00:00

My friend Mr Hogg, unaccustomedly gruntled, made several comments:
1. Elo schmelo! Sydney chess expert and math guru Roger Cook operated a rating system suspiciously similar to, and prior to, the good prof Elo's efforts. This was reported in days of yore in sumpin' called CHESS, of Sutton Coldfield, he opined.
Stigler's Law in operation?
Dare one use the plagiarism word in the absence of Mr Keene?

2. Elo is a heuristic, i.e. a rule of thumb, with very feeble if any connection to maths or statistics. Glicko has better credentials; it is currently used by the Australian Chess Federation, I believe.

3. Prof Elo's magnum opus was published in that august peer-reviewed journal, The Journal of Gerontology. Stop laughing, I am serious! And of course you will ask why not in a serious math or stats journal? See point 2 above.

4. Those redoubtable Kaggle boys in Melbourne, now relocated to Silicon Valley, ran two contests to find a good rating system for chess and offered nice prizes too. Elo methods, IIRC, finished waaaaaay down the list as a useful method.

5. Elo ratings are published without any indication of statistical accuracy: what is a measurable difference? Does, say, a 2700 player perform measurably differently from a 2710 player? A 2720 player? If not, how can "ratings" be allocated down to one point?
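
For what it's worth, point 5 can be put into rough numbers. The following is only a sketch, assuming the standard logistic Elo expectation curve (FIDE's own tables are an approximation to a similar curve); the function name is illustrative and not taken from any rating body's software. On that assumption a 10-point gap is worth roughly a 51% expected score, which is a very small edge to detect over a realistic number of games.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) of player A against player B,
    assuming the standard logistic Elo curve with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Expected score of the higher-rated player for a few rating gaps.
for gap in (10, 20, 100):
    print(gap, round(expected_score(2700 + gap, 2700), 3))
```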

Funny the conversation should take this turn. Do ...

2014-02-27T21:39:45.973+00:00

Funny the conversation should take this turn. Do pop back tomorrow.

Oh, there are statistical tinkerings you can do wh...

2014-02-27T21:07:40.794+00:00

Oh, there are statistical tinkerings you can do which will fix that: you could, for example, put in a rule that juniors count as unrated for the purposes of adults playing them. (This might or might not be a good rule, mind you, but it's certainly possible to do this.)

Jack Rudd wrote: "[Elo] assumes the players i...

2014-02-26T22:13:25.480+00:00

Jack Rudd wrote: "[Elo] assumes the players in the system are, as a group, not gaining or losing total strength."

This is correct. Or, to express it differently, the problem is that chessplayers WANT the rating system to reflect strength on some cardinal scale, instead of the ordinal scale which it is.

No amount of statistical tinkering (k-factor or whatnot) will ever fix the fact that young players come in with low initial ratings, improve rapidly as they gain experience, and suck the rating points from non-improving adult players.

The most practical fix would be to give ratings to a pool of fixed software on fixed hardware, then periodically enter these known checkpoints into rated tournaments for ratings correction purposes.

The rating limit dropped from 2200 to 2000 in the ...

2014-02-22T21:58:27.076+00:00

The rating limit dropped from 2200 to 2000 in the early 1990s. At the same time this harmonised the limits for men and women: previously the women's cut-off was 1900, and it was raised to 2000 by adding 100 points to every woman's rating except Susan Polgar's.

This was before the 4NCL had even started, so when the adult players in the 4NCL were 160 plus, almost all of them got ratings, and ratings above 2000. The plan to extend ratings much lower came several years later, probably in 1998 or 2000. They extended it downwards with caution, so it's only recently that it has reached the ultimate level of 1000. The difference now, compared to when the cut-off was 2000, is that losses to lower-rated players are now likely to count. This hits both directly and indirectly, as you no longer get the odd potentially easy points (from a 180 perspective) against a career 160 with a 2100 FIDE rating.

RdC

@AngusF can have a go and see what I can do - do y...

2014-02-22T14:33:55.263+00:00

@AngusF I can have a go and see what I can do. Do you know off the top of your head what date the lower limit changed?

@Matt Fletcher: I realise your offer was made to J...

2014-02-22T10:31:15.871+00:00

@Matt Fletcher: I realise your offer was made to Jonathan and this might be a bit hard to do but: how about a comparison of players' ratings of some years ago (before FIDE dropped the rating floor to whatever it currently is) to the same for now?... I'm wondering whether there's a large group of players who were previously rated up to, say, 2200, and who now have a much lower rating?

@Jonathan Any particular analysis you've seen...

2014-02-21T20:33:29.925+00:00

@Jonathan

Any particular analysis you've seen / would like to see on whether it's working or not?

The ECF ratings are a joke too. There's one pe...

2014-02-21T19:28:45.988+00:00

The ECF ratings are a joke too. There's one person within a handful of points of me. If I played him in a match and won 8-2 I would consider that a terrible result. And no, I'm not improving.

@Matt, the ECF to FIDE conversion (or a slightly m...

2014-02-21T18:57:38.668+00:00

@Matt,
"the ECF to FIDE conversion (or a slightly modified version) seems to work OK still on average - is your contention that there is a specific subgroup of players for which it works significantly less well?"

Yes. (Although I’m not 100% convinced that it’s working that well in general either).

Also in corporate-coffee-bollocks news

2014-02-21T18:55:44.071+00:00

Also in corporate-coffee-bollocks news

Jonathan - based on looking at the available data,...

2014-02-21T18:36:13.292+00:00

Jonathan - based on looking at the available data, the ECF to FIDE conversion (or a slightly modified version) seems to work OK still on average - is your contention that there is a specific subgroup of players for which it works significantly less well?

In the UK, England anyway, we do have a particular...

2014-02-21T17:30:58.774+00:00

"In the UK, England anyway, we do have a particular problem that a few years ago, the ECF flattered our strength by adding around 25 ECF points to players at the average level"

This is one of the things that people say to explain the breakdown in the ‘conversion’ link between ECF grades and Elo ratings - i.e. it’s not that Elo is too low but that ECF is too high. This is something I intend to come back to in a while ... but suffice to say I’m not convinced by this argument.

The problem, as I see it, with the FIDE rating sys...

2014-02-21T15:20:02.190+00:00

The problem, as I see it, with the FIDE rating system (national Elo systems may have additional rules that avoid this problem) is that it has an inbuilt zero-sum assumption. That is to say, it assumes the players in the system are, as a group, not gaining or losing total strength.

This wasn't a terrible assumption when the rating floor was 2200 - most players of this strength or higher are changing in strength relatively slowly. As the rating floor has gone down, so has the validity of this assumption.
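
The zero-sum point can be illustrated with a small sketch. It assumes the textbook Elo update with the same K for both players; FIDE in practice applies different K values to different players, so the real list is not exactly zero-sum. The names, ratings and K value are made up for illustration. However much the improving junior gains, the adult loses the same amount, so the pool's total never rises to reflect the junior's real improvement.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, score_a: float, k: float = 20.0):
    """Return new ratings for A and B after one game.

    score_a is 1, 0.5 or 0 from A's point of view. Both players use the
    same K here (a simplifying assumption), so A's gain is B's loss.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


# Hypothetical under-rated junior beating a stable adult ten times in a row.
junior, adult = 1400.0, 1800.0
for _ in range(10):
    junior, adult = update(junior, adult, 1.0)

print(round(junior), round(adult), round(junior + adult))
```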

Presumably by Elo, you really mean the Internation...

2014-02-21T14:09:06.165+00:00

Presumably by Elo, you really mean the International ratings computed by FIDE. For at least some players, these are now badly out of line with the English domestic grades, which are computed on different principles. Many countries, Scotland, Ireland and Wales included, compute domestic Elo ratings. These use the same principles as Elo's original formulations, but with some additional hacks and modifications. Scotland in particular will deal with the improving junior problem by resetting the rating for anyone playing 200 Elo points better than their previous rating.

So Elo rating is really just a method, and the FIDE International ratings are just one example. In the UK, England anyway, we do have a particular problem that a few years ago, the ECF flattered our strength by adding around 25 ECF points to players at the average level, moving them from around 115 to around 140. Also, the method now used for junior players consistently overstates the grade of the most active ones, although not in a way that boosts the grades of adult players.

The main problem is that only a subset of games played by English players and graded by the ECF feature in the International list. That's unlikely to change for any number of reasons.

Well, the only reason I see, why Elo might not wor...

2014-02-21T12:36:25.031+00:00

"Well, the only reason I can see why Elo might not work for certain groups of players is that they don't play enough rated games"

Well, do pop back over the next week or two, Phille. I’ll show you a couple of things that may lead you to change your mind.

Well, the only reason I see, why Elo might not wor...

2014-02-21T12:04:22.998+00:00

Well, the only reason I can see why Elo might not work for certain groups of players is that they don't play enough rated games and the k-factor is too low to account for relatively quick changes in playing strength.

I think it would be desirable if all national ratings were replaced by Elo. My personal beef with this situation is that it seems entirely realistic that I might reach the playing strength of a FIDE Master without getting the title, as my Elo is lagging further and further behind (as compared to the national rating).

Phille
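
The k-factor point above can be sketched roughly as follows. This is a simplified model, not FIDE's actual procedure: it assumes the logistic Elo curve, a single fixed opponent rating, and that every game scores exactly what the player's true strength predicts. All numbers and names are hypothetical. It counts how many games a published rating needs to climb to within 50 points of the true strength for different K values.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def games_to_catch_up(rating: float, true_strength: float, opponent: float,
                      k: float, tolerance: float = 50.0,
                      max_games: int = 10_000) -> int:
    """Count games until `rating` is within `tolerance` of `true_strength`.

    Simplifying assumption: each game yields exactly the score expected
    from the player's true strength, so there is no random noise.
    """
    games = 0
    while true_strength - rating > tolerance and games < max_games:
        actual = expected_score(true_strength, opponent)
        predicted = expected_score(rating, opponent)
        rating += k * (actual - predicted)
        games += 1
    return games


# Hypothetical player: published rating 2100, true strength 2300,
# always facing 2200-rated opposition.
for k in (10, 20, 40):
    print(k, games_to_catch_up(2100.0, 2300.0, 2200.0, k))
```

The lower the K, the more rated games are needed to close the gap, which is the lag described above for players with few rated games.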

Feel free to substitute for ‘conclusion’ for ‘thes...

2014-02-21T10:53:35.554+00:00

Feel free to substitute ‘conclusion’ for ‘thesis’ if you prefer, anonymous.

"Actually the ELO system works very well at determining the relative strength of players"

Depends on which group of players we are comparing I suppose, but I disagree.

You may be correct in your final statement, however.

You haven't presented a thesis yet, other than...

2014-02-21T10:30:58.033+00:00

You haven't presented a thesis yet, other than "it doesn't work". Actually the ELO system works very well at determining the relative strength of players and estimating win likelihoods. Perhaps you're expecting too much?