Monday, October 26, 2009

Whatever goes up...

Long-term readers may well remember that a couple of years ago, I solved the problem of short draws in chess. And now today, I'm going to put an end to the problem of rating inflation and deflation.

Or should I say, problems. After all, even if there is general agreement that FIDE's Elo system inflates - all these new 2700 players can't be that strong - there is more than one issue. Firstly, even club players with good memories can obtain a theoretically stronger and deeper repertoire than that which, say, Bobby Fischer ever mastered. How then are we meant to compare an "average" GM who plays Rybka-perfect lines for twenty moves with such kings of the past? But at the other end of the game, the abolishing of adjournments means endgames are played worse nowadays than, say, in the 1980s, something not at all factored into ratings. Another question mark: why is it that whenever a super-elite player such as Ivanchuk plays in a tournament a few categories below the top tier, they suddenly produce a 2900 performance? Could it be that the top few players - those never outside the top ten, say - actually have deflated ratings, thanks to playing each other so often? Deflated at least relative to the rest of the list, if not to the past?

And then there's the ECF system, which was found to be subject to deflation due to rapidly-rising juniors. A retrospective adjustment of all grades has since been made, and furthermore the ECF have tried to prevent the problem from occurring again by adding an increment to each junior grade: 10 extra points for anyone under 10 years old, 5 for anyone aged between 10 and 17. Crude fudge or satisfactory solution? We do not currently know, but at the very least this system ignores the deflationary effects of both juniors who aren't improving, and adults who are.
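For anyone who likes to see the fudge written down, here is a minimal sketch of that age-based increment. The function name is my own invention, and I am assuming that "between 10 and 17" includes both 10 and 17; the ECF's actual rules may draw the boundaries differently.

```python
def junior_increment(age):
    """Grading points added to a junior's grade, by age.

    Hypothetical helper: the age boundaries are an assumption
    (under 10 gets 10 points, 10 to 17 inclusive gets 5).
    """
    if age < 10:
        return 10
    if age <= 17:
        return 5
    return 0  # adults get no increment


# A nine-year-old graded 100 and a fifteen-year-old graded 140.
print(100 + junior_increment(9))
print(140 + junior_increment(15))
```

Note that the increment depends only on age, which is exactly why it cannot tell an improving junior from a stagnant one.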

And what, then, is my solution? It's simple. All we need to do is find a player whose standard never changes: someone who plays at exactly the same level year on year, game on game, move on move. First, we find out what their rating is one year; second, we fix their grade at that point for eternity. Then finally we just measure other players against this one player, anchoring the entire system around this one solid point, changing all other grades as and when any inflation or deflation becomes apparent. (Indeed, different ratings might be adjusted differently.) Now, who could such a thoroughly consistent player be? Why, the answer is obvious. We need a computer programme that is never updated and always plays on the same hardware, and that's it. Problem solved.

Well, that's not quite it. First of all, the computer programme itself must be moderately strong. Strong enough that it could beat anyone on a good day, not so strong that it never loses. Secondly, it must have an opening repertoire not susceptible to anti-computer lines, the way early Fritzes were regularly mashed in closed King's Indian positions, for instance. The opening repertoire must also be broad enough that it is virtually unpreparable for, but not updated (because this would improve the computer's strength). Thirdly, the programme must face a large variety of human opposition, from weak players such as myself to strong Grandmasters. This could be organized online, or in special "rating determining" tournaments, or both. After that, the only thing to do is just analyse the results for de/inflation, and adjust rating lists accordingly. A very basic example, in case this part isn't wholly clear: let's say in the first year the computer consistently performs at 2650 against all opposition. We set its grade at 2650, but in the next year it performs equally consistently at 2700. The computer hasn't got any stronger, so this means its opponents' ratings have inflated by 50 points, and everyone's rating should be adjusted downwards by 50 points. The computer's, of course, stays the same - at 2650. Simple as.
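The arithmetic of that basic example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample results are made up, and the performance rating uses the common linear rule of thumb (average opponent rating plus 400 times wins minus losses, divided by games played) rather than FIDE's own lookup tables.

```python
from statistics import mean


def performance_rating(results):
    """Linear approximation of a performance rating.

    results: list of (opponent_rating, score) pairs, where score is
    1 for a win, 0.5 for a draw and 0 for a loss.
    """
    games = len(results)
    average_opponent = mean(rating for rating, _ in results)
    total_score = sum(score for _, score in results)
    # (wins - losses) equals 2 * total_score - games when draws count 0.5
    return average_opponent + 400 * (2 * total_score - games) / games


def inflation(anchor_rating, results):
    """How far the pool has drifted relative to the fixed anchor."""
    return performance_rating(results) - anchor_rating


def adjust(ratings, offset):
    """Shift every human rating down by the measured inflation."""
    return {name: rating - offset for name, rating in ratings.items()}


ANCHOR = 2650  # the computer's rating, fixed after year one

# Hypothetical year-two results: an even score against 2700-rated opposition.
year_two = [(2700, 1), (2700, 0), (2700, 0.5), (2700, 0.5)]

offset = inflation(ANCHOR, year_two)
adjusted = adjust({"Player A": 2700, "Player B": 2000}, offset)
print(offset, adjusted)
```

This applies one offset to the whole list. If the results suggested different drift at different levels, the same calculation could presumably be run separately on the games against each rating band.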

So there it is, another off-the-board problem solved - a far easier thing to do than to solve them on the board, usually. What issue would you like to see Chivers resolve next?


Robinson said...

An interesting solution, but unfortunately it can only help us going forward in time and not backward.

Now, three questions: 1) Since players will inevitably publish their games against the Unchanging Computer (UC), UC's repertoire will become known and prepared for (no matter how vast). Those who test themselves against UC will prepare and their rating performance should naturally improve from the preparation. Is this improvement and resulting inflation a natural or unnatural reflection of true playing strength? Sure, their playing strength has improved because they've learned to play better in certain positions. But is the improvement weighted properly? Their preparation will affect their results against UC more heavily than against the general population.

2) How do you keep the testing pool from being skewed in any way? Do you force all who want to maintain a rating to test against UC each year? Or do you just choose a large number of players over a spectrum of rating levels who seem to hold a consistent rating level over a great period of time, eliminating those whose ratings are erratic or in ascent or decline, and those in age groups that are likely to ascend or decline?

3) Couldn't UC show that there is a different amount of inflation at different grades (as you've suggested there might be already with closed tourneys affecting the Super GMs)? Perhaps some year in the future, the 2650s perform at the expected 50 percent against 2650-rated UC, but the 2000s perform at a (much) better (or worse) than expected/mathematically-predicted rate. Must we be prepared to come up with adjustments at different grades?

Tom Chivers said...

Hi Robinson!

1. Preparing against UC should be discouraged by the structure of the events. E.g., no prize money, only appearance; anyone who obviously prepares an anti-computer line not invited back. And also just a large enough number of competitors so that most preparation is factored out.

2. I would go with a large number of players, say an elite group in a public tournament; regular games against weaker players on the internet all the time.

3. Yes. I absolutely think this should happen, that adjustments should be different at different grades if that's what the results suggest. My guess is we would see Leko, Topalov, Kramnik, Anand & Carlsen all over 2800, but *fewer* players over 2700.

You are correct that this says nothing about the past. Perhaps we would need to divide chess ratings into two eras: BUC & AUC (Before Unchanging Computer, After Unchanging Computer)!

Sverre Johnsen said...

I disagree that the elimination of adjourned games has lowered the standard of endgame play. I know a few players who claim that they gained a lot of endgame knowledge from the study of their own and friends' adjourned games. But - surprise - these very players actually play their endgames extremely poorly!

I knew a Norwegian player who was rated 2150 in his late fifties (and probably around 2300 in his peak) who had only sporadic knowledge about simple pawn + rook vs. rook endgames and I think he was not atypical.

The explanation? His generation never bothered to learn the basic endgames as they knew that they could always look up what they needed. Rational but not very good for general chess education.

ejh said...

Isn't the idea, though, that really top class endgame play is probably rarer than it was, as opposed to there being less (or for that matter more) good technique?

Tom Chivers said...

Yup - I was thinking of top class endgame play. The point you make, Sverre, does indeed imply that relative to the rest of their game, weaker players probably have better endgame skills than they did, say, twenty years ago.

Mark Weeks said...

It's an interesting idea, but there's another drawback similar to Robinson's point about the opening repertoire. The more you play against the same program, the more you become aware of its weaknesses, particularly in respect to positional evaluations. This knowledge can be used to steer into positions which the computer has likely misevaluated. Since the software will eventually become widely available, anyone could practice against it, discover the weaknesses, and play accordingly. - Mark

Tom Chivers said...

Hi Mark.

Again, I think events can be structured against this kind of thing, e.g. only inviting the same players once or occasionally, appearance fees only (i.e. no prizes for winning), keeping UC's exact set-up secret, inviting enough players that the occasional anti-computer line doesn't make so much difference. Btw, I don't think the software will become widely available, since to be beatable it would have to be based on slightly out of date programming. Fritz 7 maybe!