Friday, April 27, 2007

What Can We Learn From Statistics?

Some players do it with accountant-like consistency and Sherlock Holmes levels of curiosity - whilst for others it never crosses their minds, and they couldn't care less. By 'it', I mean keeping a spreadsheet of personal chess results and analyzing them statistically. I fall into the former group, and as a follow-up to Jonathan's post I'll share two things that I've learnt from my stats - and how these discoveries have recently changed my play*.

To begin with, I found that my results with black and white are strikingly different: an average performance of 152 with black versus 171 with white. When I noticed this discrepancy earlier this year, the gap was even larger - but since then I've been playing more cautiously as black, frequently seeking no more than liquidation if it's available, especially against 1.d4 or higher-rated opponents. In the background, I'm also now developing an opening repertoire as black. In other words, a relatively uncomplicated story of opening problems and expectations emerged, and I'm working on it.

But, a bigger and far more complex surprise awaited me when I sliced the data by grading bands. Before I go on to try to interpret what I found, here are the details.

Firstly, I found that my average performance against players graded below 160 is 162. Since I have a grade of 160, this is pretty much what you might expect. Secondly, my average performance against players graded 170 and above is rather more impressive, at 176. These two facts, of course, imply a third: that against players graded from 160 to 169, my average performance falls through the floor. And this is indeed the case; it's 136. Colour is not especially responsible either - with black my average against 160s is 131, and with white it's 146. And to top it off, in these last twenty months I've never actually beaten an opponent in the 160 grading band, whilst against the stronger, 170+ players, I've scored +2 =6 -2.
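Slicing a results log like this takes only a few lines of code. Here is a minimal Python sketch, assuming the usual ECF convention that each game is worth the opponent's grade plus 50 for a win, unchanged for a draw, and minus 50 for a loss, averaged over the games. As I understand it, the official calculation also limited grade differences to 40 points; this sketch leaves that out. The function names, band labels and sample games are hypothetical, not the results behind the figures above.

```python
from collections import defaultdict

def game_grade(opponent_grade, score):
    """ECF-style game grade: opponent +50 for a win (1), +0 for a draw (0.5), -50 for a loss (0)."""
    return opponent_grade + 100 * (score - 0.5)

def band(opponent_grade):
    """Hypothetical bands matching the three groups discussed above."""
    if opponent_grade < 160:
        return "<160"
    if opponent_grade < 170:
        return "160-169"
    return "170+"

def performance_by(games, key):
    """Average game grade per group; games are (opponent_grade, colour, score) tuples."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for game in games:
        opponent_grade, _, score = game
        group = key(game)
        totals[group] += game_grade(opponent_grade, score)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical sample games, not the results behind the figures in the post.
games = [
    (150, "white", 1), (155, "black", 0.5),
    (162, "black", 0), (165, "white", 0.5),
    (172, "white", 0.5), (178, "black", 1),
]

print(performance_by(games, key=lambda g: band(g[0])))
# {'<160': 177.5, '160-169': 138.5, '170+': 200.0}
print(performance_by(games, key=lambda g: g[1]))
# {'white': 179.0, 'black': 165.0}
```

Passing the grouping rule in as `key` means the same function handles both the colour split and the grading-band split.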

So, how to explain that I do so much worse against 160s?

I had some ideas but wasn't sure, so I also discussed it with a few friends. One suggestion was that we 160s are different: we're talented but lazy, or really 190s who drink too much, or we have a wacky style that bamboozles weaker opponents and never catapults us over 170, yet is rather good against Tom Chivers . . . Now, I rather doubt all this stuff, but it did get me thinking about the nature of being a 160.

In fact, I realised, being a 160 is something I know rather a lot about, since I've been one since I was seventeen. And it's also something I think about now and again, usually along the lines of: why am I wedged at this level? Now, that kind of thought is probably quite useful against 170+ players, because it reminds me to be vigilant and respectful. But sat opposite a fellow 160, that kind of thought goes somewhere else instead. I start thinking of the game as a potential symbol of improvement: if I start winning these, maybe next season I'll be over 170. And then I start thinking: of course, someone over 170 should mop up against a 160, and that could be me next season. And finally it all boils down to: time to wipe this one off the board! - and a few mad moves later, I'm blundering material, and have lost to another 160.

So, what can we learn from chess statistics? It seems - to me - rather a lot. I've provided two examples from my own findings: firstly and simply, that I am not good with black; secondly and more elaborately, that in trying to prove to myself I'm better than the average 160, I transform myself into the worst 160 ever. Now, of course, comes the more difficult part: factoring these findings back into my play. Maybe I've started already - I did, after all, draw as black against a 162 on Monday. Or maybe - who knows - I'll be in the 150s next season, and be cured that way of such pretensions and problems? One way or another, I'll continue to monitor any personal chess changes. Using, of course, statistics.

* The statistics go back twenty months - to when I came back to chess, in other words, after a couple of years away. Before that I kept no record.

Also, here is a quick note for those outside of Britain or unfamiliar with our system of chess grading. The approximate equivalents from our grading system to the Elo ratings are:

136 ECF = 1930 FIDE = 2094 USCF
150 ECF = 2000 FIDE = 2157 USCF
160 ECF = 2050 FIDE = 2202 USCF
170 ECF = 2100 FIDE = 2247 USCF
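For converting other grades, note that the four rows above lie exactly on two straight lines: FIDE = 5 × ECF + 1250 and USCF = 4.5 × ECF + 1482. The Python sketch below simply reproduces the table from those two lines. The formulas are inferred from these rows alone and are only as approximate as the table itself, so treat them as an assumption rather than an official conversion. The function names are hypothetical.

```python
def ecf_to_fide(ecf):
    """Approximate FIDE rating: linear fit inferred from the four table rows."""
    return 5 * ecf + 1250

def ecf_to_uscf(ecf):
    """Approximate USCF rating: linear fit inferred from the four table rows."""
    return 4.5 * ecf + 1482

for grade in (136, 150, 160, 170):
    print(f"{grade} ECF = {ecf_to_fide(grade)} FIDE = {ecf_to_uscf(grade):.0f} USCF")
```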


Note: Here is a basic template for a chess results recorder, which calculates your ECF grade so far. More sophisticated statistics (e.g. by colour) have to be done 'by hand.'
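A recorder of that kind can also be written in a few lines of code. The Python sketch below is hypothetical and is not the template mentioned above. It keeps a running "grade so far", assuming the ECF-style convention of opponent's grade +50 for a win, unchanged for a draw and -50 for a loss, and it also does a colour split that would otherwise have to be done by hand. The `Game` type and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Game:
    opponent_grade: int
    colour: str   # "white" or "black"
    score: float  # 1 = win, 0.5 = draw, 0 = loss

def game_grade(game: Game) -> float:
    # Opponent's grade +50 for a win, unchanged for a draw, -50 for a loss.
    return game.opponent_grade + 100 * (game.score - 0.5)

def running_grade(games: List[Game]) -> List[float]:
    """Grade so far after each game: the mean of all game grades up to that point."""
    total = 0.0
    history = []
    for n, game in enumerate(games, start=1):
        total += game_grade(game)
        history.append(total / n)
    return history

def grade_with(games: List[Game], colour: str) -> Optional[float]:
    """Average game grade with one colour, or None if no games with it yet."""
    subset = [game_grade(g) for g in games if g.colour == colour]
    return sum(subset) / len(subset) if subset else None

# Hypothetical log, in the order the games were played.
log = [Game(165, "white", 1), Game(158, "black", 0.5), Game(172, "black", 0)]
print(running_grade(log))        # [215.0, 186.5, 165.0]
print(grade_with(log, "black"))  # 140.0
```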


Jonathan B said...

Interesting post Tom.

I fully take your general point about how useful statistical analysis of grading performance can be. My problem, however, is that I don't play enough games to get a sample large enough to draw many conclusions.

Of course the fact that I don't obtain all the relevant data for the games I do play doesn't help either.

Out of interest, how many games generated the stats you quote in your post?

Tom Chivers said...

Fifty-five games, or maybe a few more, in the last 20 months. It will be over sixty by the end of the season. I don't include any rapid games, though these confirm the trends. For instance, at one chessabit I scored +5 =4 -0. So what? The five wins were all with white, the four draws all with black...

Jonathan B said...

It would be interesting to do a similar analysis for just rapid games and compare results.

Of my 23 games so far this year, I'm scoring something like 58% with White (13 games) and 40% with Black (10 games): above and below average respectively, according to ChessBase.

My good run (see previous thread) was 9 games long - 7 whites and 2 blacks.

My bad run is currently 14 games long - 6 whites and 8 blacks.

I have noticed in the past that I score much better with White than with Black. Perhaps this partly explains what's going on.

The bad run includes 3 draws against players 15-25 points higher rated than me. Hardly negative results.

The good run includes 2 wins against players rated 40-odd points below me. Hardly impressive results.

So, if I 'seasonally adjust' my figures to take these factors into account, things are still different, just not quite as starkly different as the raw win-draw-loss numbers suggest.

Tom Chivers said...

Yes, you should use a proper spreadsheet with grade calculations, so you can be sure what these runs realistically mean - I made a similar point in a comment on your post.

Rapid games don't mean much to me and I don't play them often enough to get much from them.

ejh said...

There's been some retrospective tinkering with grades since my record doesn't quite tally with the figures I was given at the time: notably the time in which my "normal" grade exceeded my rapidplay grade by more than fifty points.

ejh said...

As far as the original posting is concerned:

1. play the board, not the grading;

2. do remember that even if you are "really" a 170, this means you'd expect to score 60% against a 160, not beat them every time. It's amazing how people fail to grasp this. A draw against somebody ten ECF points below you is normally a perfectly acceptable result.
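The 60% figure in point 2 follows from the ECF-style game grades used elsewhere in this post. If a game is worth the opponent's grade plus or minus 50, a player graded G who scores a fraction s against opponents graded G - d performs at (G - d) + 100(s - 0.5). Setting that equal to G gives s = 0.5 + d/100, so d = 10 gives 0.6. Here is a short Python sketch. The 40-point cap on the grade difference is included on the assumption that the old ECF rules worked that way, and the function names are hypothetical.

```python
def expected_score(my_grade, opponent_grade, cap=40):
    """Expected score per game from s = 0.5 + d/100, where d is the grade difference.

    The cap on |d| (default 40) is an assumption about the old ECF rules.
    """
    d = max(-cap, min(cap, my_grade - opponent_grade))
    return 0.5 + d / 100

def expected_vs_actual(my_grade, results):
    """Total expected score and total actual score for (opponent_grade, score) pairs."""
    expected = sum(expected_score(my_grade, opp) for opp, _ in results)
    actual = sum(score for _, score in results)
    return expected, actual

print(expected_score(170, 160))  # 0.6: a "real" 170 expects 60% against a 160, not 100%
```

So a 170 who draws with a 160 falls only a tenth of a point short of expectation. The same comparison works over a run of games. For example, with hypothetical grades, `expected_vs_actual(150, [(170, 0.5), (165, 0.5), (175, 0.5)])` gives an expected total of about 0.9 against an actual 1.5 for three draws against higher-graded opponents.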

Jonathan B said...


I think you're totally right - and it's one of my particular failings.

It's particularly true when people don't play many games a season (say up to 20). Then just one or two results either way (a drawn position lost, let alone a lost position won etc) can have a huge effect on end of season grade.
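To put a number on that: if a season grade is the average of n game grades, and a game is worth the opponent's grade +50 for a win, +0 for a draw and -50 for a loss, then changing one result moves the season grade by the change in that game's grade divided by n:

$$
\Delta G = \frac{\Delta g}{n}, \qquad
\Delta g_{\text{draw} \to \text{loss}} = -50, \qquad
\Delta g_{\text{loss} \to \text{win}} = +100
$$

With n = 20, a drawn position lost costs 2.5 grading points and a lost position won gains 5. With n = 30, the same results move the grade by about 1.7 and 3.3 points.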

Tom Chivers said...

Mm, I think the way they do it, Jonathan, is that if you don't have 30 games in a season, they 'top up' your grade calculation by incorporating the appropriate number of games from the season before. So you shouldn't be too affected by outliers.
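Here is a rough Python sketch of the top-up rule as described in that comment. It follows the comment's description, not the ECF's published procedure, so treat it as an assumption. If this season has fewer than 30 games, the most recent games from last season make up the difference, and then the game grades are averaged. The function names and sample game grades are hypothetical.

```python
def graded_games(this_season, last_season, minimum=30):
    """Game grades used for the calculation, oldest first.

    If this season has fewer than `minimum` games, the most recent games from
    last season are added to make up the shortfall (a sketch of the rule as
    described in the comment above, not the official ECF procedure).
    """
    shortfall = max(0, minimum - len(this_season))
    # Both lists are in playing order, so the most recent games are at the end.
    # Guard against shortfall == 0, since last_season[-0:] would return everything.
    top_up = last_season[-shortfall:] if shortfall else []
    return top_up + this_season

def season_grade(this_season, last_season, minimum=30):
    """Mean game grade over the (possibly topped-up) set of games."""
    games = graded_games(this_season, last_season, minimum)
    return sum(games) / len(games)

# Hypothetical game grades: a weak start to last season, then a strong finish,
# followed by only three games this season.
last = [130] * 5 + [185] * 25
this = [160, 170, 150]
print(len(graded_games(this, last)))       # 30
print(round(season_grade(this, last), 1))  # 178.8
```

Because only the last 27 of last season's games are needed here, most of the weak early games drop out of the calculation.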

ejh said...

That's quite an encouraging thought.

Last season I played the first five games at about 130, then the rest in 180+ to finish with 178.

This season I shall play three games only - at the 4NCL next weekend - so presumably they'll have to take the good bit from last season in order to work out my new grade?

Mind you I'll probably score 0/3.

Jonathan B said...


Doesn't that make the grade dependent on older games, which are not necessarily an accurate reflection of current strength?

Tom Chivers said...

Your last thirty games are more likely to accurately reflect your strength than your last one or two. There may be exceptions of course, for instance if you have a stroke over the summer, etc.