Sunday, January 31, 2016

What a Crockett IV


Considering the question of the 26-losses-and-a-draw sequence suffered by Stephen Crockett, whose curious playing record we have been looking at this past week, it occurred to me that we have a precedent in a story many chess players learn soon after they learn the moves. It's described on Wikipedia as the "wheat and chessboard" problem. I'm sure most readers know how it goes.

Sometimes it ends well, sometimes it doesn't.


Now in our current example, nobody is calling for anybody's execution, while the controller of the Grand Prix isn't quite "a high-ranking advisor", but let us perform our own version of the problem, just to get ourselves a starting-point figure to work with.

It is, as I say, a starting-point figure, no more than that, so to obtain it, I've taken it that we wish to calculate the probability of a random player¹ obtaining such a score in a twenty-seven-game sequence² against evenly-matched opponents.³ I've neglected the question of colours and I've chosen to estimate the probability of each result as follows:

Win 40% Draw 20% Loss 40%.

Obviously the reader is welcome to employ different values and thereby obtain a different result.

A different result, that is, from 1.37 billion to one.

What a figure. How do we get there? Well, our random player, who has to score no more than 0.5 out of 27 (but 0.0 will do), has 27 chances to get his or her draw. Since a loss will do just as well in that game, the chance of not winning it is the draw's 0.2 plus the loss's 0.4, which is 0.6. (Or 3/5 if you prefer.)

That's 27 x 0.6, which is 16.2.

In the other 26 games, they lose, each loss having a probability of 0.4, or 2/5. Over 26 games that's 0.4 to the power⁴ of 26, which is where we start remembering the grains of wheat on the chessboard, since the number that emerges is so small (and its reciprocal so large).

Because it's easier on your calculator you may like to consider this as constituting 1 over 2.5^26, and our overall calculation as 16.2 × 1/2.5^26, which is 16.2 ÷ 22204460492.5. No, really.

Now just as to get the fraction of (say) 5/20 down to betting odds, we divide 20 by 5 and then take off 1 to get us 3-1, so with our calculation we divide 22204460492.5 by 16.2, which is 1370645709.4, take off 1, which is 1370645708.4, lose the 0.4 because frankly who cares, and that gets us

1370645708-1

which is quite a number.⁵ A number higher than 1.37 billion. That number, to one.
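The arithmetic above can be checked with a short Python sketch using exact fractions. It takes the piece's assumed values (draw 20%, loss 40%) and footnote 6's alternative (draw 25%, loss 50%); the function names are made up for illustration. Odds against are computed as (1 - p)/p, which is the "divide and take off 1" step. The check also brings out one wrinkle: the 27 × 0.6 shortcut counts the 27-straight-losses sequence once for every game, 27 times in all, so it overstates the probability. Counting each sequence once (27 × 0.2 × 0.4^26 for one draw, plus 0.4^27 for no draws) makes the odds longer, not shorter, so the piece's figure is if anything generous to the player.

```python
from fractions import Fraction


def shortcut(n: int, p_draw: Fraction, p_loss: Fraction) -> Fraction:
    """The piece's method: n chances x P(draw or loss) x P(loss)^(n - 1)."""
    return n * (p_draw + p_loss) * p_loss ** (n - 1)


def exact(n: int, p_draw: Fraction, p_loss: Fraction) -> Fraction:
    """P(score <= 0.5), counting each sequence once: n losses, or one draw and n - 1 losses."""
    return p_loss ** n + n * p_draw * p_loss ** (n - 1)


def odds_against(p: Fraction) -> float:
    """Probability p expressed as 'X to one against', i.e. (1 - p) / p."""
    return float((1 - p) / p)


# (draw, loss) pairs: the main text's 20% / 40%, and footnote 6's 25% / 50%.
for p_draw, p_loss in [(Fraction(1, 5), Fraction(2, 5)), (Fraction(1, 4), Fraction(1, 2))]:
    s = odds_against(shortcut(27, p_draw, p_loss))
    e = odds_against(exact(27, p_draw, p_loss))
    print(f"draw {float(p_draw):.0%}, loss {float(p_loss):.0%}: "
          f"shortcut {s:,.0f} to one, exact {e:,.0f} to one")
```

Run as written, the shortcut should reproduce the piece's figures of about 1.37 billion and 3.3 million to one; counting each sequence once gives roughly 3.8 billion and 9.3 million to one respectively. Either way the model still treats the games as independent, which is the assumption argued over in the comments.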

Now let me reiterate, these aren't the odds against Mr Crockett performing the feat, they're the odds, given the values suggested,⁶ against a random player performing the feat (or worse) against evenly-matched opponents.

With any given player, Mr Crockett not excepted, it doesn't work precisely that way, for all sorts of reasons, not least because chess games aren't entirely independent events. The result of one can be affected by the result of a previous one, and - as has been suggested by Mr Crockett's defenders - perhaps some people are more likely to be affected than others by a bad start to a tournament and can go on to lose a rack of other games.

No doubt. No doubt. But, when we say "more likely" - how much "more likely" do we mean?

Likely enough to bring down a figure of 1.37 billion to one to something we can actually believe?

How likely is that?

Two points here: two inescapable points.

The first is that there is no "normal" way this sequence can have taken place. Either there is something exceptional about the player, or there is something exceptional about the way he has conducted the games.

The second is that the odds against the sequence are simply too great - far too great, massively too great - for us to take the first of those two explanations for granted.

If you want your competitions to have any integrity, and if you want your grading system to have any integrity, and you have a set of results that - things being equal - is so unlikely that we can measure the odds at a billion to one against, then you simply cannot assume that things were equal. It's absurd.

So if you're looking for explanations, then they have to be good ones. They can't just be handwaving. They have to have enough evidence behind them to make up the gap between more than a billion to one, on one hand, and something we think is credible, on the other.

Because that's a hell of a big gap, even by the standards of credibility gaps.

So do we have an argument for the credibility of this sequence that rests on more than assertion? Anything that rests, reliably, on the record of other players, in similar circumstances? Because if not, our evidence for Mr Crockett's own explanation for Mr Crockett's record is simply Mr Crockett's word.

Which is OK, if it's OK with you. But on the other side of the argument, there's a straightforward explanation, one that fits all the facts known to us, and one that would fit a pattern of results whereby a player who loses the first game in a tournament goes on to lose most or all of the rest.

And on that side of the argument there's some really big numbers.

Big numbers, like more than a billion to one.

- - -

1 Of course another way to work would be to calculate the odds of each result as predicted by the ECF grades then obtaining. I've chosen so far not to do this, though anybody else is welcome to. This is partly because I particularly wanted to look at the odds in general, for illustrative purposes, partly because I don't entirely trust the integrity of the grading list. For what it is worth, if I have read the sequence right then Mr Crockett was graded on average slightly more than 1.5 points higher than his opponents with listed grades: however, at least one of the two players without a listed grade may well have been a stronger player, perhaps substantially so.

2 For the actual sequence scroll to collapsed.

3 Of course where the opponents vary in strength, the odds against such a disastrous sequence would actually increase, provided we assume (as I think we can) as much variance towards weaker opponents as stronger. (To illustrate this, examine the equation 1/2 x 1/2 x 1/2 = 1/8, which is 7-1 against, whereas, say, 1/4 x 3/4 x 1/2 gives 3/32, which is 29-3 against, or nearly 10-1.)

4 ^ apparently means "to the power of". No, we didn't use it when I was at school, we used them superscript things, as we did not call them then.

5 If you didn't follow that, I don't blame you. Maybe try doing the calculation yourself but as if it was only, say, four games rather than twenty-seven, so you see how it works.

6 I tried the calculation again, inputting the skewed values of win 25%, draw 25%, loss 50%. It brought the odds down loads. To, ah, a mere 3.3 million to one.
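Footnote 5's suggestion of trying four games instead of twenty-seven can be followed by brute force: list every win/draw/loss sequence, add up the probability of those scoring 0.5 or less. The Python sketch below does that under the piece's assumption of independent games; the function name is hypothetical. Because it takes the probabilities game by game, it also covers footnote 3's point about opponents of varying strength: a loss chance of 1/2 - d in one game and 1/2 + d in another multiply to 1/4 - d², which is less than 1/4, so any symmetric spread around evenly matched makes losing every game less likely.

```python
from fractions import Fraction
from itertools import product


def prob_score_at_most(per_game, max_score):
    """Brute-force P(total score <= max_score) over every win/draw/loss sequence.

    per_game is a list of (p_win, p_draw, p_loss) tuples, one per game, so the
    opponents' strength may vary from game to game. Games are treated as
    independent, as in the piece.
    """
    points = (1, Fraction(1, 2), 0)  # win, draw, loss
    total = Fraction(0)
    for outcome in product(range(3), repeat=len(per_game)):
        if sum(points[i] for i in outcome) <= max_score:
            p = Fraction(1)
            for game, i in zip(per_game, outcome):
                p *= game[i]
            total += p
    return total


half = Fraction(1, 2)

# Footnote 5: four games at the piece's assumed 40% / 20% / 40%.
four = prob_score_at_most([(Fraction(2, 5), Fraction(1, 5), Fraction(2, 5))] * 4, half)

# Footnote 3: losing all three games (score 0, no draws allowed), evenly
# matched versus opponents spread around the same average.
even = prob_score_at_most([(half, 0, half)] * 3, 0)
spread = prob_score_at_most(
    [(Fraction(3, 4), 0, Fraction(1, 4)),
     (Fraction(1, 4), 0, Fraction(3, 4)),
     (half, 0, half)],
    0,
)

for label, p in [("four games", four), ("even", even), ("spread", spread)]:
    print(f"{label}: {p}, i.e. {(1 - p) / p} to one against")
```

Four games come out at 48/625, about 12 to one against. The same enumeration for 27 games would mean checking 3^27 (over seven trillion) sequences, which is why the piece multiplies probabilities instead of listing them.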

[Thanks to Bat for his help with the maths.]

[Entirely anonymous comments will not be accepted on this series of articles.]

33 comments:

Jack Rudd said...

The odds improve slightly if you replace the event with "a sequence of at least 27 games in which he scores no more than ½, at some point during his season of however many games it was". But not, I think, to the extent of raising the probability above 1 in a million.

Anonymous said...

I wouldn't really trust your 40% model. It occurs to me that if you are feeling ill and having a bad run, that you might play all the nonsense in your repertoire and for that matter play at a superficial level. Thus your chance of losing may be well in excess of 50%.

So a rather more nuanced question. Is it acceptable for a player to continue playing when unfit and later enter and win tournaments that he wouldn't otherwise be eligible for? If it was team chess and you had a large enough squad, you would consider dropping them from the team.

I was trying to think of sports where playing when unfit was potentially advantageous. Anything which offers prestigious prizes to participants of a low standard really.

No one has tried the "help from third parties" hypothesis. That says that zero from plenty is really the natural standard of play, with the decent results being when some form of assistance is available. So you compute the probability that someone with an extremely low grade can win even Minor tournaments. If you apply a similar 40% model, you would get long odds on that.

RdC

Anonymous said...

According to this logic, the probability of seeing an unbeaten streak such as Tal's 95 games is (0.6 ^ 95), which is about 1 in 10^21. That's unimaginably unlikely - a trillion times more unlikely than the number you're putting on Crockett's run.

Of course all that this tells us is: this is a really terrible method of calculating the odds.

Thanks, though, for showing some working.

David
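David's arithmetic can be checked in a couple of lines of Python, under the same assumption he is testing: a 60% chance of not losing each game, with games independent. This checks the sum only; whether that assumption fits Tal's run is what the reply below disputes.

```python
# David's back-of-envelope figure: 95 games in a row without losing,
# at the piece's 60% not-losing chance per game, treated as independent.
p_unbeaten = 0.6 ** 95
crockett_odds = 1_370_645_708  # the piece's figure, X to one against

print(f"about 1 in {1 / p_unbeaten:.2e}")
print(f"ratio to the piece's figure: {(1 / p_unbeaten) / crockett_odds:.1e}")
```

The result is roughly 1 in 1.2 × 10^21, which is indeed close to a trillion times the piece's figure.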

ejh said...

According to this logic, the probability of seeing an unbeaten streak such as Tal's 95 games is (0.6 ^ 95)

As you're well aware, David, no it isn't. That's the probability of a random player doing so against evenly-matched opponents, not of a world-class player doing so against largely inferior opposition, phenomenal though Tal's run was.

It occurs to me that if you are feeling ill and having a bad run, that you might play all the nonsense in your repertoire and for that matter play at a superficial level

Well you might and you might, but this is what I mean by having nothing to support the thesis but Mr Crockett's word. Is that what is supposed to have happened? And is it remotely as likely a thesis as the more obvious alternative?

ejh said...

(The general point here is that while I'm quite prepared to believe in outliers - and somebody has to be an outlier - I'm a good deal less prepared to believe in outliers for the case of losing large numbers of consecutive games who also happen to be outliers for the case of winning large numbers of grade-limited tournaments.)

Anonymous said...

> is it remotely as likely a thesis as the more obvious alternative?

Now that is a good question. How about taking the generous approach: try believing Crockett's explanation, and seeing whether it can be made plausible.

Something like: assume that he does indeed suffer from bouts of depression that leave him unable to concentrate; and that on such days his not-losing chances slip from 60% down to - what - 50%? 30%? 10%? Decide for yourself what you find plausible, and plug that in.

Maybe also allow for some sort of head-goes-down effect, making one loss more likely to follow another. Again, decide what seems plausible to you, and allow for it.

How unlikely does the run look now?

If we can explain away 21 orders of magnitude by moving from a terrible model to a good one (Tal) then a mere 9 for Crockett may not be impossible.

David
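David's suggestion can be sketched, though every number in it is a placeholder rather than evidence, which is the substance of the objection that follows. The Python sketch below (names hypothetical) is a two-state Markov model: the game after a loss uses a different win/draw/loss split from every other game. It computes the chance of no wins and at most one draw in 27 games as the post-loss loss rate rises, to show how strong a "head goes down" effect would have to be before the odds come down. It says nothing about which value is realistic.

```python
from fractions import Fraction

# Hypothetical baseline (win, draw, loss): the piece's 40% / 20% / 40%.
# It applies to the first game and to any game not preceded by a loss.
BASE = (Fraction(2, 5), Fraction(1, 5), Fraction(2, 5))


def prob_at_most_one_draw(n_games, base, after_loss):
    """P(no wins and at most one draw in n_games) under a two-state Markov model.

    State: (draws so far, whether the last game was lost). Any win ends the
    run, so only draw and loss transitions are tracked.
    """
    states = {(0, False): Fraction(1)}
    for _ in range(n_games):
        nxt = {}
        for (draws, last_loss), p in states.items():
            _, p_draw, p_loss = after_loss if last_loss else base
            key = (draws, True)  # lose this game
            nxt[key] = nxt.get(key, 0) + p * p_loss
            if draws == 0:  # draw this game (only one draw allowed)
                key = (1, False)
                nxt[key] = nxt.get(key, 0) + p * p_draw
        states = nxt
    return sum(states.values())


# Keep the draw rate at 20% after a loss; shift the rest from wins to losses.
for p_loss_after_loss in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5), Fraction(4, 5)):
    after_loss = (Fraction(4, 5) - p_loss_after_loss, Fraction(1, 5), p_loss_after_loss)
    p = prob_at_most_one_draw(27, BASE, after_loss)
    print(f"P(loss | previous loss) = {float(p_loss_after_loss):.0%}: "
          f"about {float((1 - p) / p):,.0f} to one against")
```

The first row (40% after a loss, i.e. no dependence at all) reproduces the independent model, counting each sequence once. The later rows show the odds shortening only as the post-loss loss rate climbs a long way above the baseline.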

Anonymous said...

The ratio of the expected result to the actual result per tournament would be nice.
It would give a fairly simple illustration of the P(sand)
--theblueweasel who cannot be bothered to open a spreadsheet

ejh said...

Because we're being advised to "assume" (on the basis of what?) and say "maybe" (on the basis of what?) and to "decide for yourself" (on the basis of what?). This is precisely the "handwaving" to which I referred in the original piece. Or, as you put it, "decide what seems plausible to you", which is a synonym for "make up whatever suits you".

I'll stick with William of Occam, thanks.

ejh said...

Again, the crucial thing to get across is that there's not a proper comparison with people at the top end of the bell curve, people who we know are phenomenal players and produce phenomenal results. We're talking about somebody who is at both ends of the bell curve, and we need comparisons with that phenomenon. So far I'm not seeing any: they're being postulated, and they may for all I know exist, but I am not presently seeing their existence.

Anonymous said...

Not sure whether to post here or on an earlier thread regarding the curious lack of middling scores. Whilst one would almost expect outliers in anyone's performance over time, the lack of inliers (to coin a phrase) was troubling. One would expect a high seed in a grading restricted tournament to race to 2/2 or 3/3 before meeting a player of similar (or greater, in the case of a rapid improver passing through) ability at which point draws or losses become more likely (especially if the player is prone to slumping).

In essence the given explanation is, that rather than being a player of 120 ability, the player is one of 150 ability (or whatever) who frequently performs at 90 or below.

This is in itself unusual. It also seems fortunate as far as I understand it that the episodes of good play and bad play are such that the grade at the key point was (usually) at a level that allowed Grand Prix success. Of course, this could be explained by a seasonal element to performance.

AWIC

Jack Rudd said...

Shall we contrast with players who are known to be inconsistent and to have problems with depression? Someone like me, say. I get my share of TPRs well above my rating, and my share of TPRs well below my rating, but I also get a fair number of TPRs similar to my rating. (At my last tournament, Hastings 2015-16, I scored 5/9 compared with an expected score of 4.99.)

ejh said...

It's the missing middle that's the mystery. I mean we could also, for instance, try and calculate the probability of managing only one score within half a point of fifty per cent in the twenty-one standardplay tournaments of 2014/5.

Anonymous said...

There's a Facebook group "Chess Heaven" which has been discussing these variable results for a considerable period of time. I believe that notwithstanding the run of previous poor form, they predicted the victory at Scarborough.

As regards probabilities, it's only reasonable to assert that the probability of such a sequence of losses is telephone number to one if
(a) you were supplied with a list of forthcoming games between players of equal standard
(b) you had a large number of them to choose from
(c) you stick a pin in the list
and
(d) you always chose the loser.

RdC

ejh said...

Yeah, I saw Chess Heaven. It's not very interesting. I'm not grasping your argument in the second part at all - can you have another go?

Anonymous said...

I'm not grasping your argument in the second part at all - can you have another go?

There are conditions attached to being able to multiply small odds to make large ones. It's as well to bear them in mind: a supposed "expert" witness in a trial sent a woman to jail using such a fallacious argument.

Suppose you have a one in five chance of losing in a game between players of equal standard and the player loses. Do you still think it's 1 in 5 for the next game of the same player and the result of the previous game has no influence on the next one? It underpins Elo and grading theory that you take into account the previous result to judge the probabilities for the next one.

You could just take the word of those who studied mathematics to university level that a simple multiplicative model isn't really appropriate.

RdC

ejh said...

Do you still think it's 1 in 5 for the next game of the same player and the result of the previous game has no influence on the next one?

No, I don't, as the piece actually says.

That's why the piece invites the reader to take the calculation as a starting-point. That's what it says.

But it also invites the reader to consider how different the odds would have to be - how large the influence of the previous game on the next one would have to be - to bring the likelihood of the sequence down to something credible, bearing in mind how enormous the odds are against the sequence if that influence is negligible.

And nobody's given me anything in particular to suggest that the influence is other than negligible. It might not be: but show me that.

Bear in mind, too, that you might expect games to get easier after losing, not harder: that's what Swiss systems tend to do, and it would usually make a loss following a loss less likely, not more. Am I not right?

(And bear in mind also that if you're on 0/2 and feeling down about it, then, well, your opponent may be as well. It's not you, with your head down, against your opponent who's all bright-eyed and bushy-tailed.)


So is that factor more powerful, or less powerful, than any effects which would seem to make a string of losses more likely? I don't know. I've not got any evidence either way. Your opinion is as good as mine, and likely better.

But what I do have evidence for is that we're looking at an essentially impossible sequence, unless somebody can show me some very powerful factors that militate against it.

But they need to show them. Not just handwave them into existence.

Mike G said...

I have to disagree with Roger here: grading theory gives you the probability of what will happen in your next game and assumes the result is based on the players' ratings and is independent of the result of the previous game.

Of course grading theory is just a theory and it's an imperfect model of the real world, e.g. you get cases where A usually beats B who usually beats C who usually beats A and the lighthouse keeper scenario, but most players play a range of players and grading theory does OK on the average results (usually).

Anonymous said...

the result is based on the players' ratings and is independent of the result of the previous game.

The theory is that the result of the previous game changes the rating. The rating outcome of the next game is then not independent of the previous game. I agree that OTB systems batch a month or longer at a time for practical reasons, but on-line systems are continuous.

Do we not have a strange situation where there's a Jekyll of a 150 standard or more and a Hyde of 80 standard or less? If the Hyde turns up, it's no great surprise that a poor score results. The curious feature is as to what determines which one is present and why it should correlate both with the length of a tournament and with the prestige of the tournament. So it's not cheating in the game fixing sense that games are deliberately thrown, rather it's an attitude and style of play making losses more likely.

Many players are concerned to protect and enhance their grades, so if possible will take a break if unwell and playing badly. To what extent should the exceptions be tolerated and allowed to dilute their good performances?

RdC

Anonymous said...

Just to address slumps: it is common in sport for people previously relatively proficient at a skill to go through a slump. Strikers can't score goals (does anyone remember Gary Birtles?), batsmen suddenly can't pass 50, golfers miss putts and darts players can't check-out.

Often a change of behaviour makes things worse - players consciously or sub-consciously don't get into the right positions or start passing instead, risky shots are attempted or putts are hit more tentatively, and the slump continues.

So it goes with chess - what may start out as a promotion to a team in a higher league where you get outplayed more often, or as a change in professional or personal circumstances which leaves you more tired or more stressed becomes a bad run. Possibly you change your openings, possibly you lose heart a little and stop trying as hard or preparing so much. Possibly you dwell on your mistakes and allow one blunder to cost you two games. Maybe you start to bale out with draws. Negative thoughts abound and you wonder if you will ever win again. (As an aside, is there a chessic equivalent of the yips?)

So slumps happen, and can be self-perpetuating, but again, the lack of draws in the sequence highlighted is suggestive. Players on a losing run may be more tempted to halve out. Few people want to start with queenside castling, or, perish the thought, get the five Olympic rings. One might have thought that there were plenty of circumstances when a quick half would be mutually beneficial and allow competitors to get off the mark, get away early or even enjoy a quick half?

AWIC

Anonymous said...

Just another point for now, before I wrestle with the probability.

In general we should consider other explanations for the data. Losing horribly rather than finishing mid-table may somehow be more acceptable - it may generate more interest or more sympathy. Bad results may be a "cry for help" in some way.

And of course, in any sphere it is possible to blur the lines between conscious and unconscious behaviour or between competing motivations. Can you be sure that you voted for A because they were the best candidate and not because they would give you a tax-cut? You may want to convince yourself it's the former, but is it?

AWIC

ejh said...

Do we not have a strange situation where there's a Jekyll of a 150 standard or more and a Hyde of 80 standard or less?

We most certainly do.

In general we should consider other explanations for the data

We absolutely should and no amount of scepticism on my part at any of those explanations should be construed as suggesting anything else. Everyone is entitled to their own defence: the purpose of these pieces is to make a case, or really to make the case that a case has been made, if you follow. This blog does not constitute a tribunal and doesn't seek to do so.


I am very pleased to have had my attention drawn to this.

Anonymous said...


(As an aside, is there a chessic equivalent of the yips?)

The late Peter Clarke and perhaps other solid and correct players can fall into this. It's where they find serious difficulty in winning against any but the most daring opposition. The point being that they see their opponent's threats and neutralise them, rather than create any of their own.

RdC

Anonymous said...

Regarding the comment on the English Chess Federation's forum, be careful what you wish for. Taking appropriate action might mean sending your blog a legal letter asking for you to desist in speculation about an individual's results at chess.

RdC

ejh said...

I very much doubt that that would be appropriate.

OpenID said...

His opponent's ratings:
139, 133, 128, 133, 127, 123,
148, 145, 146, ???, 137, 138,
144, 147, 115, 119, 119,
141, 132, 125, 131, 127,
113, 158, 109, 121, ???

If his rating really is 120, he doesn't really have a 40% chance per game, as he was (mostly) outrated by opponents during the streak. So I dispute your evenly-matched opponents supposition.

Anonymous said...

I would be interested to know the standard play results during this streak of losses and one draw at quick play. I would think that if they were also very poor, this would tend to support C's case, but if they were good it would make the sequence here even more unlikely by normal means. CSD

ejh said...

If his rating really is 120

His rapidplay grading was 137 for the first tournament in the sequence and 133 for the others.

I would be interested to know the standard play results during this streak of losses and one draw at quick play

They're perfectly normal: do check the database (if you're not clear how to do this, follow the advice in the first piece in the series) to satisfy yourself on this point.

Curiously enough, though, at the start of season 2011-2 his rapidplay results hugely improved: but his standardplay results began a long streak of not-winning.

(Comments today may take some time to be moderated; apologies in advance.)
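As a rough check on the list of opponents' grades above, here is a Python sketch averaging the 25 listed grades, leaving out the two "???" entries. It compares the average only with the 133 quoted in reply. The thread doesn't say how many of these games came in the first tournament, when the quoted grade was 137, so footnote 1's "slightly more than 1.5 points" can't be reproduced exactly from this alone.

```python
from statistics import mean

# Opponents' grades as listed in the comment above, in order;
# None marks the two ungraded ("???") opponents.
OPPONENTS = [
    139, 133, 128, 133, 127, 123,
    148, 145, 146, None, 137, 138,
    144, 147, 115, 119, 119,
    141, 132, 125, 131, 127,
    113, 158, 109, 121, None,
]

graded = [g for g in OPPONENTS if g is not None]
avg = mean(graded)
print(f"{len(OPPONENTS)} games, {len(graded)} graded opponents, average {avg:.2f}")
print(f"gap to a rapidplay grade of 133: {133 - avg:.2f}")
```

On these figures the graded opponents average about 131.9, roughly a point below his 133 rapidplay grade, so they only outrate him if his true strength is taken to be nearer 120.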

Anonymous said...

The grading history shows the outcome of his first season at almost 140 for both standard and rapidplay. This for an adult player, apparently new to over-the-board chess, is impressive. A conventional path might be to consolidate and then improve his grade, so that five or six years later he would be playing in Opens and looking forward to playing and occasionally beating titled players, playing in the Hastings Masters, the British Championship, that sort of thing.

We can but speculate as to the reasons, but what happened instead was that his grades plummeted, as will happen if you play a tournament and don't score any points. Bizarrely it affected his rapidplay performances first, but at the time only rapidplay grades were six monthly.

Here's the table in chronological order

January 2010 Rapid 138D (Standard grades were yearly until Jan 2012)
July 2010 Standard 139A Rapid 137A
January 2011 Rapid 133X
July 2011 Standard 139A Rapid 119A
January 2012 Standard 114X Rapid 122X
July 2012 Standard 132X Rapid 97A
January 2013 Standard 117X Rapid 131X
July 2013 Standard 116X Rapid 111X
January 2014 Standard 107X Rapid 132X
July 2014 Standard 124X Rapid 96A
January 2015 Standard 120A Rapid 115A
July 2015 Standard 128A Rapid 112A
January 2016 Standard 128A Rapid 117A

Explaining the suffix, X (now abolished) is based on 6 months and A on 12 months. However the 2012 and 2014 Rapid grades would have been based on the most recent thirty games.


RdC

John Cox said...

Always worth remembering, of course, that the probability of the sequence of results you or I obtained in our last 27 rapid play games is no doubt approximately equally unlikely.

On the other hand, if you're going to offer as a defence to a charge of, er, manipulation that no-one could possibly bother because the prize money involved is so low, it's probably a good idea not to create a website boasting about your achievements in winning the tournaments which offered that meagre prize money.

ejh said...

As a sequence, yes. But as an overall score for that sequence, less so...

Lee Bullock said...

I should add that at many of these events where he got scores of 0/5 or 0/4, or very low scores, he was walking around smiling and having laughs and at no time looked down or depressed. He also played extra fast and lost his games in under 45 minutes. When he won events he was taking ages over his moves and taking it seriously. There were times, of course, when he tried but just lost: he is around a 150/160 player, and in u120s or u125s you are still going to blunder and lose games.

I remember one occasion when he was sandbagging: he was playing a 56 grade in the last round. It was around +12 in the position and he offered his opponent a draw.

His worst loss of all was versus a 52 grade. He deliberately took 45/50 minutes over two moves in the opening, when they were the only moves! He of course lost the game on time before he got to 40 moves. He found it hilarious when I spoke to him after. I was disgusted.

But to get back to the 26 losses and 1 draw. His slow-play results were totally normal. The following season he won about 4/5 rapid tournaments but his slow-play results went totally downhill. The zig-zag grade jumps and drops coincided with the rapid season, so he could play in the lower rapid events. He even took the liberty once of playing in the u100 at Richmond: a player who had been playing in opens and getting 50% just a few years previously. He of course won the event easily, beating a 6-year-old beginner and a 7-year-old beginner in the process. I think he won the first 5 then offered his usual draw in the last round.

I just hope something can get put in place to stop this happening in future.
An idea is to stop players from playing in minors once they have won 3 that season. Or maybe 2. Kinda depends how active you are. Another idea is to bring in a rating floor like they have in the US. If you go above 120 you can't drop down to that section again unless you stay below that grade for 2 years or something. But a 140 player should not be able to play in u100s or u120s. It's just not on.

Anonymous said...

RdC

We're veering off-topic, but I think what you attribute to Peter Clarke is perhaps more an (excessive) aversion to risk, exemplified by parking the bus, laying up, blocking etc.

Yips and dartitis involve a sort of involuntary mental or physical spasm and inability to commit, so perhaps the parallel is hovering your hand over the board without being able to move, or picking up a piece and being unable to put it down, or complete the move.

AWIC

Anonymous said...

You would need action at two levels. First that the ECF add to the grading page a note of the maximum published grade. Second tournament organisers would have to state in their entry forms that eligibility was based on maximum published grade.

Such a reform would make it pointless to try to reduce a grade below a threshold, if it had ever been higher.

Those who have played tournament chess for forty years or more are aware of the potential problem. Perhaps administrators and titled players are unaware of the issue.

Back in the early seventies there was a flame war (to use a modern term) in the columns of BH Wood's magazine "Chess" between Hugh (CHO'D) Alexander and a player who spotted that he had regional grades in both the South and the West. Could he use the lower of these grades to enter and potentially win a tournament that would otherwise exclude him?


RdC
