Looking at the Quality of Play in Chess 960
How well are grandmasters playing compared to classical chess?
With the Grenke Freestyle Open, there is now an event in which many chess 960 games were played with classical time controls. This means that there is now a reasonable number of games available to analyse for quality of play.
For this post I analysed the games from the Grenke Freestyle Open and the classical games from the Freestyle Grand Slam events played so far. I only looked at games where at least one player was rated over 2500 and the rating difference didn’t exceed 200 points.
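The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the code used for this post: the function name, the parameter names and the handling of missing ratings are assumptions.

```python
from typing import Optional


def keep_game(white_elo: Optional[int], black_elo: Optional[int],
              min_top_rating: int = 2500, max_gap: int = 200) -> bool:
    """Return True if a game passes the rating filter described in the text."""
    if white_elo is None or black_elo is None:
        # Assumption: games with a missing rating are skipped.
        return False
    # "At least one player was rated over 2500"
    has_strong_player = max(white_elo, black_elo) > min_top_rating
    # "The rating difference didn't exceed 200 points"
    close_enough = abs(white_elo - black_elo) <= max_gap
    return has_strong_player and close_enough
```

For example, a 2600 vs 2450 game would be kept, while a 2600 vs 2350 game would be dropped because the gap is 250 points.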
Accuracy by move number
Previously, I looked at the relative number of mistakes made by GMs in classical chess, broken down by different criteria. Those results serve as a good comparison for the chess 960 games.
For this purpose, I defined a mistake to be a move that reduces the expected score by at least 10%.
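To make the definition concrete, here is a minimal sketch of such a check. It rests on two assumptions that are not stated in this post: the engine evaluation is converted to an expected score with the logistic curve Lichess publishes for its win percentage, and "10%" is read as 10 percentage points of expected score. The function names are hypothetical.

```python
import math


def expected_score(cp: float) -> float:
    """Map a centipawn evaluation (from the mover's point of view) to an
    expected score between 0 and 1.

    Assumption: logistic curve with the constant from Lichess's published
    win-percentage formula; the post does not say which conversion was used.
    """
    return 1.0 / (1.0 + math.exp(-0.00368208 * cp))


def is_mistake(cp_before: float, cp_after: float, threshold: float = 0.10) -> bool:
    """Flag a move whose expected score drops by at least `threshold`.

    Both evaluations are from the point of view of the player who made the
    move. Assumption: the threshold is an absolute drop (percentage points).
    """
    return expected_score(cp_before) - expected_score(cp_after) >= threshold
```

One consequence of working with expected score instead of raw centipawns is that dropping from +8 to +6 is not flagged, while dropping from 0 to -2 is, because the first barely changes the likely result.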
The biggest difference between normal chess and chess 960 is that, with the pieces shuffled randomly, the players have to think for themselves from the start and development isn’t as straightforward as usual. So one would expect more mistakes to happen early on in 960 games. And that’s what we see in the data.
The graph shows that starting from move 1, grandmasters make a relatively high number of mistakes. They seem to make more mistakes in the first 10 moves than from moves 20 to 30. The number of mistakes drops after move 30, probably because by then the positions have often normalised and the players can rely on their knowledge from normal chess.
As a comparison, here is the relative number of mistakes in classical chess:
The picture is as one would expect: there are hardly any mistakes early on, as the players can rely on their preparation and general opening knowledge. Then the number of mistakes ramps up towards the middlegame before slowly going back down in the endgame.
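A per-move-number curve like the ones in this section could be computed roughly as follows. This is a sketch under assumptions: "relative number of mistakes" is taken to mean mistakes divided by the number of moves played in each move-number range, and the bin size of 10 moves and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mistake_rate_by_move(moves: Iterable[Tuple[int, bool]],
                         bin_size: int = 10) -> Dict[int, float]:
    """Return the share of mistakes per move-number bin.

    `moves` holds (move_number, was_mistake) pairs. Each key of the result
    is the first move number of a bin (1, 11, 21, ... for bin_size=10).
    """
    totals: Dict[int, int] = defaultdict(int)
    mistakes: Dict[int, int] = defaultdict(int)
    for move_number, was_mistake in moves:
        # Moves 1-10 go into bin 1, moves 11-20 into bin 11, and so on.
        start = ((move_number - 1) // bin_size) * bin_size + 1
        totals[start] += 1
        mistakes[start] += int(was_mistake)
    return {start: mistakes[start] / totals[start] for start in sorted(totals)}
```

Dividing by the number of moves in each bin matters because far fewer games reach move 60 than move 10, so raw counts would understate late mistakes.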
Mistakes based on evaluation
One thing I found interesting about classical chess is that fewer mistakes happen in equal positions. As a reminder, here is the graph for normal chess:
This graph shows the relative number of mistakes depending on the evaluation of the position for the side to move. So an evaluation of 200 means that the side to move has an advantage of +2 and -150 means that it’s -1.5 from their point of view.
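The point-of-view convention can be written down as a small helper. This sketch assumes the engine output is stored in centipawns from White's point of view, which the post does not state; the bucket width of 50 and the clamp at ±500 are likewise hypothetical choices for the histogram.

```python
def eval_for_side_to_move(white_cp: int, white_to_move: bool) -> int:
    """Flip a White-point-of-view evaluation to the side to move."""
    return white_cp if white_to_move else -white_cp


def eval_bucket(cp: int, width: int = 50, limit: int = 500) -> int:
    """Return the lower edge of the histogram bucket containing `cp`.

    Evaluations beyond +/- `limit` are clamped so that decided positions
    all land in the outermost buckets.
    """
    clamped = max(-limit, min(limit, cp))
    # Floor division also rounds negative values down, so -20 falls
    # into the bucket starting at -50.
    return (clamped // width) * width
```

So a position that the engine scores as +150 for White with Black to move is counted at -150, matching the example in the text.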
There are hardly any mistakes in positions where the side to move is equal or a bit better. I think that the main reason for this is that early on in the games, the position is often quite level and the players are still following theory and so are making fewer mistakes.
In comparison, here is the same graph for chess 960.
There is only a dip in the number of mistakes in completely even positions. The reason for this could be that players play more accurately in equal endgames, which drags down the average number of mistakes.
The relative number of errors depending on the evaluation doesn’t look as “smooth” for the 960 games as it does for normal chess. One part of the explanation is that there are fewer 960 games in the dataset, so the data is noisier. Another explanation could be that an advantage of +2 or +3 is often more difficult to play in chess 960, as the positions are less familiar. It would be interesting to see how the picture changes as more and more games get analysed.
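How much noise to expect from a smaller sample can be estimated with the standard error of a proportion, sqrt(p(1-p)/n). The sketch below uses that textbook formula; the function name is hypothetical and the numbers in the usage note are made up for illustration, not taken from the dataset.

```python
import math


def proportion_std_error(mistakes: int, moves: int) -> float:
    """Standard error of an observed mistake rate in one histogram bucket.

    Uses the binomial approximation sqrt(p * (1 - p) / n), which treats
    every move in the bucket as an independent trial (a simplification).
    """
    p = mistakes / moves
    return math.sqrt(p * (1 - p) / moves)
```

With the same underlying mistake rate, a bucket holding 100 moves has a standard error about 3.2 times larger than one holding 1,000 moves (a factor of sqrt(10)), which is enough to make a histogram look jagged.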
Conclusion
As one would expect, many of the mistakes in grandmasters’ chess 960 games happen early on, as it’s difficult to find the right way to develop in the random starting positions. But as the games progress, the number of mistakes actually drops, probably because the positions start to resemble “normal” chess positions.
It’ll be interesting to revisit such an analysis when many more chess 960 games have been played with classical time controls as the added datapoints should reduce the noise.
Let me know if you have any further questions about the quality of play in chess 960.
Hi - nice analysis and great graphs.
Question on the lower number of mistakes in equal positions - could it be that there is also a large share of forced draws in those positions, which limits the possible scope of best moves? This might particularly be true when it's dead equal, which is an eval I often see when my best continuation was forcing a draw.
really interesting comparative exploration, and highlight.
question: what are the sample sizes behind the 2 histograms for the mistakes per evaluation? I might have to read more carefully, and they might be in the text already.
more generally, how about termination depths for the same top tier of player rating strata (from the ceiling, or the percentiles from there)? That would be comparing statistical measures across pools, but that is what you are doing by saying you chose high level play. sorry.. thinking out loud as usual. I am just curious, no matter the level of play, about termination depth statistics for the same relative strata in the 2 cases.