One thing I like about graphs that show the evaluation after each move of a game is that they give a nice overview of how the game went. You can see if a win was smooth or if there were many adventures along the way.
However, one doesn’t need the information in the whole graph to determine how back-and-forth a game was. So I tried to boil this information down to a single value, the volatility of a game.
Defining volatility
When thinking about volatility, I first thought about the formula for the standard deviation of a set of numbers x_1, ..., x_N with mean μ:

$$
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
$$
I wanted to use a similar formula with the expected score after each move. But it doesn’t make sense to talk about the “mean expected score”. So, instead of looking at how the expected score after each move differs from the mean, I looked at the change in expected score after each move.
I ended up with the following formula
where xs(k) denotes the expected score after the k-th move.
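Written out, a form consistent with this description would be the following. This is only a sketch: averaging over the N moves of the game and taking xs(0) as the expected score of the starting position are assumptions here.

$$
\text{volatility} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\bigl(xs(k) - xs(k-1)\bigr)^2}
$$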
Squaring the difference and taking the square root in the end has the effect that moves that change the evaluation a lot have a bigger impact than moves that change the evaluation only a little bit.
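As a rough illustration, the computation can be sketched in Python as the root mean square of the move-to-move changes. The function name `volatility`, the averaging over the number of changes, and the two example lists are assumptions made for this sketch, not data from the games discussed here.

```python
import math


def volatility(expected_scores):
    """Root mean square of the move-to-move changes in expected score.

    expected_scores: one value between 0 and 1 per position,
    with the starting position first (an assumption of this sketch).
    """
    changes = [curr - prev for prev, curr in zip(expected_scores, expected_scores[1:])]
    if not changes:
        return 0.0
    return math.sqrt(sum(c * c for c in changes) / len(changes))


# Hypothetical expected scores, made up for illustration only.
smooth = [0.50, 0.55, 0.60, 0.65, 0.70]  # steady improvement
wild = [0.50, 0.80, 0.30, 0.90, 0.70]    # large swings in both directions

print(volatility(smooth), volatility(wild))
```

Because the changes are squared before averaging, the single 0.6 swing in the second list contributes more to the result than all four small steps in the first list combined.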
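Note that this volatility score is closely related to the combined quality of play of both players: the expected score only changes by a lot when one side makes a mistake, so accurately played games come out with a low volatility.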
Examples
I picked some games as examples to illustrate the volatility score: Carlsen-Erigaisi, 2025, from Norway Chess, as it was a smooth win, and two wilder games, Tal-Koblents, 1957, and Carlsen-Rapport from the World Blitz Championship 2022.
Here are the expected scores after each move for each game:
One can see that Carlsen’s win against Erigaisi was quite smooth, whereas the other two games had many big swings in the expected score.
And here is the volatility for each of the games:
As you can see, the game Carlsen-Erigaisi has by far the lowest volatility score, as one would expect for a calm win.
The comparison between Carlsen-Rapport and Tal-Koblents is more interesting: Carlsen-Rapport has the larger evaluation swings, but Tal-Koblents has the higher volatility. The reason is that Carlsen-Rapport contains more moves that barely change the expected score, and these drag down the volatility of the game.
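This dilution effect can be reproduced with a small sketch. It assumes the score is a root mean square over all moves, and the two games below are hypothetical: they contain the same two large swings, but one of them also has long quiet stretches.

```python
import math


def volatility(expected_scores):
    """Root mean square of the move-to-move changes in expected score (sketch)."""
    changes = [curr - prev for prev, curr in zip(expected_scores, expected_scores[1:])]
    if not changes:
        return 0.0
    return math.sqrt(sum(c * c for c in changes) / len(changes))


# Hypothetical games, made up for illustration only.
short_game = [0.5, 0.9, 0.2]                      # just the two swings
long_game = [0.5] * 20 + [0.9, 0.2] + [0.2] * 20  # same swings plus quiet moves

# The quiet moves add zeros to the sum but still count in the average.
print(volatility(short_game) > volatility(long_game))
```

The swings are identical in both lists, so the difference in the score comes only from the number of moves the sum is averaged over.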
One could argue that the Carlsen-Rapport game should have higher volatility, since the big evaluation swings should more than cancel out the moves where the evaluation was stable. But when putting more emphasis on such evaluation swings, games with one big blunder will also have a higher volatility. So one needs to make certain trade-offs when defining such a score.
Possible adjustments
To tweak the volatility score I used, one can generalise the formula a bit to get
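Written out, a generalised form consistent with this description would be the following. Again this is only a sketch: the absolute value, the averaging over the N moves, and taking xs(0) as the expected score of the starting position are assumptions, and xs(k) is the expected score after the k-th move.

$$
\text{volatility}_a = \left(\frac{1}{N}\sum_{k=1}^{N}\bigl|xs(k) - xs(k-1)\bigr|^{a}\right)^{1/a}
$$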
I used the value 2 for a, but one can also increase the value to put more emphasis on big swings in evaluation. Tuning this parameter is always difficult, as there is no clear way to determine which value is “more correct”. I’m quite happy with the results I got for a=2, but let me know if you disagree.
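To get a feeling for the parameter, here is a minimal sketch that treats the score as a power mean of the absolute changes with exponent a. The function name `generalised_volatility` and both example games are hypothetical and only meant to show the trade-off between a single big blunder and many medium swings.

```python
def generalised_volatility(expected_scores, a=2.0):
    """Power mean (exponent a) of the absolute move-to-move changes (sketch)."""
    changes = [abs(curr - prev) for prev, curr in zip(expected_scores, expected_scores[1:])]
    if not changes:
        return 0.0
    return (sum(c ** a for c in changes) / len(changes)) ** (1.0 / a)


# Hypothetical games, made up for illustration only.
one_blunder = [0.5] * 10 + [0.1] * 10  # quiet game with a single 0.4 drop
steady_swings = [0.5, 0.6] * 10        # every move shifts the score by 0.1

for a in (1, 2, 4):
    print(a, generalised_volatility(one_blunder, a), generalised_volatility(steady_swings, a))
```

With a=2 the game with constant small swings scores higher, while with a=4 the single blunder is enough to push the otherwise quiet game above it. A power mean never decreases as a grows, so larger values of a always move the score towards the largest single swing.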
I also thought about using the volatility to compare how “wild” the games of different players are. However, my main problem was that the volatility depends a lot on the opponents, as weaker players play less accurately and therefore have more volatile games. Maybe one could only look at games from a single round-robin tournament, as all players then face the same opponents. But this would give a much more limited view of the styles of the players.
Let me know what you think about this score and if you think that it could be useful.
makes me smile
As an aside, I am curious about your math typesetting tools and experience here (and later on the lichess blog medium). Was it tedious, and did it get in the way of your sharing? Or is there anything else to share about it? I noticed some fleeting syntax behind the formulas upon reload. Was it MathML, and what kind of support is there for it on here (Substack, some kind of Markdown plus some MathML, IDK)?