Yet another random statistical measure of batsmen

GIMH

Norwood's on Fire
I don't think you can use this as a measure of overall ability as according to the graph, going from 80 to 160 is much more important than going from 0 to 80. It's useful to be able to measure that aspect of batting, but in a match situation going from 0 to 80 is going to be at least as important, and probably more so, than going from 80 to 160.
All contextual though, isn't it? By and large I agree, but I felt that a big part of England's improvement around 2010 was that players started scoring big hundreds. So not so much the 80 to 160 but the 110 to 180.

I actually made a thread once, circa 08, complaining that our players rarely scored daddy hundreds. No idea if it was statistically validated.

Anyhow, I do agree with you by and large and I just felt the need to ramble on and here I am still doing it, why are you still reading this post?
 

Lillian Thomson

Hall of Fame Member
Internet Explorer for some reason blocked page 1 of this thread "to protect your computer". Has this happened to anyone else? I had to open it in Firefox and can see no reason for it.
 

Prince EWS

Global Moderator
I don't think you can use this as a measure of overall ability as according to the graph, going from 80 to 160 is much more important than going from 0 to 80. It's useful to be able to measure that aspect of batting, but in a match situation going from 0 to 80 is going to be at least as important, and probably more so, than going from 80 to 160.
Yeah, Spark said above that it's not intended to be a measure of overall ability, basically for this reason. It's basically designed as a more sophisticated way to measure the frequency of batsmen making big scores than setting an arbitrary cut-off at 150 or 200; I think it achieves that goal.

As an interesting aside though, I'm not sure it's actually a worse measure of overall ability than traditional raw averages. It'd be interesting to see what the correlation between each score in the history of cricket between 0 and 400 was with winning, or winning+drawing.. I might try to calculate that. I suspect it'd be closer to Spark's function than the linear function we use to calculate raw averages.
 

Howe_zat

Audio File
Be interested to see that. I reckon making 35 is much more effective than people give it credit for, for example, but have nothing to back that up
 

OverratedSanity

Request Your Custom Title Now!
Be interested to see that. I reckon making 35 is much more effective than people give it credit for, for example, but have nothing to back that up
I agree but I'd say a 100 ball 35 is more effective for the team than a 40 ball 35 in most situations. Whether a 35 is useful depends on what kind of 35 it is.
 

Prince EWS

Global Moderator
Be interested to see that. I reckon making 35 is much more effective than people give it credit for, for example, but have nothing to back that up
Yeah, I'm setting it up now. My database doesn't have results in it so I'm having it go and mine different StatsGuru queries; it'll take an hour or two to actually do that.
 

Prince EWS

Global Moderator
Okay so I've done this and I'm pretty surprised by the results. They seem to back up your feeling that the difference between 0 and 35 is actually the biggest difference of 35 that exists.

Here's how we measure contributions currently through raw averages: completely linearly.
rawaverages.png

Here's the function Spark designed largely to measure big scores non-arbitrarily.

sparkfunction.png

But here's the actual historical correlation between each score and not losing (I chose not losing rather than winning because I figured the difference between drawing and winning is almost always bowling rather than batting).

historicalcorrelation.png

To explain that graph a bit, I'll give some examples. At 50 on the x-axis, the y-axis reads 0.672. That means that 67.2% of the scores of 50 (actually, 40-60, as I took in wide berths to avoid anomalies on the graph, but that's not really important as it didn't change the overall shape of it) are made in teams that don't go on to lose. Given it's actually 52.04% at 0, it's basically showing that, historically, there's greater value in getting to 50 than converting a 50 into a ton (79.5% at 100).
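For anyone wanting to reproduce the idea, here is a minimal Python sketch of the windowed calculation described above. It is not the code actually used for the graph: the record layout, the function name `not_losing_rate` and the fixed window of plus or minus 10 runs (taken from the "40-60" example) are all assumptions, and the data is made up.

```python
from typing import Iterable, Optional, Tuple

# Each record is (runs scored by the batsman, whether his team lost the match).
# This layout is an assumption for illustration only.
Innings = Tuple[int, bool]


def not_losing_rate(innings: Iterable[Innings], centre: int,
                    half_width: int = 10) -> Optional[float]:
    """Share of scores within centre +/- half_width made in teams that did not lose."""
    in_window = [lost for runs, lost in innings
                 if centre - half_width <= runs <= centre + half_width]
    if not in_window:
        return None  # no scores fall in this window, so there is no rate to report
    return sum(1 for lost in in_window if not lost) / len(in_window)


# Made-up toy data, purely to show the mechanics (not real Test scores).
toy = [(42, False), (48, True), (55, False), (60, False), (0, True), (3, False)]

print(not_losing_rate(toy, 50))  # window 40-60 holds four scores, three not lost
```

On the figures quoted above, the step from 0 to 50 is worth 67.2 - 52.04 = 15.16 percentage points, against 79.5 - 67.2 = 12.3 for the step from 50 to 100.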

It backs up the Spark function idea that there's little material difference between 250 and anything higher than that, but the way sub-150 scores work seems to be the opposite of the Spark function, which surprised me greatly. Avoiding low scores has historically been a better way to not-lose than having a player make a really big one. I do wonder if that's being thrown off by tailenders (particularly given they bat less in general in winning teams) or pre-war cricket in some way.. I might try to exclude them and see what happens to it.
 

Howe_zat

Audio File
Oh nice.

I particularly like the near vertical jump after ~12 runs. Maybe that's where 'getting set' happens
 

Howe_zat

Audio File
Ftr I think the reason why avoiding very low scores is key is because it stops collapses. Even just seeing off a wicket taking spell has value.

As follow up to the previous conversation -

Getting from 0 to 80 adds about 22% to your chances of not losing. Getting from 80 to 160 adds about 16%.
 

Prince EWS

Global Moderator
Oh nice.

I particularly like the near vertical jump after ~12 runs. Maybe that's where 'getting set' happens
That actually is a byproduct of me trying to get rid of anomalies and smooth over the graph. It's actually showing scoring 1-3 is really horrible because that's when it stops counting.

If you want to look at smaller, high frequency scores then the literal graph is better:

historicalcorrelation2.png

As you can see it suffers from random sample size fails at higher numbers though.
 

Prince EWS

Global Moderator
Another way to look at it is in terms of "at least"

historicalcorrelation2.png

So that's showing x=50, y=0.7475.. meaning that any score of 50 or more has a 74.75% correlation with not-losing.
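A rough sketch of the "at least" version, under the same assumptions as any toy example: the record layout and the name `not_losing_rate_at_least` are hypothetical, and the numbers are invented rather than pulled from StatsGuru.

```python
from typing import Iterable, Optional, Tuple

# (runs scored, whether the batsman's team lost); an assumed layout.
Innings = Tuple[int, bool]


def not_losing_rate_at_least(innings: Iterable[Innings],
                             threshold: int) -> Optional[float]:
    """Share of scores of `threshold` or more made in teams that did not lose."""
    qualifying = [lost for runs, lost in innings if runs >= threshold]
    if not qualifying:
        return None  # nobody reached the threshold
    not_lost = sum(1 for lost in qualifying if not lost)
    return not_lost / len(qualifying)


# Invented sample, only to show how the number is built.
toy = [(12, True), (50, False), (64, True), (88, False), (131, False)]

print(not_losing_rate_at_least(toy, 50))  # four scores of 50+, three not lost
```

Unlike the windowed graph, every score above the threshold counts here, so the sample only thins out at the very top end.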
 

adub

International Captain
Love this. Particularly the flat line after 240. Want to bat the other side out of it and 240 or so is your number eh. But even just getting to double figures really helps your side out.
 

Spark

Global Moderator
My first actual attempt at a weight function was actually just a log function, which reproduces something fairly similar to what Cribb has there (hence the weird IF code, to avoid explosions due to log[0]), but that just felt wrong to me.
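To illustrate the log(0) guard mentioned above, here is a minimal Python sketch. It is only a guess at the shape of such a weight: the name `log_weight`, the choice of natural log and the decision to give a duck a weight of 0 are all assumptions, not the function that was actually tried.

```python
import math


def log_weight(runs: int) -> float:
    """Hypothetical log weighting of a score, with a guard for ducks."""
    if runs <= 0:
        return 0.0  # guard: math.log(0) raises ValueError
    return math.log(runs)


# Doubling a score adds the same weight wherever it happens,
# so 40 -> 80 counts for exactly as much as 80 -> 160.
print(log_weight(80) - log_weight(40))
print(log_weight(160) - log_weight(80))
```

One side effect of this particular guard is that 0 and 1 get the same weight, since log(1) is also 0.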

Ftr I think the reason why avoiding very low scores is key is because it stops collapses. Even just seeing off a wicket taking spell has value.

As follow up to the previous conversation -

Getting from 0 to 80 adds about 22% to your chances of not losing. Getting from 80 to 160 adds about 16%.
I can see how this works but it certainly doesn't fit my general intuition. The point about stopping collapses is a good one, but on the other hand, how many bowling teams have been let back into what seemed an increasingly one-sided match because a set batsman on 60 or 70 threw it away? What good is seeing off the wicket-taking spell if you don't cash in a la Cook a few weeks back and make it count? What's the gain to be had in seeing off a top-class Boultee burst and then chipping one to mid-on off the spinner? I've been a believer for a long time now that big scores have a disproportionately large value to the team because they define an innings, and allow other batsmen to make those smaller scores and build partnerships (which are the core currency). It'd be interesting to "control" for this somehow, and see what proportion of these "useful" lower scores come with a big score being compiled at the other end, but off-hand I'm not sure how you'd do that. Alternatively, what if the graph there is just a function of the simple fact that smaller scores are simply more likely for obvious reasons, and with the number of batsmen who bat in a Test, you're going to have some small scores in any winning total somewhere along the line?

This is all part of a broader conversation that I want to have btw, and this is partially aimed at starting it: to what extent is batting non-linear, and why do we assume it's linear when we measure averages, strike rates, etc.?

Internet Explorer for some reason blocked page 1 of this thread "to protect your computer". Has this happened to anyone else? I had to open it in Firefox and can see no reason for it.
Puush got hacked a while back, that might be it. Should be fine now though.
 

weldone

Hall of Fame Member
Another way to look at it is in terms of "at least"

View attachment 22074

So that's showing x=50, y=0.7475.. meaning that any score of 50 or more has a 74.75% correlation with not-losing.
Ya I was going to suggest this.

But theoretically, this particular curve should never be downward sloping. Why do I see some small downward slopes (example around 165-170)? Are you still working with ranges for this one too (example 50 is actually 40-60)? Should not do that for this one imo.
 

Prince EWS

Global Moderator
Yeah there are a couple of individual scores around that 160 mark which no-one has ever recorded in a losing effort which is what makes it slope downwards after they're removed from the rest.
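A toy Python example of how that can happen, using invented scores and a hypothetical helper called `at_least_curve`. It builds the "at least" share for every distinct score by accumulating from the highest score downwards.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (runs scored, whether the batsman's team lost); an assumed layout.
Innings = Tuple[int, bool]


def at_least_curve(innings: Iterable[Innings]) -> Dict[int, float]:
    """Map each distinct score x to the not-losing share among scores >= x."""
    total: Counter = Counter()
    safe: Counter = Counter()
    for runs, lost in innings:
        total[runs] += 1
        if not lost:
            safe[runs] += 1

    curve: Dict[int, float] = {}
    seen = not_lost = 0
    # Walk from the highest score down, so the running totals cover "x or more".
    for runs in sorted(total, reverse=True):
        seen += total[runs]
        not_lost += safe[runs]
        curve[runs] = not_lost / seen
    return curve


# Invented data: both 160s came in non-losing efforts, the 170 in a loss.
toy = [(150, True), (160, False), (160, False), (170, True), (180, False)]
curve = at_least_curve(toy)

for runs in sorted(curve):
    print(runs, curve[runs])
```

In this made-up sample the share is 0.75 for 160 or more but only 0.5 for 170 or more, because the two never-lost 160s drop out of the pool. So the curve is a conditional share rather than a cumulative total, and nothing forces it to keep rising.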
 

Spark

Global Moderator
Haha the thing about Moores is that his was probably the least "data-reliant" team in the amount of detail and depth England seem to go into stats-wise, it's just that they used stats in an awful, awful way from what people seemed to say about their approach.

Stats like these are always conversation starters, and nothing more.
 

vic_orthdox

Global Moderator
...?

How many matchwinning 70s have there been in Test cricket?
How many matchwinning ducks have there been in Test cricket?
 
