OK, folks, I promised this some pages ago, and here it is: my statistical approach to finding the best allrounders.
I've pushed some figures around, and the approach that satisfied me most was this:
I first took the original list of allrounders compiled by SJS and then added Kallis as a 'nearly man' (mainly to see how someone from outside the set boundaries would fit into this elite group). This gave me a list of 18 allrounders. To make bowling and batting stats comparable, I calculated standardized normal scores a.k.a. z-scores, using the mean and standard deviation from this group of players.
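For anyone who wants to reproduce the standardization step, here is a minimal sketch in Python. The numbers are made up for three hypothetical players and are not taken from SJS's table; using the sample standard deviation is also an assumption, since either the sample or the population version would fit the description above.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values against the group's own mean and standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption, see lead-in)
    return [(v - m) / s for v in values]

# Made-up batting averages for three hypothetical players, not real figures.
batting_averages = [30.0, 35.0, 40.0]
z_bat = z_scores(batting_averages)
print(z_bat)
```

The same function would be applied separately to each variable (batting average, bowling average, wickets per match, innings per 50), always over the full group of 18.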
The next step was to make scatterplots, plotting a bowling variable against a batting variable.
In bowlavbatav0.jpg, standardized batting average is plotted against standardized bowling average.
In wickmtchinn500.jpg, standardized innings per 50 score is plotted against standardized wickets per match score.
Then, the key is to combine the batting and bowling characteristics into one score with which you can compare the different players.
For batting and bowling average, the best player is of course he who combines the highest batting average with the lowest bowling average. In the first scatterplot, that's the player whose datapoint is closest to the lower right corner of the plot. In the second scatterplot, it's the player who is closest to the upper left corner of the plot.
Those of you with some knowledge of ancient Greece may notice that you can compute the distance from a datapoint to the corner of the plot using that old bugger a^2 + b^2 = c^2, and that's what I did.
The exact formulas to calculate the distances to the best corners for both plots were:
plot 1: distance = sqrt(((6-zbatav)^2)+zbowlav^2)
plot 2: distance = sqrt(((5-zwickmatch)^2)+zinnp50^2)
Note that the subtraction is needed to get the direction of the vector right, and that the number subtracted from is arbitrary. In principle the choice does affect the outcome, but the 6 and 5 used here mean that the bowling and batting components are, on average, roughly equal in size. In practice, varying the number only changes the outcome of the analyses if you pick silly values.
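The two distance formulas above can be sketched in Python as follows. The function name, the player labels and the z-scores are all hypothetical and only there for illustration; the offsets 6 and 5 are the ones from the formulas.

```python
import math

def distance_to_corner(z_higher_is_better, z_lower_is_better, offset):
    """sqrt((offset - z_higher_is_better)^2 + z_lower_is_better^2), as in the post."""
    return math.hypot(offset - z_higher_is_better, z_lower_is_better)

# Made-up z-scores for hypothetical players: (zbatav, zbowlav). Not real data.
players = {
    "Player A": (1.0, -0.5),
    "Player B": (-0.2, 0.3),
    "Player C": (0.4, 1.2),
}

# Plot 1 uses offset 6; plot 2 would pass (zwickmatch, zinnp50) with offset 5.
ranking = sorted(players, key=lambda name: distance_to_corner(*players[name], offset=6))
print(ranking)
```

The smallest distance ranks first, so sorting in ascending order gives the ranking for that plot.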
Without further ado, what are the results?
Using the bowling and batting average as the basis for analysis, this is the top 5:
1. Imran Khan
2. Miller
3. Kallis
4. Faulkner
5. Pollock
Using the wickets per match and innings per 50, you get:
1. Gregory
2. Botham
3. Cairns
4. Imran Khan
5. Faulkner
The first analysis appears to slightly favour players with an extreme average in either bowling or batting (e.g., Kallis and Pollock), whereas the second analysis yields a top 5 who are more clustered in the centre of the graph, indicating that they are good at both rather than great at one.
For those of you looking for a clear-cut answer, I've added up the ranks for all players from the two analyses, and it results in this list:
1. Imran Khan
2. Miller
3. Faulkner
4. Botham
5. Gregory
6. Goddard
7. Cairns
8. Noble
9. Sobers
10. Pollock
11. Kallis
12. Kapil Dev
13. Hadlee
14. Mankad
15. Benaud
16. Giffen
17. Durrani & Sinclair
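Adding up the ranks from the two analyses could look roughly like this in Python. The function name and the three-player orderings are hypothetical; the sketch assumes each analysis gives a full ordering of the same players, best first, and it does nothing special about ties, so players with equal totals simply end up next to each other.

```python
def combined_ranking(order_a, order_b):
    """Sum each player's rank from two orderings and sort by the total (lowest first)."""
    rank_a = {name: i for i, name in enumerate(order_a, start=1)}
    rank_b = {name: i for i, name in enumerate(order_b, start=1)}
    totals = {name: rank_a[name] + rank_b[name] for name in order_a}
    return sorted(totals.items(), key=lambda item: item[1])

# Hypothetical orderings from the two analyses, best player first.
by_averages = ["Player A", "Player B", "Player C"]
by_wickets_and_fifties = ["Player B", "Player C", "Player A"]

combined = combined_ranking(by_averages, by_wickets_and_fifties)
print(combined)
```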
Make of it what you will, but this is the most statistically sound comparison I could come up with based on SJS' original table.
EDIT: the numbers in the plots refer to each player's position in the rankings for that analysis, so the same number does not represent the same player across plots.