Back to Paul Boisvert Faculty Web Page
1. The purpose of these rankings is to measure the offensive abilities of major league baseball players from 1900 onward. Players who played both before and after 1900 have only their seasons from 1900 onwards considered. The rankings indicate nothing about players' defensive or pitching skills. The rankings are based on what the players actually accomplished at the plate, rather than what they might have accomplished if it hadn't been for the war, for injuries, for strike-shortened seasons, etc. The reader is, of course, welcome to adjust her opinions of the players' offensive skills for such factors (as Bill James does, for example), but I choose not to speculate.
2. My analysis is designed to generate a high combined value of simplicity and accuracy. The simplest common offensive measure that is substantially accurate in gauging players' overall offensive skills is OPS (On-base % Plus Slugging %). But to make such a measure far more accurate for many players, the information encoded in OPS should be supplemented with information about stolen-base success, GIDP, and (technically, though far less important) SF and SH. Then it must be placed in the proper context of Outs made (see below). This is all easy to do for even the otherwise uninformed amateur fan (or sportswriter!) using my method and readily available statistics, and it results in quite accurate measures of batters' offensive skills and production.
As with all such measures, judging how a player's production reflects his actual ability requires adjusting the results for his home park's offensive characteristics and for year-to-year changes in baseball's overall offensive conditions. My way of doing this is explained below: the ballpark adjustments are completely routine, but the year-to-year adjustments are slightly non-standard, to better allow me to adjust for "missing" data from less recent years.
3. All modern analysis of baseball statistics is informed by that of Bill James (BJ), who is a "genius" in the field--his role is comparable to Einstein's in modern physics. As with all geniuses, this does not mean he has a perfect take on every single issue--there are several areas, some fairly basic and fundamental, in which I (and others) disagree with his views. Also, he can be somewhat mystifyingly inconsistent in applying his own methodology to analysis, sometimes relying instead on quite idiosyncratic "informal" and subjective views, rather than deferring to his own (more objective!) statistical analysis. Regarding most aspects of baseball, though, his judgments are obviously more informed, insightful and valuable than my own. Thus, I often compare my offensive judgments to those of James--the results are generally in substantial agreement, but where they differ, I always worry(!)
Geniuses "see" things differently than others, and provide new ways of asking and answering questions about information. Perhaps the prime example for James is his crucial notion that offensive production must be considered in the context of Outs made by the player, rather than At-Bats (AB) or Plate Appearances (PA). This should be considered the Fundamental Theorem of Baseball Offense, and much nonsensical sports-media commentary would be avoided if mainstream sports analysts understood it. Finally, I note that I have drawn my statistics and Park adjustment Factors from the (beautiful) website Baseball-Reference.com.
4. The basic process I use for measuring Offense is to compare a player's Bases (B) to his Outs (X). This generates two main measures. The first is described here, the second in part 9 below. The first measure is the Offensive Percentage (OP), which is simply the ratio ( Bases / Outs ), or B / X. It is very similar to, and was inspired by, measures invented by Barry Codell (Base-Out Percentage, or BOP) and Tom Boswell (Total Average, or TA), each of which has some (different) minor flaws, to my mind. My OP measure simply uses the main idea that they share, while remedying those minor flaws. Using standard abbreviations:

Bases (B) = TB + BB + HBP + (SB - CS) + SF + SH - GIDP
Outs (X) = (AB - H) + SF + SH + CS + GIDP
OP = B / X (multiplied by 1000 and rounded, when one prefers to treat it as a whole number rather than a decimal)
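The definitions above translate directly into a few lines of code. The following is a minimal Python sketch, not the author's own tool; the `BattingLine` container and the sample stat line are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    """One player's counting stats, using the standard abbreviations."""
    AB: int
    H: int
    TB: int
    BB: int  # all walks, intentional ones included
    HBP: int
    SB: int
    CS: int
    SF: int
    SH: int
    GIDP: int


def bases(s: BattingLine) -> int:
    """B = TB + BB + HBP + (SB - CS) + SF + SH - GIDP"""
    return s.TB + s.BB + s.HBP + (s.SB - s.CS) + s.SF + s.SH - s.GIDP


def outs(s: BattingLine) -> int:
    """X = (AB - H) + SF + SH + CS + GIDP"""
    return (s.AB - s.H) + s.SF + s.SH + s.CS + s.GIDP


def op(s: BattingLine) -> float:
    """Offensive Percentage as a decimal, B / X."""
    return bases(s) / outs(s)


def op_whole(s: BattingLine) -> int:
    """OP times 1000 and rounded, the whole-number form used in the text."""
    return round(1000 * bases(s) / outs(s))


# Hypothetical season: 330 Bases against 375 Outs.
example = BattingLine(AB=500, H=150, TB=250, BB=70, HBP=5,
                      SB=20, CS=10, SF=5, SH=0, GIDP=10)
print(bases(example), outs(example), op_whole(example))  # 330 375 880
```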
To give a rough idea of the range and variation in OP: the Major League OP in 1960 (my base year) was just about 0.600 (from now on written as a whole number, "600"), including pitchers' (weak) batting stats. For non-pitchers, it was therefore probably around 650 or so. Since I adjust for year-to-year changes (see below), it remains (by definition) around 650, once adjusted, for non-pitchers these days. From now on, OP means "adjusted" OP unless otherwise specified.
So an average batter--and by "batter" I do not include pitchers, from now on--has an OP of around 650. But that includes lots of part-time, utility players. Regular starting players probably have an average OP of around 700, though this average obviously varies by defensive position. Thus, for regular starters (and my offensive rankings are usually for players who were predominantly such players), in any given year, being in the 700's is average to above-average, the 800's are good to very good, the (rare) 900's are excellent to superb, and the (very rare) 1000's or above are "great" years. Conversely, being in the 600's is below average to average, in the 500's is poor, and in the 400's or below is terrible. Note that an OP of, say, 450 means that in a 27-out game a team would get roughly 12 Bases, for which a typical box score might be something like scattering over 9 innings the following: 4 singles, one double, one HR, and 2 walks. This would typically result in 1, 2 or (if lucky) 3 runs. Obviously, a team of hitters like this would win very few games.
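The rating bands and the 27-out arithmetic above can be written down as a small helper. This is a sketch: the band labels are copied from the text, which presents them as rough guides for regular starters, and placing the cutoffs exactly at round hundreds is an assumption about where one band ends and the next begins.

```python
def op_rating(op_value: int) -> str:
    """Rough verbal rating for a regular starter's adjusted, whole-number OP."""
    bands = [
        (1000, "great"),
        (900, "excellent to superb"),
        (800, "good to very good"),
        (700, "average to above-average"),
        (600, "below average to average"),
        (500, "poor"),
    ]
    for floor, label in bands:
        if op_value >= floor:
            return label
    return "terrible"  # the 400's or below


def bases_per_27_outs(op_value: int) -> float:
    """Bases a lineup of hitters at this OP would collect in a 27-out game."""
    return op_value * 27 / 1000


print(op_rating(450), bases_per_27_outs(450))  # terrible 12.15
```

The 12.15 figure is the "roughly 12 Bases" of the sample box score.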
Because the two measures encode much of the same information, my OP and the standard OPS are often quite close in numerical value for many players. But this is true only for players who don't deviate much from league norms for SB, CS, and GIDP: extremes in these factors can significantly alter OP compared to OPS. Think of Rickey Henderson here, whose OP is significantly higher than his OPS because of his great base-stealing and low GIDP totals. Also, if two players (who are near league norms in SB and GIDP) have the same OPS, the one with the higher On-Base % (and thus lower Slugging %) will usually have a proportionally higher OP. This is because, unlike Slugging %, On-Base % includes walks and HBP, which raise Bases without raising Outs, and so (by definition) increase OP.
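To see why speed and double plays separate OP from OPS, here is a sketch that holds a batting line fixed and varies only SB, CS and GIDP. OPS cannot change, since neither OBP nor SLG uses those stats, while OP moves in both directions. The three stat lines are hypothetical, and OBP is computed with the usual formula (H + BB + HBP) / (AB + BB + HBP + SF).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BattingLine:
    AB: int
    H: int
    TB: int
    BB: int
    HBP: int
    SB: int
    CS: int
    SF: int
    SH: int
    GIDP: int


def ops(s: BattingLine) -> float:
    """On-base % plus slugging %; ignores SB, CS and GIDP entirely."""
    obp = (s.H + s.BB + s.HBP) / (s.AB + s.BB + s.HBP + s.SF)
    slg = s.TB / s.AB
    return obp + slg


def op_whole(s: BattingLine) -> int:
    """Whole-number OP from the B and X definitions."""
    b = s.TB + s.BB + s.HBP + (s.SB - s.CS) + s.SF + s.SH - s.GIDP
    x = (s.AB - s.H) + s.SF + s.SH + s.CS + s.GIDP
    return round(1000 * b / x)


typical = BattingLine(AB=500, H=150, TB=250, BB=70, HBP=5,
                      SB=20, CS=10, SF=5, SH=0, GIDP=10)
speedster = replace(typical, SB=60, CS=12, GIDP=3)  # Henderson-like profile
plodder = replace(typical, SB=0, CS=2, GIDP=22)     # slow, ground-ball hitter

for name, line in [("typical", typical), ("speedster", speedster), ("plodder", plodder)]:
    print(f"{name:10s} OPS={ops(line):.3f} OP={op_whole(line)}")
```

With these made-up numbers all three lines share an OPS of about .888, while OP runs from 807 (plodder) through 880 (typical) to 1014 (speedster).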
Also note that an (adjusted) OP of 1000, which is quite rarely exceeded in any given season, means 1 (adjusted) Base per Out--a very simple and easy-to-remember figure by which to benchmark true greatness in offensive performance. OP, like OPS, can go much higher than 1000, of course--each year, the very few best players, often having their career year, can have an OP of 1100, 1200, very rarely 1300, possibly up in the 1500's. The 3 highest single-season OP's in history before Barry Bonds were as follows: Ruth's best was 1579, Williams' was 1586, and Mantle's was 1593. BUT, Barry Bonds' last 4 years (2001 through 2004) have been 1848, 2118(!!), 1606, and 1676, the four highest in history, with the two highest being far better than any other player's. These are quite anomalous, even for Barry himself, whose previous high was 1385 back in the early 1990's, and who had been on the usual late-career downward decline until 2001, when he discovered some very...um, stimulating medicine.
I note that Barry's recent OP's are significantly influenced by his insanely high BB and IBB totals, which I have adjusted for a bit, and about which more below. But regardless, from 2001 on, he has set all-time (adjusted!) records for season homers, OBP, and SLG, which is unheard of this late in a career. My numbers show Barry to have now passed Ruth as the greatest overall offensive player in history, but it is very unlikely that this would have happened without steroids. Whether that matters is up to the individual baseball fan to judge.
5. Some Notes About OP:
A. Caught Stealing (CS) and Grounded into Double Plays (GIDP) are removed from the Bases total in the numerator, since those plays remove runners from the bases. Players who get caught stealing, or hit into double plays, are doing worse (for their team) than just making an "extra" (normally uncredited) out--they are negating a previous batter's base (their own, for CS) as well, and must be penalized. Codell penalizes neither, while Boswell penalizes only CS.
For example, with a man on first and no one out, it is better to have the next two batters strike out than for the first one to hit into a double play. After the former, there are two outs and a man on first, while after the latter, there are two outs and no one on first. Thus, a batter hitting into a DP must be assigned two outs AND a negative base for the removed runner. Similarly, a batter who singles with no one out and is then thrown out stealing ends up with a net (team) result of 1 out and no bases. Since he was assigned a Base for the single, he must be assigned a negative base for removing himself via the failed steal (along with creating an out, of course). Finally, note that the extra out from the GIDP is included in the denominator of OP, along with the batter's normal out, which is counted in (AB - H). And, of course, CS is similarly included in the denominator as an extra out.
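The bookkeeping in these examples can be checked mechanically. In this sketch each event is credited with the (Bases, Outs) pair the text assigns it; the event names and the `tally` helper are illustrative, not part of any published method.

```python
# (bases, outs) credited to the offense for each event, following the text:
# a GIDP costs the batter's out, the extra out, and the removed runner's base;
# a caught stealing costs one out and the base the runner had reached.
EVENTS = {
    "single": (1, 0),
    "strikeout": (0, 1),
    "gidp": (-1, 2),
    "caught_stealing": (-1, 1),
}


def tally(sequence: list[str]) -> tuple[int, int]:
    """Total (bases, outs) for a sequence of events."""
    total_bases = sum(EVENTS[event][0] for event in sequence)
    total_outs = sum(EVENTS[event][1] for event in sequence)
    return total_bases, total_outs


print(tally(["single", "strikeout", "strikeout"]))  # (1, 2): man on first, two out
print(tally(["single", "gidp"]))                     # (0, 2): bases empty, two out
print(tally(["single", "caught_stealing"]))          # (0, 1): bases empty, one out
```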
These considerations can significantly lower the OP of many otherwise highly touted players. Jim Rice hit into lots of DP's, for example, significantly reducing his team's ability to score runs. As BJ agrees, and my OP shows, Rice is very overrated offensively (and not just for the DP factor: Fenway Park adjustments hurt, too, and he didn't walk much). Similarly, many "fast" players who lack technique are caught stealing far more often than they should be, lowering their OP a fair amount.
Tangential Note: In fact, (SB - CS) in the numerator of OP is NSB, Net Stolen Bases. This suggests a very simple, comparable OP-like measure of "just base-stealing ability", namely the BSOP: BSOP = NSB / CS. It then seems that, in general, a runner should try to steal only when his individual base-stealing ability, as measured by his current BSOP, is greater than the OP of the current batter. Alternatively, one might argue that he should (in general) steal only when his BSOP is greater than the overall OP of that day's lineup, since he will then be likely to improve that overall OP by running. Note that a 2:1 ratio of SB to CS (say, 20 SB with 10 CS, resulting in 10 NSB) yields a BSOP of 1000, which is very good, just as a regular OP of 1000 is very good. If the overall OP of a halfway decent starting lineup is, say, 750, then it seems that runners should steal only if they can exceed a 7:4 ratio of SB to CS, as that ratio yields a BSOP of 750. However, steal attempts distract the pitcher and can yield throwing errors, so a slightly lower BSOP might still be worth it. On the other hand, some other factors may make SB's a bit less valuable than other bases, and so may argue for requiring a slightly higher BSOP.
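Since BSOP = NSB / CS = SB/CS - 1, the break-even SB:CS ratio for a lineup with a given OP is simply 1 + OP/1000. A minimal sketch (the function names are hypothetical) that reproduces the 1000 and 7:4 figures above:

```python
from fractions import Fraction


def bsop(sb: int, cs: int) -> float:
    """Base-stealing OP: net stolen bases per caught stealing, times 1000."""
    if cs == 0:
        raise ValueError("BSOP is undefined when CS is 0")
    return 1000 * (sb - cs) / cs


def break_even_ratio(lineup_op: int) -> Fraction:
    """SB:CS ratio at which BSOP equals the given whole-number OP."""
    return 1 + Fraction(lineup_op, 1000)


print(bsop(20, 10))           # 1000.0
print(break_even_ratio(750))  # 7/4
print(bsop(7, 4))             # 750.0
```

Using `Fraction` keeps the break-even ratio exact, so it reads directly as an SB:CS ratio.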
B. Very Minor Issue: What about SF and SH? Should the extra bases that these allow runners to advance be credited in the numerator? I say yes, as does Codell, but not Boswell, who ignores them completely in his TA. Since these generate Outs, they must be included in the denominator. For SH, we know that the batter deliberately made the Out in order to advance a baserunner. Unlike many outs, where we don't necessarily know (from summary stats) whether baserunners advanced, here we should give him credit for the extra base that we know he generated for the runner. Note also that, by assigning a base to a SH, we get the following: a no-out single followed by two SH's results in the same (team) OP as does a Triple followed by two strikeouts, since in both cases 3 bases and two outs have been assigned. More importantly, it results in the same real situation: man on 3rd, two out. The point of OP, to the extent that it is objective, is that it should, as often as possible, assign the same number of bases and outs to different ways of reaching the same real situation, and this requires that SH be assigned a base. The same reasoning applies to SF, and here it seems even more important to assign the extra base, since that base is clearly valuable: it actually scores a run!
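The equivalence argument for crediting SH and SF can be checked with the same kind of event table. Again this is a sketch with illustrative names; SH and SF each get one base and one out, as in the OP formula.

```python
# (bases, outs) per event; SH and SF get one base and one out each.
EVENTS = {
    "single": (1, 0),
    "triple": (3, 0),
    "home_run": (4, 0),
    "strikeout": (0, 1),
    "sac_hit": (1, 1),
    "sac_fly": (1, 1),
}


def tally(sequence):
    """Total (bases, outs) for a sequence of events."""
    return (sum(EVENTS[e][0] for e in sequence),
            sum(EVENTS[e][1] for e in sequence))


# Man on third, two out, reached two different ways.
print(tally(["single", "sac_hit", "sac_hit"]), tally(["triple", "strikeout", "strikeout"]))  # (3, 2) (3, 2)
# One run in, bases empty, one out, reached two different ways.
print(tally(["triple", "sac_fly"]), tally(["home_run", "strikeout"]))  # (4, 1) (4, 1)
```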
In general, of course, we don't, when calculating OP, assign batters the extra bases that they cause other runners to advance, simply because we don't know such info, at least not from summary stats. (Issues of "opportunity" would also arise if we were able, and wanted, to assign them.) But in the special cases of SH and SF, where we are sure what happened, and must count the Outs for accuracy, it seems best to include the base that the batter "deliberately" caused the runner to advance. I note, though, that OP's calculated without SF and SH will still be very valid indicators of players' offensive ability.
C. The BB in the numerator of OP is Total Bases on Balls, including IBB's, which I don't separate out. See next paragraph.
6. This brings up the main issue: why measure only Bases, rather than (as James and many others do) try to convert them into "Runs"? And why treat all Bases as equal? Aren't walks less valuable than singles, IBB's less valuable than regular BB's, etc.?
First, note that both approaches rely on "average" Team results--for me, 3 straight singles, resulting in, say, a run scored and men on 1st and 2nd, results in assigning the Team, for OP purposes, only 3 Bases, while in fact the runners would have traveled 7 bases total. My method must assume (which proves to be true) that, in general, Bases assigned to Team batters will, on average, over many innings, translate (via the extra, unassigned bases that they cause other runners to sometimes advance as well) into mathematically predictable numbers of Team Runs. Note that we might wish to argue that 3 consecutive singles is really worth roughly 1.75 runs in this case (1 run in, man on 2nd and 1st are partial runs-in-the-making). But of course 3 isolated singles in 3 separate innings would not be worth 1.75 runs, but rather 0 runs--i.e., in this case, 3 bases assigned to team OP yields 0 actual runs. BJ's methods also convert measurable Team offensive total statistics into average total Team Runs Created (RC), and many other methods do something similar. No one ever knows how accurate these results are regarding any individual stat from a single at-bat or inning or game--rather, one can only try to measure long-term totals, correlating runs with total offensive team statistics.
On average, over time, for entire leagues, by my method, just under 4 Team Bases assigned to OP yield one actual Team Run scored. (This is, however, not quite a strict linear correlation, for a fairly obvious reason, which I'll explain below.) Thus, there is in fact a very high (not-quite-linear) "correlation" (ability to predict one from the other, on average) between Team Bases and Team Runs. This is why my method simply uses Bases, rather than adding the extra step of converting them to Runs, as BJ and others do.
Given that both approaches correlate highly (we'll discuss how highly below) with Team Runs, the advantage of OP is twofold: first, simplicity--the average fan can avoid BJ's (or others') quite convoluted runs-created type estimates (let alone Win Shares, yeesh!) and just count Bases. These are easily available in even standard newspapers, at least for almost all important categories, and are always available on the Internet.
For example, until they got hurt, was Frank Thomas or Magglio Ordonez the better Sox hitter through Aug. 1 of 2004? The Tribune's daily stats (though leaving out GIDP, which is of some importance) let one calculate their rough OP's in a few seconds, revealing that Thomas (as usual) was better. The notion that Ordonez, though a very good player, had in general become the Sox offensive "star" over the last few years isn't true (OPS also reveals this, though not quite as accurately).
Secondly, BJ insists that the batter's job is to "Create Runs"...but I just don't see that. Other than hitting HR's, players can't "Create Runs" solely by themselves--and even in those cases, do players really go to the plate to "deliberately" hit a HR? Instead, what a player actually tries to do is almost always to "Get On (the most advanced possible) Base without making an Out." Or rather, teams full of batters who do this will actually score more runs than teams full of batters who try by themselves, on their own, to "score runs"--such as by wildly swinging for the fences--at the expense of a greater risk of making an out.
If, in reaching the Bases that a batter is able to succeed in reaching, he happens to drive in a run (or score one, on a HR), great, but to a large extent that's not under the batter's control in most situations. Consider the following: runner on 3rd, two out--should the batter try for a single ("creating a Run") or try to wangle a walk (which doesn't "create a run", at least not unless some subsequent batter comes through)? Conventionally, and implied in the "Runs Created" conceptual focus (and in its formulaic practice), the batter should try for the single, and has somehow partially "failed", relatively speaking, if he "merely" gets a walk. This ignores the probabilities, however--it may in fact be almost impossible for him to get a single, since the pitcher isn't stupid, and will give him little to hit! In fact, pitchers would, on average, love to see hitters who are more intent on getting the single, since those batters will then swing at bad balls and make outs far more often than if they are intent only on getting on base.
Of course, when such a lunging hitter does happen to get a single, he will win (false) acclaim for being a clutch hitter (which BJ rightly dismisses). But, in general, teams full of hitters who take walks (when offered) in those situations, rather than trying solely for difficult-to-get singles, will score more runs overall than teams full of players who do the opposite. On the former teams, subsequent batters will drive in more runs (including, perhaps, the additional run of the walked batter) than the latter teams' hitters will, since those hitters lunge at bad balls trying to score the run but much more often end up making the last out of the inning.
Now BJ is well aware of this logic (he explicitly derides RBI's as a measure of worth), yet his focus on "Runs" for individual batters (rather than simply Bases) seems to argue against it. So do his formulas, about which more in a moment. While it's true that walks on average advance base runners less than singles do, the underlying reasoning for considering a batter who gets a base via a walk to have done "less well" than one who gets a single must include something like this: it is under the batter's control (to a large extent) whether he takes the walk or gets the single. The example above shows, however, that this is, in general, misleading. When Barry Bonds takes a walk in such a situation, his alternative is often to swing at balls that he has very little chance of hitting safely. He has no "real" alternative to taking the walk, except (probably) making an out, and were he (foolishly) to try to get the hit, he would far more often make outs, thus hurting his team's chances in the long run. Thus, when Barry takes a walk in that case, he has, on average, done the best he could. The base he gets by the walk is (in general) as indicative of his ability to help his team score more runs (in general, by getting on base without making outs) as would be the base he might (more rarely) get by hitting a single. This seems (roughly) true even though the rarer singles would advance the runner, unlike the (more common) walks.
A great example of the opposite phenomenon is the young Sammy Sosa, who constantly lunged at bad balls and had a mediocre OP (and OPS) for years, until he learned to take a walk. He then promptly became one of the most feared hitters in the game, and every offensive aspect of his game improved. I remember quite well being amazed by this, as I had given up hope that he would ever learn to walk--but suddenly, he began repeatedly letting 2-strike curve-balls in the dirt go by without swinging, unlike his previous practice. And shortly thereafter, he began hitting more than 60 HR's a year! Thus, the walks he now takes with runners on base are (roughly) just as indicative of his ability to help his team score runs as the singles he (occasionally) also gets in those situations.
The same logic applies to getting IBB's as opposed to regular walks. It may be argued that the batter didn't "do" anything to earn the base, so it should be credited less as an indication of what he can "do". Also, IBB's don't advance runners, unlike base hits and (some) regular walks. While this latter argument is certainly true, the question is whether the IBB is (in general) roughly as good an indication of the batter's ability to help his team score runs as a regular walk (or single) would be. And I see the answer as "Yes, it is (almost) as good an indication, and shouldn't be penalized (much)." Thus, since simplicity of calculation is a virtue, I penalize it not at all (like BB's and SB's, as well) to avoid the inevitable ever-changing decimal discounts that BJ-type formulas use.
The batters who have the highest proportion of IBB's are generally (by other measures, not including the IBB's) the best batters anyway. That is why they are getting IBB's in the first place! Moreover, many "regular" walks are semi-intentional anyway (the ones where the batter has little chance to get a hit by lunging at bad balls), so is it worth discriminating? It might be, if formulas that do so discriminate correlated with Team Runs significantly better than OP does, enough to override the much greater simplicity of OP. But the correlations are all roughly the same (see below).

Only when a "crucial" IBB is issued in the bottom of an inning, from the 9th onwards, in a tie game, with a runner on base, and with the batter's run guaranteed to be irrelevant, might one really want to discount IBB's. But this is just not a common enough situation to outweigh the gain in simplicity from having OP count IBB's the same as regular walks. I note that Barry Bonds, the greatest IBB'er in history, gets them all the time in situations other than the "crucial" one just described; all of his "non-crucial" IBB's could lead to (eventually) important runs, just like regular walks. And the reason he gets them is that he's the greatest hitter of all time, so their frequency is in large part simply a reflection of that.
7. Correlation: Having said all the above, I must admit that it is almost certainly slightly more accurate to discount BB's a little bit, compared to singles, and IBB's a bit more (as in formulas like Runs Created), than it is for OP not to do so. Similar arguments could be made for discounting the value of SB a bit, as does BJ. But the correlation coefficients of BJ's RC, Codell's BOP, Boswell's TA, and many others' measures, as related to Team Runs, are given on the website http://www.knology.net/~johnfjarvis/runs_survey.html
Codell's and Boswell's measures come in at correlation r-values of .942 and .934 respectively, with (one of the many versions of) BJ's RC at .954. All are very high (good) correlations. Since I believe that Codell's and Boswell's measures (which each treat Bases as "all created equal", as I do) are each slightly incomplete and flawed, I expect that my OP, which remedies those flaws, would correlate a bit better than either of theirs. And the difference between .942 for Codell and .954 for RC is, statistically, of very minor importance compared to the simplicity of calculation for BOP, or for my OP. Moreover, the (already complicated) RC has now been subsumed in BJ's practice by Win Shares, a "Finnegans Wake"-like method almost totally inaccessible to the amateur fan.
Thus, I use OP as the best single measure of an offensive player's rate of performance. It correlates very highly with actual Runs, and is simple and easy to calculate. Also, it is easily adapted to (and largely unaffected by) missing minor information, particularly SF and SH, and fairly easily adapted to more important missing information, like CS and GIDP.
Minor technical note: For OP, total Team Bases don't correlate linearly with total Team Runs quite as well as one might hope, but this same "flaw" is also present in RC and other methods. Actually, it's not a flaw, but reflects the reality of how Bases create Runs in a Team game, which isn't linear. The reason is that bases scattered among many different innings have different (non-linear) likelihoods of creating runs than the same number of bases concentrated in fewer innings. If LOTS of bases are created in a game, there is a much greater probability that a bunch of them will occur in the same inning, and so score more runs than a linear model would predict. Thus, there is a "clumping" effect for Many Bases in a game, whereas Medium numbers of Bases in a game are likely to be more scattered among different innings (preventing, on average, enough of them from occurring in the same inning to get many runs to score). For those interested, these effects are better explained mathematically by methods related to the Poisson distribution.

For example, if a team gets 8 bases in a game, they will tend to be scattered through the 9 innings in ways that make it fairly likely that, say, only 1 run actually scores. But if a team gets 32 Bases in a game, with only 9 possible innings to distribute them into, it's quite likely that, say, 6, 7, 8, or maybe even 10 of them will occur in a particular inning, with other, smaller clumps in another couple of innings, leading to more runs being interactively created in those "Big" innings than would be linearly expected. The likely number of Runs given 32 Bases is more than merely 4 times the number likely from 8 bases in a game, due to this higher probability of a "clumping", Big-inning effect. This precludes a strictly linear relationship.
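The clumping argument can be illustrated with a deliberately crude model, which is an assumption made for illustration and not the author's run estimator: scatter a game's bases uniformly at random over 9 innings, so each inning's count is binomial (close to the Poisson distribution mentioned above), and let an inning score one run for every base beyond the first two. The absolute run totals are not calibrated to real baseball; the sketch only shows expected runs growing faster than linearly in bases.

```python
from math import comb

INNINGS = 9


def expected_runs(total_bases: int, wasted_per_inning: int = 2) -> float:
    """Expected runs per game under the toy model described above.

    Each of total_bases lands in one of 9 innings with probability 1/9,
    so an inning's count k is Binomial(total_bases, 1/9); that inning
    then scores max(0, k - wasted_per_inning) runs. By linearity of
    expectation, the game total is 9 times the per-inning expectation.
    """
    p = 1 / INNINGS
    per_inning = 0.0
    for k in range(total_bases + 1):
        prob_k = comb(total_bases, k) * p**k * (1 - p) ** (total_bases - k)
        per_inning += prob_k * max(0, k - wasted_per_inning)
    return INNINGS * per_inning


low, high = expected_runs(8), expected_runs(32)
print(f"8 bases: {low:.2f} runs, 32 bases: {high:.2f} runs, ratio {high / low:.1f}")
```

Under this toy rule the 32-base game yields far more than 4 times the runs of the 8-base game; the exact ratio depends heavily on the made-up "wasted bases" rule, so only the direction of the effect should be read from it.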
When I tried to linearly predict League Runs Per Game from League Total Bases, I found that the general relationship of just under 4 batters' Bases per Run was quite highly predictive for leagues with "normal" OP's, but a bit worse for Leagues with low OP's, and even worse for leagues with high OP's. Still, the fact that it's not linear doesn't mean it can't be predicted. When I added a small, non-linear adjustment factor (proportional to the deviation of League OP from a historically "normal" level) to the linear formula, I got quite nice prediction results for both High and Low OP League runs. This non-linear correction factor reflected the clumped-into-same-inning interaction (or lack thereof) that generates more (or fewer) runs for teams with extreme (high or low) OP's than a linear model would predict.
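The text does not give the constants of this fit, so the following is only a sketch of its general shape, with every number a hypothetical placeholder: a linear term of just under 4 Bases per Run, plus a correction proportional to how far League OP sits from a "normal" level. Whether the author's correction was additive, as here, or multiplicative is not stated.

```python
def predicted_league_runs_per_game(
    bases_per_game: float,
    league_op: float,
    bases_per_run: float = 3.9,  # hypothetical: "just under 4"
    normal_op: float = 600.0,    # hypothetical "historically normal" League OP
    k: float = 2.0,              # hypothetical: runs per game per 1000 points of OP deviation
) -> float:
    """Linear Bases-to-Runs rule plus a small correction for League OP extremes."""
    linear = bases_per_game / bases_per_run
    correction = k * (league_op - normal_op) / 1000
    return linear + correction


linear_only = 16.0 / 3.9
print(predicted_league_runs_per_game(16.0, 600.0) - linear_only)  # 0.0 at the normal level
```

At the "normal" OP the correction vanishes and the rule is purely linear; high-OP leagues get a few extra runs and low-OP leagues a few fewer. Such a correction is meant for team and league totals only, not for individual batters.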
But note that we don't need (or want!) such a correction factor applied to OP's for individual batters. In fact, this is another reason why "forcing" individual batters' measures to correlate linearly with Team Runs (as BJ and others do) is misleading! The non-linearity of Team Runs as projected from Team Stats is solely due to the synergistic interaction of different batters. All other things being equal, if 6 batters in a row get hits in an inning, you score more runs on average than if 1 player gets a hit in 6 different innings! But our purpose is only to measure the skills of that latter individual player, relative to other individual players, NOT to project his (perhaps very good) offensive production as though the surrounding players were also going to produce at that level. Give Barry Bonds credit for 12 bases for, say, hitting 6 doubles in 6 different innings in a game, but don't assume that he deserves credit for ALL the runs that a TEAM would create if the Team had 6 doubles in a row (in the same inning!). I note that BJ admitted this discrepancy between RC and top individual performances in one of his old books, and has lately dealt with it somewhat differently (better, I believe, but at the price of daunting complexity) in Win Shares.
At any rate, OP remains a very accurate, very simple measure of offensive performance and skill--as long as it is adjusted as in our next section, part 8, which follows.
8. Ballpark and Year-to-Year Adjustments to OP.
A. Ballpark: I use Baseball-Reference.com's ballpark factors to multiply (inflate or deflate) the Bases B in the numerator of OP. This is a standard technique, I believe. I do note that park-factor adjustment is in general often problematic, since different players in the same park often differ (due to luck) in how well they can take advantage of its deviations from the offensive norm, particularly depending on whether they bat Righty or Lefty. Thus, Joe DiMaggio was affected by Yankee Stadium far more than were Ruth or Gehrig, through no fault of his own. (In fact, it's not clear why all Lefties everywhere shouldn't be "adjusted", i.e., penalized, simply for being Lefties who get to face "easier" Righty pitching most of the time.) Nonetheless, I have followed standard practice here.
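A minimal sketch of the park step, assuming a Baseball-Reference-style factor on a scale where 100 is neutral (above 100 favors hitters) and assuming the adjustment simply divides that factor out of B; the exact conversion the author applied is not spelled out in the text.

```python
def park_adjusted_bases(bases: float, park_factor: float) -> float:
    """Deflate B for hitter-friendly parks, inflate it for pitcher-friendly ones.

    Assumes park_factor uses a 100-is-neutral scale.
    """
    if park_factor <= 0:
        raise ValueError("park factor must be positive")
    return bases * 100 / park_factor


print(park_adjusted_bases(330, 110))  # 300.0: hitter's park, B deflated
print(park_adjusted_bases(330, 100))  # 330.0: neutral park, unchanged
```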
B. Year-to-Year: Rather than doing what BJ does, multiplying Bases B by a factor indicating whether average League Runs Per Game has proportionally increased or decreased relative to some arbitrary base year (1960, for me), I have chosen to adjust (in the same manner) based on whether League OP has decreased or increased. This results in slightly different adjustments than those BJ and others might make based on changes in Runs, since OP and Runs, though well correlated, are not perfectly so. Still, since we're measuring OP in the first place, there is no reason to believe that adjusting to yearly changes in OP is less accurate than adjusting to changes in Runs, which also merely correlate (imperfectly) with summary team offensive categories, and which don't correlate perfectly with individual skills. Rather, runs reflect synergistic interactions of individual offensive skills, and those interactions are largely due to luck: my single driving in a run while yours doesn't is based on my luck that the previous batter got a double while yours didn't.
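Putting the park and year steps together, a sketch of adjusted OP might look like this. It assumes the year factor is the base-year League OP divided by the current League OP (so Bases are deflated in high-offense years), uses the text's rough 1960 figure of 600 as the base, and reuses the 100-is-neutral park-factor assumption; all sample numbers are hypothetical.

```python
BASE_YEAR_LEAGUE_OP = 600  # 1960 Major League OP, roughly, per the text


def adjusted_op(bases: float, outs: float, park_factor: float,
                league_op: float, base_op: float = BASE_YEAR_LEAGUE_OP) -> int:
    """Whole-number OP after scaling B for the home park and the league year."""
    park_b = bases * 100 / park_factor     # park step (assumed convention)
    year_b = park_b * base_op / league_op  # year-to-year step
    return round(1000 * year_b / outs)


# 330 Bases, 375 Outs, neutral park, in a league running 10% above 1960 levels.
print(adjusted_op(330, 375, park_factor=100, league_op=660))  # 800
```

For early seasons lacking CS and GIDP, the same function would be fed a player's B and X and a League OP that all omit those stats alike.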
But my process has one advantage, which is in dealing with "missing" stats. Since we don't have, say, GIDP or CS info for early years, how does one compare Ruth to, say, Bonds, whose OP will be penalized by including all his CS and GIDP's, whereas Ruth's OP isn't so penalized? In my method, adjusting via League OP figures, which for early years are calculated without the negative contribution of the missing GIDP and CS stats, means that Ruth gets compared to a higher League OP than if we had the missing stats. Thus his OP, which is also higher than it would be if we had his missing stats, gets deflated by comparison to the higher League OP, which is exactly the (general) result that his missing CS and GIDP stats would have produced had they been available. Bonds' OP gets compared to current League OP's, but both are calculated with the negative stats included, so he is accurately measured relative to League OP.
Of course, if Ruth's (unknown) missing stats were less harmful to his OP than the League's would have been to the League OP, he would be penalized proportionately more than he should be. Still, given no info at all, the best we can do is treat him according to league-average info--this is far better than artificially inflating his stats by explicitly assuming he never had CS or GIDP's. And it is obviously much simpler than having 29 or so different Runs Created formulas to adjust for missing data, as BJ does!
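A minimal Python sketch of the year-to-year step, assuming the factor is simply the 1960 League OP divided by the season's League OP and is applied to a batter's Bases (the ballpark step is left out). The function names and every League OP value below are hypothetical placeholders, not figures from these rankings.

```python
# Minimal sketch of the year-to-year adjustment: scale a batter's Bases by
# (base-year League OP) / (season League OP). Park factors are not included.
# All League OP values here are hypothetical placeholders, not real data.

BASE_YEAR_LEAGUE_OP = 0.700  # hypothetical League OP for the 1960 base year


def year_factor(season_league_op: float,
                base_league_op: float = BASE_YEAR_LEAGUE_OP) -> float:
    """Below 1 for seasons with a higher League OP than the base year, above 1 for lower."""
    return base_league_op / season_league_op


def adjusted_bases(bases: float, season_league_op: float,
                   base_league_op: float = BASE_YEAR_LEAGUE_OP) -> float:
    """A batter's Bases restated in base-year offensive conditions."""
    return bases * year_factor(season_league_op, base_league_op)


# 300 Bases in a (hypothetical) season whose League OP is 10% above the base year
# are credited as roughly 272.7 base-year Bases.
print(round(adjusted_bases(300, 0.770), 1))
```

For early years, the season League OP fed into such a function would itself be computed without GIDP and CS, which is what produces the implicit correction for missing stats described above.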
C. The AL after the DH: The overall AL League OP's obviously increased when the DH went into effect, but that doesn't mean that AL hitters should now be penalized by a (more deflationary) comparison to the higher AL League OP's. These AL hitters are NOT finding it easier to get higher individual OP's because conditions are now easier for each of them individually--it's not as if the ball were juiced or parks shrank, which would justify discounting the hitters' increased performance. Rather, the weak-hitting pitchers have simply been barred from hitting, artificially raising the League OP, which has nothing to do with the remaining hitters' ability! They should still be compared to what the League OP would have been had pitchers not been barred, since we are going to compare NL hitters to their NL League OP's, which still include pitchers' hitting. So, for the AL, I found the average OP for the 4 years immediately pre-DH, the average OP for the 4 years immediately post-DH, and the proportional "inflation" factor (just like for a ballpark adjustment) reflecting the increase from the former to the latter. Then I multiplied the 1960 base-year OP by that inflation factor, and used this New Base Year OP to calculate the year-to-year inflation/deflation factors (by the usual method) for all subsequent AL League OP's.
While the above is not guaranteed to be a perfect solution, my OP evaluations of players who switched leagues since the DH started have revealed no major or consistent anomalies when they switch leagues, as judged by the standard BJ model of a career: OP rising, peaking around age 26 to 28, then slowly declining. BJ again deserves the "genius" label for seeing this consistent pattern through the distractions of yearly and ballpark changes. It is perhaps the second most obviously true but idiotically ignored fact in baseball, after the fact that batting average is often misleading; the standard sports media act as if every aging hitter's next season has a good chance of equaling his long-past peak season. This league-switching consistency, together with the fact that the percentage increase due to the DH that I calculated by this method agrees quite well with plausible intuitive estimates of how much better DH's are likely to hit than pitchers, makes me think that the adjustment has been fairly successful.
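The DH correction in part C can be sketched the same way. This assumes the "inflation" factor is the ratio of the 4-year post-DH average League OP to the 4-year pre-DH average, applied to the 1960 base-year OP; the AL League OP values below are invented placeholders chosen to make the arithmetic easy to follow, not real league data.

```python
# Sketch of the AL post-DH base-year correction (hypothetical numbers throughout).
from statistics import mean

AL_BASE_YEAR_OP = 0.700                    # hypothetical 1960 AL League OP
al_pre_dh = [0.690, 0.700, 0.710, 0.700]   # hypothetical: 4 seasons just before the DH
al_post_dh = [0.760, 0.770, 0.780, 0.770]  # hypothetical: 4 seasons just after the DH

# Proportional "inflation" factor, computed like a ballpark factor.
dh_inflation = mean(al_post_dh) / mean(al_pre_dh)

# New Base Year OP, used for every AL season once the DH is in effect.
al_new_base_op = AL_BASE_YEAR_OP * dh_inflation


def al_year_factor(season_league_op: float, dh_in_effect: bool) -> float:
    """Ratio applied to an AL hitter's Bases; DH-era seasons use the inflated base."""
    base = al_new_base_op if dh_in_effect else AL_BASE_YEAR_OP
    return base / season_league_op


print(round(dh_inflation, 3))                 # about 1.1 with these made-up inputs
print(round(al_year_factor(0.770, True), 3))  # a typical DH-era season is not deflated
```

The point of the design is visible in the last line: a DH-era AL season whose League OP is high only because pitchers stopped batting ends up with a factor of about 1, rather than being deflated as it would be against the unadjusted 1960 base.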
D.
Tangential note about BJ: (Skip ahead to part 9 if you don't want
to lose the main flow.)
BJ seems to think, and explicitly says, that we want to (or should want to--at least, HE wants to) measure the "value" of players to their teams, and designs his methods to provide evidence only for that value. I, on the other hand, think it fundamentally obvious that we want to measure players' objective skills, whether or not those skills have greater or lesser value to any particular team for reasons beyond the individual player's control. A team with 8 players as good as Babe Ruth will find little value in hiring Barry Bonds--they would already be scoring so many runs that they would win virtually every game anyway. But that doesn't mean, should Barry Bonds happen to be traded to such a team, that our mathematical measures of his offensive performance should be set up so that they suddenly decrease, merely to reflect the fact that he's providing great offense that has little value to that team (which doesn't need more offense--but that's no fault of Barry's!)
Yet Bill James' analytical methods, both in the original RC versions and in the newer Win Shares set-up, ensure precisely such value-oriented discounting (or inflation, in opposite circumstances) of a batter's offensive performance, for reasons that have nothing to do with his individual ability to hit a baseball. I must admit that I am utterly baffled by his "value" focus. It obviously bears on the DH issue, since, after the DH started, runs were more "valuable" in the NL, but only because NL pitchers at the plate continued to sabotage 1 out of every 9 run-scoring opportunities. This didn't mean that if Barry were traded from the AL to the NL, he would suddenly become a better hitter--just a more "valuable" one, compared to the generally lower NL League (or his NL team's) level of runs scored. But for BJ, this utterly changeable "value" (changeable due to external conditions that don't affect the batter's hitting ability) is the only measure he is interested in! It's as if he wanted to study the function of various types of motorcycles, and then decided that the fastest, quietest, and most reliable model, of which there was only one in the whole world, should be considered "not very good" because some rich guy who already had 8 other almost-as-good motorcycles was willing to sell that single best one for $50. After all, goes BJ's implicit logic, it couldn't be "good" if it doesn't have more "value" to this person than $50--could it?
Obviously, the fact that something might have little "value"--sentimental,
cash, or otherwise--to someone, has nothing to do with the intrinsic worth
of the thing, as
evaluated by objective factors:
how fast, reliable, or quiet the motorcycle is, for example, or how many
Bases and Outs the batter creates. But while an individual batter's
Bases, to be evaluated objectively,
must indeed be compared to conditions that DO affect the individual batter
(ballparks he plays in, lower mounds, juiced up balls, etc.),
they must NOT be compared
to conditions that DON'T affect the individual batter--and a lack of pitchers
hitting, or presence of other good hitters on his team, don't affect
the offensive skills or
performance of the individual batter we are evaluating. Yet BJ effectively
denies this with his methodologies. It's important to realize, of
course, that this is not
an error of "reasoning" on BJ's part, but merely an expression of what
BJ wants to measure--I just don't know why on earth anyone would want to
measure what he wants to
measure, any more than we would want to measure how good a motorcycle is
by seeing what someone will (happen to) pay for it.
Of course, for most "normal" teams and situations, such differences between "value" and "objective worth" are often minor, and tend to average out as a player plays in different parks, leagues, and years. And BJ's Win Shares results, though it's a bit hard to disentangle his overall judgments of players from his evaluation of merely their offensive performance, agree fairly substantially with my OP-based offensive evaluations. But there are often noticeable differences, especially in relative ranking, and I think that the conceptual difference between his "value"-oriented approach and my "intrinsic worth of performance" methodology probably accounts for many of them.
(End of Tangential Note.)
9. Total Offensive Production: 2BMX leads to NTB
A. Obviously, when we ask whether a batter had a good year, we don't want to say "Yes" if he had only 3 Bases and made 3 Outs (for an OP of 1000), but then got injured for the rest of the year. He didn't sustain his OP over a long enough time. And if two players have seasons with OP's of 800, but one made 200 Bases and 250 Outs, while the other made 400 bases and 500 outs, we clearly want to say that the second one had a season that was "twice as good as the first"--he produced twice as many Bases at the rate of 800 OP as the first one. So we need some measure of Total Production, in addition to OP (which is merely the Rate of Production.)
Exactly similar considerations are appropriate for players' overall careers. Who was a better overall player in his career: someone ("Vince"--not a real player) with 4800 Bases and 6000 outs (for an OP of 800), or someone (Pete) with 6000 Bases and 8000 outs (for an OP of 750)? Hard to say--it depends on our notions of the relative value of hanging on to produce at the end of one's career at a below-average OP rate. Note that Pete can be thought of as having first had Vince's career (4800 Bases and 6000 Outs), then playing a few more years, and in the process producing 1200 more Bases and 2000 more Outs--but that's an OP of 600 for the last part of his career [1200/2000 = 0.600]. That OP of 600 is below average for offensive players--should we give a very good player (800 career OP is quite good!) much credit for hanging on as a below-average player for a few more years, which lowers his OP from 800 to 750? We don't get very excited about watching marginal players who are 600 hitters during their entire career, so why would we be excited about watching a man who has become one, regardless of his previous performance? On the other hand, longevity is admired, per se--and career totals get one in the Hall of Fame (HOF), so maybe we should all admire him and root for him to keep on hanging on! Is it just a matter of personal preference?
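The arithmetic in the Vince/Pete comparison is easy to check. A small sketch, assuming OP is simply Bases divided by Outs on the 1000 scale used here (the function name `op` is just an illustrative choice):

```python
def op(bases: float, outs: float) -> float:
    """OP on the 1000 scale: Bases divided by Outs, times 1000."""
    return 1000 * bases / outs


vince = (4800, 6000)  # hypothetical "Vince": (Bases, Outs)
pete = (6000, 8000)   # hypothetical "Pete"

# Pete's extra years, viewed as what he added after matching Vince's career.
extra_bases = pete[0] - vince[0]  # 1200
extra_outs = pete[1] - vince[1]   # 2000

print(op(*vince), op(*pete), op(extra_bases, extra_outs))  # 800.0 750.0 600.0
```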
B. First, we need a measure of Total Offensive Production. Fortunately, the Bases and Outs perspective gives us a simple one: Bases Plus (Net Bases).
Obviously, simply measuring the Total Bases in a career gives us one way to judge total production, but if Joe has 6000 Bases and 7000 Outs, and Al also has 6000 Bases but made 8000 Outs, we all agree that Joe had the better career--he accomplished the same amount of positive stuff (Bases) while committing fewer negative acts (Outs). An easy way to quantify this is to add to each player's Bases his Net Bases, i.e., his Bases Minus Outs. I note that for most players, Net Bases are negative, so we (in effect) penalize the better player less for his fewer Outs.
I also note that

$$\text{Bases} + (\text{Net Bases}) = \text{Bases} + (\text{Bases} - \text{Outs}) = 2\,(\text{Bases}) - \text{Outs} = 2B - X,$$

using B for Bases and X for Outs. So I am going to refer to "Bases Plus Net Bases" from now on as "2BMX", using M for minus and X for outs.
Thus,
2BMX
is an overall quantitative measure of a player's total offensive production,
including a "penalty" for the Outs committed while accomplishing
that production.
So, from a "total production" standpoint, 2BMX is a measure that can be
used (at least for some purposes) without needing to know the OP of the
player--some info about OP (i.e., info about the Outs made while achieving
the Bases) is already encoded into 2BMX. That is, roughly speaking,
high 2BMX totals, which indicate high levels of total offensive production,
can't really be achieved (certainly not in a single season, and,
to a lesser degree, not for a career, either) unless a player has a fairly
high OP in the first place. Still, there will inevitably be, when
measuring total production, some tradeoff between sheer longevity and the
rate of OP--and 2BMX as a single measure won't differentiate fully between
the two. Thus, there will still be a need to examine the tradeoff
between 2BMX and OP, which I do in parts 9E and 9F below.
Of course, if, for a player's career or season, Bases exceed Outs, then Net Bases is positive, so for players with OP above 1000, 2BMX is actually larger than their mere Base total--they get a "Bonus" for being so good, rather than merely reducing their "outs" penalty. There are only ten players, through 2004, with career OP above 1000: Ruth 1263, Bonds 1211, Williams 1203, down through Mantle, Gehrig, McGwire, Hornsby, and Cobb to Frank Thomas at 1012 and Albert Pujols at 1005. They have quite varying 2BMX, as their careers are of quite varying lengths, and their OP's range from 1263 to 1005, a fairly big difference.
For the example in the first paragraph of B. above, Joe would have 2BMX of 5000 [ = 2(6000) - 7000 ], while Al would have 2BMX of 4000 [ = 2(6000) - 8000 ]. This difference of 1000 directly reflects the extra 1000 outs Al made while accomplishing the same positive stuff that Joe did. What about the example in the 2nd paragraph of A. above--someone (Vince) with 4800 Bases and 6000 outs (an OP of 800), vs. someone (Pete) with 6000 Bases and 8000 outs (an OP of 750)? Vince has 2BMX of 3600, while Pete has 2BMX of 4000. Thus, by this measure, Pete had the better total career production, 4000 to 3600. Again, we could consider Pete as having had Vince's career first, then playing a while longer at an OP of 600 (lowering his career OP from 800 to 750) but, in the process, contributing (to his team) an additional total offensive production of 400 (in 2BMX terms), thus raising his overall 2BMX from 3600 to 4000.
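The 2BMX figures above can be checked with a short sketch (the helper name `two_bmx` is illustrative). Because 2BMX is just 2B - X, Pete's total splits cleanly into Vince's career plus the extra stretch:

```python
def two_bmx(bases: int, outs: int) -> int:
    """2BMX = Bases + (Bases - Outs) = 2B - X."""
    return 2 * bases - outs


examples = {"Joe": (6000, 7000), "Al": (6000, 8000),
            "Vince": (4800, 6000), "Pete": (6000, 8000)}
for name, (b, x) in examples.items():
    print(name, two_bmx(b, x))  # Joe 5000, Al 4000, Vince 3600, Pete 4000

# Pete's extra stretch (1200 Bases, 2000 Outs at OP 600) adds exactly the gap to Vince.
print(two_bmx(1200, 2000))  # 400
```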
C. The advantage of using 2BMX for career production is that it's linear in Outs and Bases. Thus, 1 Base or Out for any player at any point in his career contributes the same amount to his overall 2BMX total as a Base or Out for any other player at any time in his career. To see why this is good, consider a possible alternative, and somewhat mathematically similar, measure: (Bases) Times (OP). This is the same as B(B / X), instead of 2BMX's formula of B + (B - X). [It thus involves multiplying Bases times Bases, then dividing by Outs, instead of adding Bases plus Bases, then subtracting Outs. Very similar conceptual structures.] This new measure would deflate a player's total Bases by multiplying it by his overall OP--someone who achieved the same total Bases as a second player, but did so at a lower career OP (thus making more Outs in the process), would have his Bases deflated by more than the second player. Sounds fair--but here's the problem: someone who got 1 more Base after a career OP of 800 would get more credit toward his total for that extra Base (it is deflated less, by multiplying by .800) than someone who got 1 more Base after a career OP of only 700, since the latter Base would be deflated more, by multiplying it by .700. This is unfair on the face of it, since in each case, regardless of previous total production, the extra Base should add the same marginal value to each player's total production.
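To make the fairness argument concrete, here is a small comparison of how much one extra Base adds under 2BMX versus under the alternative B(B / X). The two careers are hypothetical. Worked out exactly, the extra Base under the alternative is worth (2B + 1) / X, i.e. roughly twice the career OP (as a fraction), so it still favors the higher-OP player just as argued, even though the multiplier is not literally the OP itself.

```python
from fractions import Fraction


def two_bmx(b: int, x: int) -> int:
    """Linear measure: 2B - X."""
    return 2 * b - x


def bases_times_op(b: int, x: int) -> Fraction:
    """Alternative measure B * (B / X), with OP as a fraction (e.g. 0.800)."""
    return Fraction(b * b, x)


def marginal(measure, b: int, x: int):
    """Credit added to a career total by one more Base, Outs unchanged."""
    return measure(b + 1, x) - measure(b, x)


# Two hypothetical careers with equal Outs: OP 800 vs. OP 700.
for b, x in [(4800, 6000), (4200, 6000)]:
    print(b, x, marginal(two_bmx, b, x), float(marginal(bases_times_op, b, x)))
# 2BMX credits the extra Base equally (2 each); B(B/X) gives about 1.6 vs. 1.4.
```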
But there's a drawback to 2BMX for career offensive measurement purposes: it weights Outs too much for some purposes, and it only allows one's career 2BMX total to increase (rather than decrease) if one is playing at an OP level of 500 or more. That is, a season at less than 500 OP results in a negative 2BMX value for that season. While an OP of 500 is pretty low, lots of marginal players, including even good players at the end of their careers, often drop below that OP, and their overall 2BMX starts to go down. It's a bit odd to have our measure show their total overall offensive production going down when they are still making enough Bases to hold a major league job. Similarly, at levels just a little above 500, their 2BMX goes up, but VERY slowly--less so than we might intuitively believe that even somewhat marginal players are truly contributing. Moreover, for "proportional" comparisons, a "0" 2BMX level (at an OP of exactly 500) makes for strange results: a rarely-used utility player with even a tiny positive 2BMX would be "infinitely better" than another one who had, say, 200 Bases and 400 Outs (an OP of 500, and so a 2BMX of 0).
D.
So, is there a better measure of total offensive production than 2BMX? Yes: "Bases Plus k(Net Bases)", where k is any positive constant, is still linear, and various k's could be chosen in order to let Outs contribute smaller penalty amounts than in 2BMX. The farther below 1 that k is, the more the result gives credit to "mere" Total Bases, and the less credit is given to the negative effects of Outs (or, equivalently, to the negative effects of achieving the same Total Bases while operating at a lower OP than someone else). Since career OP already measures the rate of production anyway, perhaps pure 2BMX gives too much additional consideration to the info about Outs that is already embodied in OP, when what we're more concerned with here is how much cumulative contribution toward offense is produced by (roughly) "longevity".
At any rate, the constant that I've found most intuitively appropriate (and there is certainly an element of arbitrariness here) is k = 2/3. Using this, I form the new measure, which I call Net Total Bases, or NTB:

$$\text{NTB} = \text{Bases} + \tfrac{2}{3}\,(\text{Bases} - \text{Outs})$$
This measure, NTB, gives players at all OP levels more credit for the actual Bases they pile up, while penalizing them less for the Outs committed along the way. It thus diminishes somewhat the importance of the OP at which one gets those Bases, and rewards sheer longevity a bit more than 2BMX does. It also gives a player positive NTB as long as the player performs at a level of 400 OP or better, not merely 500 or better, as with 2BMX. And that's a big difference--virtually no one worth rating offensively has any significant seasons below 400 OP, except for severe-injury or "retirement" seasons. (Thus, there is less of a problem--though still a little bit--with seasons at OP of 400 or below being considered to have "0"--or even negative--value when making "proportional" comparisons of total production.)
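Both thresholds follow from one line of algebra: Bases + k(Bases - Outs) = 0 exactly when Bases / Outs = k / (1 + k), which is 1/2 (OP 500) for 2BMX's k = 1 and 2/5 (OP 400) for NTB's k = 2/3. A short sketch using exact fractions (function names are illustrative):

```python
from fractions import Fraction


def total_production(bases: int, outs: int, k: Fraction) -> Fraction:
    """Bases + k * (Net Bases); k = 1 gives 2BMX, k = 2/3 gives NTB."""
    return bases + k * (bases - outs)


def break_even_op(k: Fraction) -> Fraction:
    """OP (1000 scale) at which a season's total is exactly zero: 1000 * k / (1 + k)."""
    return 1000 * k / (1 + k)


for label, k in [("2BMX", Fraction(1)), ("NTB", Fraction(2, 3))]:
    print(label, break_even_op(k))  # 2BMX 500, NTB 400

# A 200-Base, 400-Out season (OP 500): zero 2BMX, but still positive NTB.
print(total_production(200, 400, Fraction(1)), total_production(200, 400, Fraction(2, 3)))
```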
So, players' NTB totals will usually rise (though sometimes not much) throughout their entire career, even at the end. Of course, their career OP will usually be dropping at the end of their career, often fairly quickly, if they continue to play while marginally skilled--but OP and NTB are two different measures, and one of them should, as NTB does, (usually) indicate that they are still producing some positive cumulative contributions if employed as a major-leaguer.
Remember that players above OP = 1000 will have their Base totals increased by 2/3 of their Net Bases, making their NTB (as with their 2BMX) look even larger. The top 10 career NTB totals in history range (through 2004) from Barry Bonds at 9177, Aaron 8436, Ruth 8281, and Cobb 7950, through R. Henderson, Mays, Mantle, Musial, and Williams, to Frank Robinson at 7007. Palmeiro, Bagwell, and Thomas are all near 5200 after 04, ahead of the other active players. The top 10 by this measure would, I think, be agreed by most people to be intuitively those one might expect to be there on the basis of consistent production and longevity. For example, Aaron played substantially longer than Ruth, though at a lower OP of 954 than Ruth's 1263, so he's a bit above Ruth in NTB, though he wouldn't be above him in 2BMX, which would give Ruth more of a boost for his higher OP. As a measure of total career production, I think NTB accords more with people's intuitive notion of what we want to measure than 2BMX would. Bonds, of course, has now also played (offense) longer than Ruth, and at a much higher OP than Aaron, which is why Bonds now tops the NTB list.
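The Aaron/Ruth comparison can be reproduced roughly from the career figures quoted above. This sketch assumes NTB = B + (2/3)(B - X) and OP = B / X, solves for Bases and Outs, and then computes 2BMX; since the quoted NTB and OP values are rounded, the recovered totals are approximations, not the actual career counts.

```python
def bases_from_ntb(ntb: float, op: float) -> float:
    """Solve NTB = B + (2/3)(B - B/op) for B, with op as a fraction (0.954 for 954)."""
    return ntb / (5 / 3 - 2 / (3 * op))


def two_bmx_from_ntb(ntb: float, op: float) -> float:
    """Recover approximate Bases and Outs, then return 2B - X."""
    b = bases_from_ntb(ntb, op)
    x = b / op
    return 2 * b - x


careers = {"Aaron": (8436, 0.954), "Ruth": (8281, 1.263)}  # career NTB, career OP
for name, (ntb, op) in careers.items():
    print(name, ntb, round(two_bmx_from_ntb(ntb, op)))
# Aaron leads in NTB, but Ruth comes out ahead in 2BMX (roughly 8800 vs. 8300).
```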
E. But even though NTB takes into account some info encoded in OP, and thus penalizes players' total production to some extent when produced at lower OP levels, is NTB really an effective measure of a player's overall career greatness? Some players have quite large disparities in NTB vs. OP. Is NTB really more important? For example, Hank Greenberg had OP of 942, roughly 25th best of all time (active players high on the career OP list will drop as they age, making a final ordinal rank uncertain for any given player), but had NTB of only 3334, not even in the top 100 of NTB. (Of course, the War is why his NTB isn't higher, but the point remains, how should we evaluate such a player?) Or King Kong Keller, OP of 998, 11th best of all time, but 2788 NTB is not very noteworthy. Injuries here are to blame, but again the general point recurs: Keller and Greenberg (and others) were far greater players by some intuitive (rate-based) standards than NTB indicates.
One way out of this is to include, like BJ, a 3rd measure, "Peak Performance", discussed below in part 10, but let's leave that aside for now. Can we combine info about OP and NTB (realizing that NTB is already partially influenced by Outs, and thus implicitly by OP) into a single overall "Offensive Value Measure"? Neither NTB nor OP by itself is in all cases an adequate or sufficient measure of most people's intuitive notions of overall career offensive value. But, unfortunately, any attempt to combine the two into a single value measure will always be subjective, and arbitrarily based on the relative personal value given by a particular analyst to "higher performance for a shorter time" vs. "lower performance for a longer time".
Of course, the above fact didn't stop BJ from producing precisely such a value measure, and basing a ranking upon it--nor will it stop me! But it must be honestly admitted that the following is merely what I find most "intuitively" pleasing, and has no objective basis that can be claimed to support it to others who have different intuitions. Of course, as he admits, the same point applies to BJ's player ratings in his latest Historical Abstract.
What I have decided is to create an overall Career Offensive Value which
weighs NTB and OP equally--that is, to (sort of) find the "average" of
the two. But we immediately find a problem with that--as numbers
they are not in the same ranges, not really comparable, so a simple average
will be dominated by the number with the higher values, i.e., NTB.
In situations like this, a simple approach is to convert each absolute
measure to a percentage of some benchmark value--a percentage of a "top"
score in each respective range, or of a "mean" score in each range.
Such percentages will then fall in the same ranges, and can be averaged.
To keep
the percentages under 100%
(usually), I decided the benchmark should be (close to) a "top" mark in
each field. I also used the "top" mark because I know the top marks,
but don't know the "mean" NTB for ALL players!
Still, any "top" mark chosen for either measure is still somewhat arbitrary--should
I compare OP's to the current top career OP of Ruth's 1263; or to a seemingly
impossible-to-top-in-the-future
career value of 1400; or to the (current!) highest single-season mark of
Bonds' 2118; or to a merely "great" (and round) OP of 1000? And what
level should I calculate NTB as a percentage of? Bonds' current top
NTB of 9177 (after 04) will probably increase in 2005. How about
10,000?
Problem: such arbitrary choices of "top value" for comparison will indirectly change the overall, now-weighted average results for any specific absolute NTB and OP scores. That is, the higher the top score I compare NTB to, the smaller those NTB percentages are, and the smaller a simple average of them with an OP percentage will end up being--it's as if I weighted NTB less than OP merely by choosing an arbitrarily higher benchmark. Since this is unavoidable, I'll go ahead and do it anyway, but please remember that the overall result does not have any truly "objective" basis. But it accords with my intuition.
F. Overall Offensive Value (ignoring Peak Performance)
I did indeed use [ NTB / 10,000 ] and [ OP / 1000 ] as the percentages--I then average them (add and divide by 2) to get the overall Preliminary Offensive Value. But, I am still going to throw in a "Peak Performance" factor below in Part 10, so this Value is merely a preliminary effort.
The fact that the top value for NTB of 10000 is more than anyone but Barry
Bonds is likely to achieve, while many people have OP's near or above 1000,
means that I'm
giving OP a bit more
weight in this average Value (since OP's resulting percentages will
be a bit relatively higher than NTB's) than I could have otherwise chosen
to do. This accords with my intuition--Charlie Keller's OP of
998 makes him a fairly "great" player overall even though he had only 6
good seasons (but 5 of them were great!). This choice (when combined with a final Peak Performance measure described below in Part 10) results in an overall Offensive Value for players in which 10 points of career OP make the same contribution to Value as 100 NTB.
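A sketch of the Preliminary Offensive Value from part F, assuming it is exactly the equal-weight average of NTB / 10,000 and OP / 1000 (the Peak Performance piece is deliberately left out, so these numbers are not the final Values). The Keller and Greenberg inputs are the career figures quoted in part E; the 5000 / 825 career is hypothetical.

```python
NTB_BENCHMARK = 10_000
OP_BENCHMARK = 1000


def preliminary_value(ntb: float, op: float) -> float:
    """Equal-weight average of NTB and OP, each as a fraction of its benchmark."""
    return (ntb / NTB_BENCHMARK + op / OP_BENCHMARK) / 2


print(round(preliminary_value(2788, 998), 4))  # Keller: 0.6384
print(round(preliminary_value(3334, 942), 4))  # Greenberg: 0.6377

# 100 extra NTB exactly offsets 10 lost points of career OP.
before = preliminary_value(5000, 825)  # hypothetical career
after = preliminary_value(5100, 815)
print(abs(after - before) < 1e-12)  # True
```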
Put another way, consider the situation where, after many years of playing, Dave and Ed have the same current Value, but then Ed retires while Dave plays one more year. Suppose Dave gets 100 additional NTB, but at a low OP that lowers his overall career OP by 10 points. Then Dave will keep the same Value as Ed--the extra 100 NTB will exactly counteract the 10-points-lower career OP. But this is because a 100 NTB season is one such as the following: 150 Bases and 225 Outs, at an OP of about 667. This isn't much production--it's a part-time season at a mediocre level of OP (which must be substantially below Dave's previous OP to have lowered it by 10 points) at the end of a career, and shouldn't (intuitively) change Dave's Value compared to Ed's. The fact that a formerly quite good player can hang on part-time at mediocre levels should raise his NTB, but not our overall judgment of "how great a player" he is. At least, that's my intuition.
But if Dave got 200 NTB in that extra year, but still only lowered his OP by 10 points, his Value would go up above Ed's (by 4 overall Value points, it turns out--each 100 NTB is worth 4 Value points, as is each 10 points of OP). I think this again makes intuitive sense--200 NTB is a pretty good, almost full-time year (like, say, 250 Bases and 325 Outs at an OP of about 770: an above-average OP for a near-full-time year in major league baseball). Such a year does add to our impression of Dave's ability to maintain good (not just hang-on) skills longer than Ed. So it should (intuitively) result in a player's overall offensive career Value going up, even though his career OP did drop by 10 points (say, from 825 to 815--which would still be a pretty good career OP!)
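The two example seasons for Dave can be checked directly. This sketch uses exact fractions so that the NTB values come out as whole numbers; the helper names are illustrative.

```python
from fractions import Fraction


def ntb(bases: int, outs: int) -> Fraction:
    """NTB = Bases + (2/3)(Bases - Outs), kept exact with Fraction."""
    return bases + Fraction(2, 3) * (bases - outs)


def op(bases: int, outs: int) -> Fraction:
    """OP on the 1000 scale."""
    return 1000 * Fraction(bases, outs)


for b, x in [(150, 225), (250, 325)]:
    print(b, x, ntb(b, x), round(float(op(b, x))))
# 150 225 100 667
# 250 325 200 769
```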
For such a situation of 200 extra NTB, Dave's overall Value (and don't
worry about the details yet until we get to Peak Performance) might go
up from, say, 503 to
507--these are pretty good
(not great) Values, around 200th best or so in history. (Actually,
the Values would be 50.3% and 50.7%, but I prefer the whole number
version.) This extra,
fairly good season wouldn't raise Dave much above Ed, rather just a bit--and,
to my mind, a (roughly) intuitively plausible amount.
10. Last Step: What About Peak Performance?
A.
BJ says that what we care about when we ask whether Mays was greater than
Mantle is, to a substantial extent, "Was Mays at his best greater than
Mantle at his best?"
This brings in the idea of "Peak Performance". Since baseball is
divided into discrete seasons, and since great ("peak") individual seasons
can indeed help teams win seasonal championships that they might otherwise
fall just short of--such championships being the defining goal of the game--I
think it is fairly plausible that BJ is right about this, to some degree.
Two players with the same career OP and NTB might still be intuitively
considered to be of different levels of greatness if one (Ryne) had better
"peak" seasons than the other (Lou). Of course, by definition, in
such a case Ryne must have had WORSE non-peak seasons than Lou, on average,
and usually numerically more of them--so it's not clear how much a few
better seasons should override many worse seasons, when the final overall
averages are the same. After all, people were still paying good money
to see their teams win in the "other" seasons, and one of Lou's better
non-peak years may have provided just the lift needed by his team to win
the pennant that year, while Ryne's lesser non-peak season just missed
(perhaps) being what would have let his team win that year.
Still, the intuitive lure of peak years as indicating greatness, to at least some degree, is quite strong. So I've gone with it. But not in BJ's manner: I think he weighs peak value far too heavily--the consideration in the above paragraph (that a player's non-peak value could also be just the "edge" needed for a pennant winning team) should mitigate its weight much more than he does. In the latest Historical Abstract, BJ includes both a 3-best and a 5-consecutive-best peak-seasons measure as two of his 5 categories. Thus, roughly 40% of BJ's value is from peaks, they are pretty short peak periods, and, worst of all, they often overlap significantly! For almost all players, 2 or often all 3 of the absolute best seasons are also going to be part of the best 5-consecutive year period--it seems way overboard on peak value to thus count them twice!
B.
So I use a weighted average of the top 5 seasons from anywhere in the career,
weighting the best year a bit more than the 2nd-best, which is weighted
a bit more than the 3rd-best, etc. The relative weights are 12 :
11 : 10 : 9 : 8--a nice, smooth progression. While any peak measure
is largely arbitrary, and can only be justified by reference to the analyst's
intuitive feelings, here is my subjective justification for my method:
It avoids BJ's duplication of years, but allows, like BJ, the top 3 to
be weighted as a group significantly more than the 4th and 5th (average
weight of top three is 11, while average weight of bottom two is only 8.5).
The consecutive-year requirement seems completely unnecessary to me--many
players have all their possible 5-year consecutive periods of great play
ruined by occasional injury years, which is beyond the player's control.
And if a player has great seasons separated by bad ones, it is still just
as exciting for the fans (and valuable to the teams!) in the great years--every
year is a new chance to win the pennant. And most players will end
up with at least 3 or 4, and often 5, of the "peak" years being consecutive
anyway--the ones who don't are, again, usually due to injuries, though
some players are just inconsistent (or discovered steroids late in their
career!)
Of course, weightings other than mine are just as reasonable--the choice of details is largely arbitrary.
C. But what measure should be used to calculate a great season? Answer: 2BMX! (Adjusted, a bit, for Plate Appearances...)
Well, it can't be pure OP--what if the season is cut way short by injuries? It must be some measure of "total production"--or, at least, only OP for a relatively "full" season. I've chosen the relatively simple measure of 2BMX, which can't be very high unless a player plays a relatively full season. I avoid NTB (for a season), since it seems to me not to weight OP quite enough--some players have obviously monster OP years that nonetheless get cut a bit short (maybe 10% or 15% of their games) by injuries, labor strikes, or whatever. Such a "cut short" season is valued relatively more highly when we use 2BMX as our measure than when we use NTB, since 2BMX gives more weight to the level of OP at which the Bases were produced. 2BMX will penalize more those players who pile up Bases due to having a full season's worth of PA's while having a lower OP, and reward more those who pile up fewer Bases due to fewer PA's, but do so at a much higher OP. This seems intuitively good to me. Kaline, for example, had several seasons with OP of 900 or 1000, but never broke the 400 seasonal 2BMX level due to nagging injuries--and using NTB would have given him even less relative credit for those excellent OP years. Also, the drawback to using 2BMX for careers was that it goes negative at OP's below 500--but Peak seasons will never be below 500 OP for any decent batter.
But there is still a difference in "opportunity space", i.e., Plate Appearances (PA), for players in different eras, in different batting order positions, and on teams that hit better or worse in general, for either intrinsic or park-effect reasons. Two players with similar OP's can have different total 2BMX production simply because one, through no fault of his own, didn't get to the plate as much as the other, despite playing virtually all their teams' games.
So I have decided to relativize "Adjusted 2BMX" to an (arbitrary, but judiciously chosen) norm of 640 Plate Appearances. That is, if a player had, say, 700 PA, his "Adjusted 2BMX" would equal his original 2BMX times 640/700. If the player had fewer than 640 PA, his 2BMX remains unadjusted. This prevents players who were leadoff men for high-scoring teams in high-scoring eras in high-scoring ballparks from artificially piling up more 2BMX (at the same OP level) than players who played the same number of possible games, but batted 6th for low-scoring teams in low-scoring eras in low-scoring ballparks. 640 (= 4 PA times 160 games) was chosen so that almost any player who played virtually a full season could achieve it. Most players often exceed it, particularly in their prime years--though even some great players never exceeded it, mainly due to many small injuries.
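A minimal sketch of the Plate Appearance adjustment, assuming it applies to a single season's 2BMX exactly as described: scale by 640 / PA when PA exceeds 640, otherwise leave the value alone. The season lines used below are hypothetical.

```python
PA_NORM = 640  # 4 PA x 160 games


def adjusted_2bmx(bases: int, outs: int, pa: int, norm: int = PA_NORM) -> float:
    """Season 2BMX, scaled down to the PA norm only when the player exceeded it."""
    raw = 2 * bases - outs
    return raw * norm / pa if pa > norm else raw


print(adjusted_2bmx(400, 420, 700))  # raw 380 scaled by 640/700: about 347.4
print(adjusted_2bmx(300, 350, 600))  # under 640 PA: left at the raw 250
```

Scaling only downward is the asymmetry the text calls for: a short season is not inflated, so injury-shortened years still count for less.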
D.
So my Peak Performance Factor, which I call "5W" (top 5 seasons, Weighted Average), is as follows:
5W = the weighted average (with weights of 12, 11, 10, 9, 8) of the player's 5 best seasonal Adjusted 2BMX values.
So if one's 5 best seasonal Adj. 2BMX values are, from best to worst: v, w, x, y, z, then 5W = (12v + 11w + 10x + 9y + 8z) / 50 (where 50 = sum of weights.)
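A minimal sketch of the 5W calculation, assuming the seasonal Adjusted 2BMX values are already in hand. The function name `five_w` is hypothetical, and so is one detail the text doesn't settle: a player with fewer than 5 seasons is assumed here to get 0 for each missing one. Following the Ray Oyler and Hal Lanier notes later in this section, a negative result is floored at 0.

```python
WEIGHTS = (12, 11, 10, 9, 8)  # best season weighted 12, fifth-best weighted 8


def five_w(adjusted_seasons: list[float]) -> float:
    """Weighted average of a player's 5 best seasonal Adjusted 2BMX values."""
    # Sort best-first and keep the top five.
    top5 = sorted(adjusted_seasons, reverse=True)[:5]
    # ASSUMPTION: a career shorter than 5 seasons scores 0 for each missing season.
    top5 += [0.0] * (5 - len(top5))
    weighted_sum = sum(w * v for w, v in zip(WEIGHTS, top5))
    # Divide by the sum of the weights (50); negative results are floored at 0.
    return max(0.0, weighted_sum / sum(WEIGHTS))


# Hypothetical seasons, in any order; the sixth-best (50) is ignored.
print(five_w([300, 100, 500, 50, 200, 400]))  # (12*500 + 11*400 + 10*300 + 9*200 + 8*100) / 50 = 320.0
```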
The top 13 historical 5W's are: Bonds at 731 (might still go up!), Ruth 623, Mantle 620, Williams 587, Joe Morgan 526 (those base-stealing years!), Gehrig and Cobb 523, Hornsby 518, McGwire 500, Henderson 497, Aaron 489, McCovey 487, and Mays 485. Again, these seem like quite intuitively plausible results.
The 5 highest seasonal Adjusted 2BMX values are Bonds at 858, 829, and 670, Mantle at 682, and Ruth at 658. There have been 63 seasons of 500 or above, 31 of them by just 5 men: Bonds, Ruth, Mantle, Williams, and Cobb. Seasons of 400 are also great, 300's are excellent, 200's are pretty good, 100's are common (for full-season players, these range from a little above average to a little below average), and anything below 100 is increasingly weak and unproductive offensively. (Of course, a high-OP player could have a very low 2BMX due to not playing much, for whatever reason.) The Table of Value Rankings shows the top 5 seasonal 2BMX values for each player, along with the 5W average.
11. The (Overall Offensive) Value
We now have the 3 pieces that can be combined into a weighted overall offensive career Value:
Career OP / 1000 and Career NTB / 10,000 and 5W / 600
Where did the 600 come from? To weight the 5W Peak factor fairly, it too must be compared to a top value, thus converting it to a percent. I chose 600 as a pretty high top level--only Bonds, Ruth, and Mantle exceed it. Of course, this relatively high choice of benchmark for 5W means it loses a tad of importance in the overall combined value--but I don't want peak performance counting too much, anyway.
How do I weight the 3 pieces? I still want the first two to be weighted equally (to each other, as in the preliminary offensive value). And I arbitrarily ("intuitively") felt that the Peak Performance piece "5W / 600" should count 20% of the total Value. This gives me the following weights: 40% for the OP piece, 40% for the NTB piece, and 20% for the 5W piece.
Value = 0.40 (OP / 1000) + 0.40 (NTB / 10,000) + 0.20 (5W / 600)
(Values are quoted below multiplied by 1000, so that a result of 0.823 is written as 823.)
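The formula is easy to check by machine. The sketch below (the function name `offensive_value` is hypothetical) floors negative NTB and 5W at 0, as the notes on Ray Oyler and Hal Lanier later in this section describe, and returns the result on the ×1000 scale used in the rankings. Plugging in Aurelio Rodriguez's figures (OP 517, NTB 1043, 5W 80) gives about 275.2, matching the 275 listed for him, and the same arithmetic reproduces the equivalence of 10 OP points = 100 NTB = 12 points of 5W = 4 Value points.

```python
def offensive_value(op: float, ntb: float, peak_5w: float) -> float:
    """Value = 0.40(OP/1000) + 0.40(NTB/10,000) + 0.20(5W/600), quoted x1000."""
    # As in the text, negative NTB and 5W are floored at 0 for very weak hitters.
    ntb = max(0.0, ntb)
    peak_5w = max(0.0, peak_5w)
    fraction = 0.40 * (op / 1000) + 0.40 * (ntb / 10_000) + 0.20 * (peak_5w / 600)
    return 1000 * fraction  # e.g. 0.823 of the "Top" career is quoted as 823


# Aurelio Rodriguez's figures from the list of low career Values in this section.
print(round(offensive_value(op=517, ntb=1043, peak_5w=80)))  # 275
```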
This results in the following career equivalencies:
10 career OP points = 100 NTB = 12 points of 5W = 4 Value points.
Because the pieces are benchmarked to "top" levels that the very best careers can exceed (the OP piece, for example, only to 1000), the overall Value can be above 1000, which only Bonds's and Ruth's are. But, in general, the Value tells us "what percent of a 'Top' career offensive level the player reached". Thus, Musial's Value of 823 says that he had an offensive career equal to 82.3% of the "Top" such possible career. Of course, by this metaphor, Ruth and Bonds have "more than 100%" of the Top Value level, much as a student can earn a course average above 100% (an A+) through extra credit. Bonds and Ruth have performed at the "extra credit" level, as I think we would all agree.
The Top-Ranked players overall for career offensive Value are:
1. 1095 Barry Bonds (and counting--but massive knee surgeries as of 3/05 make his future dubious!)
2. 1044 Ruth
3. 960 Williams (Fenway inflated his stats some, but the missing war years lower his value...)
4. 953 Mantle (Much higher OP than Mays, shorter career)
5. 900 Cobb (Best non-power hitter; SB's helped!)
6. 882 Aaron (Longevity helps!)
7. 871 Mays
8. 867 Henderson (A lot like Cobb: lower OP, longer career.)
9. 862 Gehrig (Disease lowered final value...)
10. 827 Joe Morgan (fantastic peak years due to speed!)
11. 823 Musial
12. 821 Hornsby
13. 817 Frank Robinson
14. 814 Ott
15. 787 McGwire (injuries hurt...but steroids helped!)
16. 778 Eddie Mathews
17. 767 Reggie Jackson (underrated by the media due to low offensive context and ballparks--and had a long career!)
17. 767 Tris Speaker
19. 766 Eddie Collins
20. 765 McCovey (great peak years, but missed too many games to rank higher)
21. 763 Frank Thomas (Could end up 15th best of all time--depends on health.)
22. 754 Honus Wagner (would rank higher, but I don't use seasons before 1900.)
23. 750 Foxx (games played in career actually fewer than many might expect)
24. 741 Bagwell (OP declining now, but might reach 17th or so)
25. 735 Mike Schmidt
26. 695 Dick Allen (Top-ranked offensive player not in the HOF, and most HOF'ers in for their offense are WAY below him; a travesty!)
Players usually rise in Value each year (though their 5W factor isn't settled until they've played many years, of course) until their last few years, when they can't rise much (unless they're Barry). If they hang on at the end of their career at low OP levels, their value may go down slightly, by a few points. Such players, I believe, should indeed be so penalized for not having the dignity to retire--like Fred McGriff at the end.
But these overall Values are, in general, never going to decrease much--unlike career OP rankings, for which late-career drops make it impossible to fix a player's ultimate ordinal position. Thomas currently has the 9th-best career OP, but a normal decline phase would probably send him to around 15th to 20th. NTB, on the other hand, increases steadily for any regular player, and rarely goes down even at the end of a career. Sammy Sosa will probably move up from his current 60th position in NTB to around 35th or 40th with a normal remaining career, though his OP decline will probably continue to be quite rapid, and his Value (640) will probably rise by only 10 or 15 points, perhaps less.
Very young players who start off with great OP are (quite justifiably, in my opinion) ranked fairly highly in Value from the start. Albert Pujols, the only current young player who might be able to exceed Thomas's career, already (through 2004) has a stunning career OP of 1005, 10th highest of all time, and a Value of 584 after his first 4 (great) years. After next year, his 5th, his 5W factor will increase even more, and he'll already be well into the 600 level, nearing HOF Rank. Of all eligible players with a Value of 625 or above, only Dick Allen, Charlie Keller, Jimmy Wynn, Bobby Bonds, Darryl Strawberry, Frank Howard, Sherry Magee and Norm Cash aren't in the HOF. To my mind, all should be--they were all fantastic offensive players. All are in the Top 63 offensive players of all time, and all have much more Value than dozens of other HOF'ers (valid ones, not just the Veterans Committee choices) who are in the HOF because of their perceived offensive skills.
BUT, does the fact that, after 4 years, Pujols has a career Value of 584, which is higher than most longer careers, imply something fishy? After all, Ryne Sandberg's Value is only 526--has Pujols really had a much "greater" offensive career after only 4 years than Sandberg did altogether? Well...hard to say. There's no trouble if we regard the 584 as merely an early indicator that, with a normally progressing future career, Pujols will end up with a far better offensive career than Sandberg--that's 99% certain to happen. But what if Pujols were hit by a truck tomorrow? In that case, most people wouldn't say that he had a greater offensive career than Sandberg. But this (valid) point would, of course, be captured by their NTB's--Pujols's is quite low now, and that's one (perfectly good) way to look at an offensive career.
But even if Pujols did stop playing tomorrow, to me it's plausible to think that he is still a "greater" offensive player than Sandberg. His first four 2BMX season totals are ALL higher than any of Sandberg's, and one is 500, one of the 63 best of all time. And his career OP of 1005 is far greater than Sandberg's ever was. So I think that the implication that Pujols's Value already indicates his true greatness is one that does accord with our intuition. No one has ever had any remotely similarly high Value after their first 4 years and then NOT gone on to have at least a reasonably long (and reasonably great) career.
And, finally, what are the lowest career offensive Values for players who played reasonably long careers? These are for weak-hitting infielders (or pitchers!), and I haven't calculated many, but here are a few that are surely near the bottom. Note the low NTB's and extremely low 5W factors, because of their OP's being at or near the 400's:
Value  Player: OP, NTB, 5W
275  Aurelio Rodriguez: OP = 517, NTB = 1043, 5W = 80
272  Bucky Dent: OP = 538, NTB = 846, 5W = 69
258  Earl Wilson: OP = 598, NTB = 208, 5W = 31. Yes, he's actually a great-hitting pitcher! His NTB and 5W figures should be multiplied by 4 or 5, of course, to truly compare him to a position player. If that were done, he'd be better than many infielders.
235  Ozzie Guillen: OP = 489, NTB = 790, 5W = 23
231  Everett Scott: OP = 484, NTB = 663, 5W = 31
228  Ed Brinkman: OP = 481, NTB = 672, 5W = 26
176  Ray Oyler: OP = 434, NTB = 63, 5W = 0 (actually, 5W is negative here, but I don't count anything below 0.)
156  Hal Lanier: OP = 389, NTB = 0, 5W = 0 (here, both NTB and 5W are negative, and grounded at 0--Bill James says Hal's the worst reasonably long-performing hitter of all time!)
These occasionally negative 5W and NTB values are a necessary but undesired evil resulting from my system, but they occur only for incredibly weak hitters. I normally have no interest in such hitters, except, as above, to set an example of the "floor" for offensive Value. For 5W, negative values occur when the player's 5 best seasons don't average out to an OP above 500--extremely rare, even if the overall career OP is in the 400's. And for NTB to be negative (and so set to 0), the career OP must be below 400, which may never have happened for a reasonably long-playing man EXCEPT Hal Lanier.
Of course, all the above players had lots of defensive value--BJ ranks Rodriguez among the top 100 3B-men of the game, primarily (obviously) for his spectacular defense. But that's another story...
12. End Note on NTB and Bill James: Nice Coincidence (?)
From Bill James’s latest two books, Revised Hist. Baseball Abstract and Win Shares, one can calculate (though BJ doesn’t) the Total Offensive Career Win Shares (TOCWS) for all the players he rates. His (extremely complicated) methods are, of course, quite likely to be pretty accurate, even though I think they have some minor flaws, and even though we use different Park Factors (I have no way to judge the real validity of his, as opposed to those I use from BaseballReference.com).
Despite these differences, I had hoped that we would arrive at the same rough judgments about players’ total offensive careers.
Thus, I compared various measures of my method’s Total Career Offensive Production, i.e. 2BMX, NTB, and other similar measures [which is to say, “Bases + k(Net Bases)” for various values of k ] to (what would have been) his TOCWS results, and calculated the correlation coefficient for each such comparison for roughly the top 70 offensive players. Sure enough, all of my measures correlated very highly with his TOCWS, for my k values of 1.0, 0.5, 0.6, and 0.7, all with r-values in the .94 to .95 range. But the single best correlation of the bunch was for k = 2/3 or .6667, i.e. the “NTB” that I had previously (before reading Win Shares) decided “intuitively” was the most “reasonable” overall single measure of career offensive total production. This (highest) correlation was r = 0.95203. Of course, this doesn’t really mean much, and some other decimal k value might correlate a tad better. And different players can hop about by several rank positions in our respective rankings, but usually only when players are fairly closely bunched among those absolute rankings in the first place.
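For readers who want to repeat this kind of comparison, here is a minimal sketch of a k sweep. The function names are hypothetical, the player data and TOCWS figures are not reproduced in this article (so the test values are synthetic), and Bases and Net Bases are simply taken as given inputs, as defined earlier in the article. Python 3.10+ also offers `statistics.correlation`; the hand-written Pearson r keeps the sketch self-contained.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def sweep_k(bases, net_bases, tocws, ks=(0.5, 0.6, 2 / 3, 0.7, 1.0)):
    """r between 'Bases + k(Net Bases)' and TOCWS for each candidate k."""
    results = {}
    for k in ks:
        measure = [b + k * nb for b, nb in zip(bases, net_bases)]
        results[k] = pearson_r(measure, tocws)
    return results


# Usage (real per-player career data would go here):
#   results = sweep_k(career_bases, career_net_bases, career_tocws)
#   best_k = max(results, key=results.get)
```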
But I think the reason they agree best at 2/3 is that I chose 2/3 to yield a “floor” OP of 400, below which NTB goes negative, precisely because I knew batters rarely had seasons below this. Similarly, the whole concept of BJ’s “Marginal Runs”, on which Offensive Win Shares are based, is that there is a “floor” of offensive performance above which “Runs Created” should be counted, and below which very few individual seasons should occur. Though I haven’t tried to relate my “floor” OP of 400 specifically to his RC floor (which I’ll skip here), I get the strong feeling that, if compared, they would indicate roughly the same level of (lousy) offensive performance--thus the high correlation between our measures of total offensive production over and above the level of those floors.
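This section doesn't restate how Bases, Net Bases and OP are defined, so the following is only a consistency sketch under two loudly flagged assumptions: that Net Bases = Bases - Outs, and that OP = 1000 × Bases / Outs. Under those assumptions, Bases + k(Net Bases) crosses zero at OP = 1000k/(1 + k), which gives 400 for k = 2/3 (the NTB floor described here) and 500 for k = 1 (matching the earlier remark that 2BMX goes negative below an OP of 500, if 2BMX is the k = 1 case, which is also an assumption). The function names are hypothetical; if the article's earlier definitions differ, the sketch does not apply.

```python
from fractions import Fraction


def total_measure(bases, outs, k):
    """Bases + k*(Net Bases), ASSUMING Net Bases = Bases - Outs (hypothetical)."""
    return bases + k * (bases - outs)


def floor_op(k: Fraction) -> Fraction:
    """OP at which the measure is exactly 0, ASSUMING OP = 1000 * Bases / Outs.

    bases + k*(bases - outs) = 0  <=>  bases/outs = k/(1+k),
    so the floor OP is 1000 * k / (1 + k).
    """
    return 1000 * k / (1 + k)


for k in (Fraction(1, 2), Fraction(2, 3), Fraction(1)):
    print("k =", k, "-> floor OP =", floor_op(k))  # 1000/3, then 400, then 500
```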
There are some other very interesting mathematical relationships in BJ’s latest Win Shares methods, which I’ll save for another time.
***************************************************************************************************
Appendix:
Below are the two measures of Offensive performance which inspired my own (and both of which are pretty accurate themselves!)
BASE-OUT PERCENTAGE (BOP): Barry Codell's stat for measuring complete offensive performance, in which the elements of the numerator represent bases gained while the events in the denominator represent outs produced (sacrifices and sacrifice flies appear in both because they achieve both--gaining a base for the team while costing it an out). The formula:
$$\text{BOP} = \frac{\text{Total Bases} + \text{Walks} + \text{HBP} + \text{Steals} + \text{Sacrifices} + \text{Sacrifice Flies}}{\text{At-bats} - \text{Hits} + \text{Caught Stealing} + \text{GIDP} + \text{Sacrifices} + \text{Sacrifice Flies}}$$
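A small Python sketch of BOP as written above; the function name and the abbreviated parameter names (tb = Total Bases, bb = Walks, sb = Steals, sh = Sacrifices, sf = Sacrifice Flies, ab = At-bats, h = Hits, cs = Caught Stealing) are hypothetical, and the season line is invented for illustration only.

```python
def base_out_percentage(*, tb, bb, hbp, sb, sh, sf, ab, h, cs, gidp):
    """Codell's BOP: bases gained over outs produced (SH and SF count in both)."""
    bases = tb + bb + hbp + sb + sh + sf
    outs = ab - h + cs + gidp + sh + sf
    return bases / outs


# Hypothetical season line, for illustration only.
line = dict(tb=300, bb=80, hbp=5, sb=20, sh=2, sf=5, ab=550, h=160, cs=8, gidp=12)
print(round(base_out_percentage(**line), 3))  # 412 / 417, about 0.988
```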
TOTAL AVERAGE (TA): Tom Boswell's formulation for offensive contribution from a variety of batting and baserunning events; as with Runs Created, we have calculated Total Average to make use of the maximum available data in a given year. The concept of the numerator is bases gained, that of the denominator is outs made:
$$\text{TA} = \frac{\text{Total Bases} + \text{Steals} + \text{Walks} + \text{HBP} - \text{Caught Stealing}}{\text{At-Bats} - \text{Hits} + \text{Caught Stealing} + \text{GIDP}}$$
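The same kind of sketch for Total Average, following the formula exactly as written here (caught stealing subtracted in the numerator and added in the denominator). The function and parameter names are hypothetical, and the season line is invented. For years with "missing" data, one might pass 0 for an unavailable stat, but that is an assumption of this sketch, not the published method.

```python
def total_average(*, tb, sb, bb, hbp, cs, ab, h, gidp):
    """Boswell's TA as written above: bases gained over outs made."""
    bases = tb + sb + bb + hbp - cs  # caught stealing subtracted in the numerator...
    outs = ab - h + cs + gidp        # ...and added in the denominator
    return bases / outs


# Hypothetical season line, for illustration only.
line = dict(tb=300, sb=20, bb=80, hbp=5, cs=8, ab=550, h=160, gidp=12)
print(round(total_average(**line), 3))  # 397 / 410, about 0.968
```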