• Not To Be Mean, But, What About The Median?

    Posted by on December 31st, 2009 · Comments (15)

    There’s been a lot of talk lately about how the Yankees, under Brian Cashman, have gotten younger since 2005. For example:

    Average age of Yankee pitchers
    2005: 34.2
    2009: 29.3

    Average age of Yankee hitters
    2005: 32.2
    2009: 30.5

    Of course, the only issue here is sample size. For example, in 2005 the Yankees had Randy Johnson (age 41), Kevin Brown (age 40), Al Leiter (age 39), Buddy Groom (age 39), Mike Stanton (age 38) and Mike Mussina (age 36) on their pitching staff for most of the year. Given the small size of a pitching staff, having these 6 very old pitchers on the team would naturally bump up the average age of the staff.

    Think of it this way – say you had 18 numbers…12 of them being the number 3 and the other 6 being the number 12. Now, the average of all those 18 numbers would be 6. But, in reality, the majority of those numbers (12 of 18) were the number 3. So, seeing the average number (6) tells you little about the majority of the group.

    Or, in other words, a few old apples can easily ruin the average age of the whole barrel.

    Basically, it’s the difference between using the mean and median to look at a data set. And, I would suggest, in a study like this, it makes more sense to use the median rather than the mean.
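    The 18-number example above is easy to check. Here is a minimal Python sketch using only the standard library. The variable names are illustrative, not part of any original analysis.

```python
from statistics import mean, median

# The 18 numbers from the example: twelve 3s and six 12s.
values = [3] * 12 + [12] * 6

avg = mean(values)    # (12*3 + 6*12) / 18 = 108 / 18
mid = median(values)  # both middle values of the sorted list are 3

print("mean:", avg)
print("median:", mid)
```

    The mean lands on 6, a value that none of the 18 numbers actually takes, while the median stays at 3, where two-thirds of the group sits.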

    Comments on Not To Be Mean, But, What About The Median?

    1. jay
      December 31st, 2009 | 9:06 am

      Why not just include the standard deviation with the mean, so you can get an idea of the spread of the data set?
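      As a rough sketch of that idea, the snippet below reports the standard deviation next to the mean. It reuses the post's 18-number toy example; the "uniform" list is a made-up contrast with the same mean and is not real roster data.

```python
from statistics import mean, pstdev

skewed = [3] * 12 + [12] * 6  # the post's toy example
uniform = [6] * 18            # hypothetical contrast: same mean, no spread

for name, data in (("skewed", skewed), ("uniform", uniform)):
    # pstdev is the population standard deviation of the whole group
    print(f"{name}: mean={mean(data)}, sd={pstdev(data):.2f}")
```

      Both lists have a mean of 6, but only the spread shows that the first one is split between two very different groups.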

    2. jay
      December 31st, 2009 | 9:19 am

      Another logical thing you could do to get a realistic idea of the age of the pitching staff and their contributions is to weight a generally accepted stat like WAR or RSAA (not pitching wins or ERA) with the age of the contributor, so as to remove the contribution to the mean by, say, a 24 year old pitcher on the roster who didn’t contribute very much.

      Looking at the median makes little sense to me, if any. There are quite a few better ways to analyze this.

    3. MJ
      December 31st, 2009 | 9:19 am

      I don’t know where the data comes from that yields a 29.3 average age for the pitchers. When I take the ages from B-R.com, I get an average of 28.1 and a median of 27.5.

      In any case, assuming I’m even off by a year, a median of 28.5 would more than prove this point.

    4. jay
      December 31st, 2009 | 9:59 am

      [3] Yeah, I got the same thing when I use all of the pitchers on the Yankees BR page. I then limited it to the top 9 that BR lists, but still didn’t get 29.3 (and that doesn’t seem to make sense either, because there are some major contributors like Aceves lower on that list.)

      I’m not sure what the intent of this post is. The players you listed for 2005 are old, sure. And they’ll raise the average, sure. But those pitchers combined for ~ 550 IP, ~400 of which come from Johnson and Mussina alone, and another ~135 come from Brown and Leiter. Why shouldn’t they raise the average?

    5. Corey
      December 31st, 2009 | 11:59 am

      MJ wrote:

      the small size of a pitching staff, having these 6 very old pitchers on the team would naturally bump up the average age of the staff.

      Think of

      Perhaps he’s taking playing time into consideration?

    6. shaked
      December 31st, 2009 | 12:14 pm

      To go a step further, a weighted average based on total innings pitched would likely be the “most” accurate measure. I am not sure how the original data was used, but if you weighted the innings of each starter as a percentage of the total, you could get a more accurate mean or median. So that 24 year old that had no impact will not have much of an impact on the total.

      Again, I have no clue how this data was originally constructed, but for argument’s sake, using weights would be the most accurate. So a 41 year old Randy Johnson would represent 10-15% of the total innings and therefore 10-15% of the “average age” and not just be a generic number.
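      A minimal sketch of the innings-weighted average described here. The ages and innings totals below are invented for illustration and are not the actual 2005 or 2009 Yankees figures; the function name is also an assumption.

```python
def ip_weighted_mean_age(staff):
    """Average age where each pitcher counts in proportion to innings pitched.

    staff is a list of (age, innings_pitched) pairs.
    """
    total_ip = sum(ip for _, ip in staff)
    return sum(age * ip for age, ip in staff) / total_ip


# Hypothetical staff: (age, innings pitched). Not real data.
staff = [(41, 200), (36, 180), (25, 100), (24, 5)]

straight = sum(age for age, _ in staff) / len(staff)
weighted = ip_weighted_mean_age(staff)

print(f"straight average age: {straight:.1f}")
print(f"IP-weighted average age: {weighted:.1f}")
```

      In this made-up staff, the 24 year old with 5 innings counts as a full quarter of the straight average but only about 1% of the weighted one.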

    7. jay
      December 31st, 2009 | 1:12 pm

      [6] I agree that’s the best way to do this. This is what I was suggesting in [2].

    8. shaked
      December 31st, 2009 | 1:32 pm

      [7] I see that now. I browsed through comments prior to typing that up, but I must have glossed over your suggestion. I would imagine this would give us a clear picture of age. I believe it is also possible to calculate median in this manner.
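      One way a median could be weighted in the same manner is sketched below. It is an assumption about what is meant here, not a method anyone in the thread confirmed. It uses the "lower weighted median": sort by age, then take the first age at which the running innings total reaches half of all innings. The staff data is invented for illustration.

```python
from statistics import median


def weighted_median(pairs):
    """Return the lower weighted median of (value, weight) pairs."""
    if not pairs:
        raise ValueError("no data")
    total = sum(weight for _, weight in pairs)
    running = 0
    for value, weight in sorted(pairs):
        running += weight
        # Stop at the first value where half the total weight is covered.
        if running >= total / 2:
            return value


# Hypothetical staff: (age, innings pitched). Not real data.
staff = [(41, 200), (36, 180), (25, 100), (24, 5)]

plain = median(age for age, _ in staff)
by_innings = weighted_median(staff)

print("plain median age:", plain)
print("IP-weighted median age:", by_innings)
```

      With these invented numbers the plain median sits between the 25 and 36 year olds, while the innings-weighted median is the 36 year old, because more than half of the innings come from pitchers that age or older.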

    9. December 31st, 2009 | 5:03 pm

      MJ wrote:

      I don’t know where the data comes from that yields a 29.3 average age for the pitchers. When I take the ages from B-R.com, I get an average of 28.1 and a median of 27.5.

      I would bet B-R.com does some weighting based on PT. Think about it – say your team calls up 15 pitchers after 9/1 – and they’re all 20 years old and face just one batter each in September. Would you really use a straight average to get average team age, and give those 15 the same weight as the other pitchers? Makes no sense.

      That said, I stick by my point: having those 6 really old timers covering all those innings in 2005 makes the “average age,” no matter how B-R.com calcs it, look worse than it really is…

    10. December 31st, 2009 | 5:05 pm

      jay wrote:

      I’m not sure what the intent of this post is.

      Same point as I’ve made in the past – that, yes, 2+2 = 4. But, unless you really know that 2 is “2” then you can’t assume what you see is true.

      And, if someone says:

      Average age of Yankee pitchers
      2005: 34.2
      2009: 29.3

      Does it really mean the Yankees are getting younger? Maybe…but, maybe not. You have to look “inside the numbers” to be sure.

    11. Pat F
      December 31st, 2009 | 6:14 pm

      using the median isn’t perfect either, and using the mean is a better route because having those “older apples” (actually, 6 of them, which is a ton) is a big part of a team being old, and the mean simply reflects that. sure they really bring the mean up, but that’s the whole idea of a mean. having 6 really old players on a team means a team is really old unless everyone else is super young, which we know not to be the case. of course, you don’t need any numbers besides the ages of individual players to prove the point that the yankees are getting younger. you can just take a quick glance at the ages of the five starters in 05 vs. 09 and the ages of each position player in 05 vs. 09 and see that the yankees are getting younger. cashman is doing a tremendous job in this regard, and that’s really all that matters. because age is flexibility in baseball. to quote rob neyer from the other day:

      “But the only way the Yankees can fall into a habit of losing, someday, is by stockpiling too many players in their 30s with big long-term contracts. It’s incredibly difficult to place a value on flexibility, but that value is real and important and Brian Cashman’s awareness of that value is going to keep the Yankees on top for quite some time.”

      http://espn.go.com/blog/sweetspot/post/_/id/1885/yanks-fine-without-superstar-lf

    12. jay
      December 31st, 2009 | 7:48 pm

      Does it really mean the Yankees are getting younger? Maybe…but, maybe not. You have to look “inside the numbers” to be sure.

      I agree. You need to look “inside the numbers” (if you like that term.) A better way to say that you need to look “inside the numbers” is to say that you need to conceptually understand the statistical tools you are using. The mean is the average. The median is the middle number of a data set. We’re trying to get an answer as to whether or not the Yankees pitchers have been getting younger, aren’t we?

      I stick by my question – what is the point of this post? To point out that by looking at the mean, we’re looking at that mean? Why not do the analysis and see if the folks at NoMaas are correct or incorrect? It would probably take 15 minutes.

    13. jay
      December 31st, 2009 | 8:03 pm

      I figured out the way NoMaas did their calculation. They weight the age of each pitcher by IP for each year.

      So it would be accurate to say that the contributions towards the total IP by Yankees pitchers from 2005 through 2009 have come from increasingly younger pitchers. Another way to say this is that we’re getting more IP from relatively younger pitchers. Yet another way to say it is that the Yankees pitching staff has been getting younger. You could even go so far as to say that the article is correct.

      I’m not sure if this qualifies as “inside the numbers.”

    14. Evan3457
      December 31st, 2009 | 11:09 pm

      The article is correct; no matter how you slice it, the 2009 staff is significantly younger than the 2005 staff. You can do it through weighting by innings pitched…perhaps it would be simpler to compare the ages of the pitchers with the same role for the first 10-11 pitchers on the staff, to avoid giving excessive weight to lightly used pitchers. So, here we go…

      #1 Starter
      2005: Randy Johnson, 41; 2009: C.C. Sabathia, 28

      #2 Starter
      2005: Mike Mussina, 36; 2009: A.J. Burnett, 32

      #3 Starter
      2005: Wang and Pavano, 27 (25 and 29); 2009: Andy Pettitte, 37

      #4 Starter
      2005: Brown and Wright, 34.5 (40 and 29); 2009: Joba Chamberlain, 23

      #5 Starter
      2005: Chacon, Small and Leiter, 33 (27, 33 and 39); 2009: Mitre, Wang, Hughes and Gaudin, 27 (28, 29, 26, and 23)

      I don’t see how one can possibly avoid the conclusion that the 2009 rotation was significantly younger, as a group, than the 2005 rotation.
      ==========================================
      Closer

      2005: Mariano, 35; 2009: Mariano 39

      Primary Set-up Man
      2005: Tom Gordon, 37; 2009: Phil Hughes 23

      Top Lefty
      2005: Felix Rodriguez 32; 2009: Phil Coke 26

      Other set-up men
      2005: Proctor and Sturtze, 28 and 34; 2009: Robertson and Bruney 24 and 27

      Long man
      2005: Quantrill? Sturtze? 36 and 34; 2009: Alfredo Aceves: 26

      The only slot in the bullpen where the 2009 Yankees are older is Mariano, who is 4 years older than himself in 2005. The Yankee pen was significantly younger at all other roles in 2009. As Mariano is apparently ageless, his extra 4 years is largely irrelevant.

      ==========================================
      The rotation is a lot younger. The pen is a lot younger. Both were also a lot better. The article is correct and in no way deceptive. Regardless of the method used, this year’s staff is both significantly younger and significantly better, and to the extent that Brian Cashman was able to bring about change, he deserves credit for it, not quibbling about methods of statistical analysis that can fairly be described as irrelevant.

      Now, if you want to talk about differences in the defensive ability of the lineups of the 2009 squad compared with its predecessor from four years before, and the impact that has on pitchers’ records, you might start to have a point…

    15. January 1st, 2010 | 5:59 pm

      [...] Steve over at WasWatching had a slight problem with their methodology – and rightfully so.  As he said, “A few [...]
