Archive | March, 2012

Writing Assignment 8: Pak Sudarno’s Big Family

30 Mar

Chapter 5 of Poor Economics looks at countries that encourage family planning and the effect it has on their populations. The contrast between voluntary and forced family planning was interesting to follow across the various studies.

The statistic that grabbed me was that in Colombia, women who had been getting contraception since their teenage years were seven percent more likely to work in the formal sector. I am skeptical of this statistic. My line of thinking is that girls who are able to make the responsible decision to use contraception at an early age were more likely to work in the formal sector anyway. I think working in the formal sector is much more a function of a woman’s level of education and her parents’ educational background.

In my regression model I would use employment in the formal sector as the dependent variable. On the right side of the equation I would use the parents’ level of income, the woman’s level of education, her parents’ level of education, and her academic performance as the independent variables. I would also include a dummy variable for whether or not the woman used contraception growing up.

I would not expect the dummy variable to be statistically significant in the regression equation. To see whether there is a relationship between using contraception and working in the formal sector, you would have to check the dummy’s coefficient and its t-statistic. If there were a positive relationship, you would see a positive coefficient with a t-statistic greater than about 1.96, the usual two-sided critical value at the 5% level. If the coefficient is minimal or the t-statistic is not large enough, then there is no evidence of a relationship.
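Here is a rough sketch of how that regression could be run in Python. Everything in it is an assumption for illustration: the data are simulated (none of the numbers come from the Colombian study), the variable names are my own, and I built the simulation so that contraception use is correlated with education but has no direct effect on formal-sector work. Because the dependent variable is a 0/1 indicator, this is a linear probability model estimated by ordinary least squares.

```python
import numpy as np

def ols_with_t_stats(X, y):
    """Ordinary least squares: return coefficients and their t-statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - k)      # unbiased estimate of error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # classical covariance matrix of beta
    se = np.sqrt(np.diag(cov))
    return beta, beta / se

# Simulated data only (hypothetical, not from the book or any study).
rng = np.random.default_rng(0)
n = 2000
parent_income = rng.normal(0, 1, n)
parent_educ = rng.normal(0, 1, n)
own_educ = 0.5 * parent_educ + rng.normal(0, 1, n)
grades = 0.3 * own_educ + rng.normal(0, 1, n)

# Assumption: more educated girls are more likely to use contraception,
# but contraception itself does not enter the employment equation below.
used_contraception = (0.8 * own_educ + rng.normal(0, 1, n) > 0).astype(float)
latent = (0.4 * own_educ + 0.2 * parent_income + 0.2 * parent_educ
          + 0.1 * grades + rng.normal(0, 1, n))
formal_sector = (latent > 0).astype(float)        # 1 = works in the formal sector

# Naive model: only the contraception dummy.
X_naive = np.column_stack([np.ones(n), used_contraception])
# Full model: dummy plus the control variables from my specification.
X_full = np.column_stack([np.ones(n), used_contraception, parent_income,
                          own_educ, parent_educ, grades])

beta_naive, t_naive = ols_with_t_stats(X_naive, formal_sector)
beta_full, t_full = ols_with_t_stats(X_full, formal_sector)

print(f"naive dummy coefficient: {beta_naive[1]:.3f}, t = {t_naive[1]:.2f}")
print(f"full  dummy coefficient: {beta_full[1]:.3f}, t = {t_full[1]:.2f}")
```

In this simulated setup the naive regression picks up a large, significant coefficient on the dummy, and the coefficient shrinks once education and family background are controlled for. That is the pattern my skepticism predicts, but real data could of course look different.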

Moneyball

19 Mar

The movie Moneyball is a great depiction of how econometrics and regression analysis can be used to predict player value. Billy Beane (Brad Pitt) hires special assistant Peter Brand (Jonah Hill) because of his seemingly progressive way of valuing baseball players. A recent economics graduate from Yale, Peter Brand created a formula for valuing player performance using independent variables like OPS, Slugging Percentage, and On-Base Percentage. Of these variables, Brand believed that On-Base Percentage best explains Runs Created, a sabermetric statistic developed by statistician Bill James. The dependent variable, Runs Created, is calculated as (Total Bases * (Hits + Walks))/(Plate Appearances).
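To make the formula concrete, here is a small sketch in Python. The function name and the stat line are hypothetical, made up for illustration rather than taken from any real player. It follows the formula exactly as written above; as far as I know, Bill James’ basic version uses At Bats plus Walks in the denominator, which is close to Plate Appearances.

```python
def runs_created(total_bases, hits, walks, plate_appearances):
    """Runs Created as written in this post: TB * (H + BB) / PA."""
    return total_bases * (hits + walks) / plate_appearances

# Hypothetical season stat line (not a real player).
rc = runs_created(total_bases=250, hits=150, walks=70, plate_appearances=620)
print(f"Runs Created: {rc:.1f}")
```

Notice that walks count exactly as much as hits in the on-base part of the formula, which is why a player who walks a lot can be worth more than his batting average suggests.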

The regression equation that Brand created went against the conventional thinking of general managers and scouts in Major League Baseball. Baseball people value variables such as appearance, off-field concerns, fielding, and age. Brand deemed these variables insignificant in explaining Runs Created. Billy Beane was the target of early criticism because people could not explain some of his roster moves, which included trading All-Star first baseman Carlos Pena. Skeptics believed that the success of the A’s was random and could not be explained. Could regression analysis really have played that big a role in the success of the A’s?

It would seem that regression analysis was a crucial component in helping Oakland overcome their measly 40 million dollar payroll. With little room for error, Billy Beane and Peter Brand had to take as much variability out of the scouting process as possible. The best way of doing this was by using statistics that could be explained. Variables like character issues are either insignificant or very difficult to compute and therefore should not be accounted for in the player evaluation process. I found it interesting how the team struggled early on in the season, and Brand noted that it was too small of a sample size to draw any conclusions from and that to judge a team’s true performance one needs to let the season play out. His statement held true as the team wound up performing better late in the season and fulfilling their potential. It also speaks to how the playoffs are so difficult to predict because

Billy Beane described their philosophy as being that of a card counter. The A’s had a competitive advantage in evaluating players that other clubs did not. They could pay players less money than they were actually worth in terms of runs created. That worked in the short term; however, as other teams started to account for the same variables that Billy Beane accounted for, the A’s found that their success was hard to sustain. When Billy tried to replace Jason Giambi and Johnny Damon, he already had underpaid All-Stars in place like Miguel Tejada, Eric Chavez, Barry Zito and Mark Mulder. The role players that he added were positive additions and worked for one season, but to maintain success teams need adequate payrolls to keep franchise players in place.

This was my second time watching the movie and I picked up so much more after taking Quantitative Methods this semester. Bill James’ theories on baseball and statistics revolutionized the way we look at statistics and baseball. I am glad that Beane and Brand had the boldness to pursue his theories and that it worked out on a short term basis for the A’s.

Post # 7 The Law of Genius and Home Runs Refuted

9 Mar

The article by DiNardo and Winfree argues rather pessimistically that the effects of steroids on home runs are almost impossible to measure. They argue that too many variables go into hitting home runs, including the quality of pitching, weather, the distribution of talent across teams, and the number of games played. They argue that settling the question would take considerably more shoe leather than a simple statistical analysis.

The authors investigate bold claims by a researcher named De Vany, who claims that the law of home run hitting is the same as the laws of human accomplishment. He assumes that home runs have infinite variance with probability one, and he claims that steroids have no effect on home run output. The authors argue that the infinite-variance assumption is flawed, that the size distribution of home runs cannot follow a power law, and that the posited class of distributions would misrepresent the data. While much of the analytics went over my head, the basic point was that the effects of steroids in baseball cannot be established from this kind of evidence. They did not refute the claim that there may be an effect on home runs; however, they did make it clear that it would be difficult to prove.

The authors note that “Inferring the existence of fundamental causal laws—that is, the law of genius—from the statistical distribution of some outcome is difficult, at best.” They focused on the distributions implied by these proposed causal laws and found that the fitted distributions gave implausible results. For the power law, the distribution predicted that 11% of players hit negative home runs. While I respect the overall message that drawing conclusions from this data will be difficult, I felt that the authors did not disprove that steroids could have an effect on performance. While I did not see any immediate issues with the assumptions of the classical linear regression model, I need to analyze the paper’s message further to grasp the full value of the article for my own paper.

http://web.ebscohost.com/ehost/pdfviewer/pdfviewer?sid=f50015ec-98fd-48e4-981c-8840d31396d3%40sessionmgr10&vid=4&hid=125

Blog Post #6 “The Possible Effects of Steroids on Home Run Production” – Alan M. Nathan

2 Mar

Professor Alan Nathan of the University provides an interesting summary of the results of the paper “On the potential of a chemical Bonds: Possible effects of steroids on home run production in baseball” by Roger Tobin. In this paper Professor Tobin uses physics to compare the home run production of the elite home run hitters of pre-steroid baseball with that of the elite home run hitters of the steroid era (1994-2003). The main statistic that he looks at is HRBiP, or home runs per ball in play. Tobin found that the HRBiP for the elite home run hitters of the live ball era was .10, while the HRBiP for the hitters of the steroid era was .15. These results show that for the hitters in the steroid era, there was a 50% increase in home runs per ball put in play. This is a dramatic increase, and one that Tobin attributes to an increase in body mass. Professor Tobin theorized that a 10% increase in muscle mass produces about a 3.8% increase in bat speed. Bat speed is the main driver of HRBiP in these studies. Nathan, in his review of Tobin’s study, agreed that a 10% increase in muscle mass could lead to a 30-70% increase in HRBiP.
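The 50% figure is just the relative change between the two HRBiP rates. A quick sketch of the arithmetic in Python, using the .10 and .15 values quoted above (the function names and the example counts are my own, not from the paper):

```python
import math

def hrbip(home_runs, balls_in_play):
    """Home runs per ball in play."""
    return home_runs / balls_in_play

def relative_increase(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before

pre_steroid = 0.10    # elite hitters, pre-steroid era (as summarized above)
steroid_era = 0.15    # elite hitters, 1994-2003 (as summarized above)

increase = relative_increase(pre_steroid, steroid_era)
print(f"Relative increase in HRBiP: {increase:.0%}")

# Hypothetical counts that would produce the steroid-era rate:
# 60 home runs on 400 balls in play.
example_rate = hrbip(60, 400)
```

Note that the rate rises by only 5 percentage points (.10 to .15), but relative to the starting rate that is a 50% jump, which is why the result reads as so dramatic.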

This study gave me a few good ideas about what I should include in my regression. Originally I was planning on using full league data to test for changes in home run numbers from the live ball era to the steroid era to the post-steroid era. Unfortunately, I found that the variables I was looking at weren’t the strongest predictors of home runs. I originally wanted to look at the mean weather during the season and the age of the players in question. Age is definitely still a variable, but after reading this article I think that I should also test for the size of the ball players. While I doubt that there is a data set for muscle mass or BMI of ball players, I think that weight would be a good indicator of muscle mass. Muscle weighs more than fat, and I think that the correlation between the size of ball players and their bat speed will do a better job of explaining home run totals. Also, I suspect that temperature may be a better predictor of home run totals than precipitation. Usually if there is poor weather during the season the games get rained out, so precipitation has little to no effect on home run totals. Temperature is different: when it gets hotter HRBiP goes up, so particularly hot summers may have an effect on home run totals. This article did a good job of focusing my regression equation. While I may have to get new data sets, I am encouraged by the prospects of my topic.

http://webusers.npl.illinois.edu/~a-nathan/pob/BRJ-Steroids-v3.pdf