• Home
  • About Me
  • Teaching
  • Research
  • Blog
  • Get in Touch
  Figures and Forecasts by Eric Gaus

Figures and Forecasts

View my profile on LinkedIn

RSS Feed

Visualizing Regressions: What do they do?

8/17/2017

0 Comments

 
In preparing to teach Econometrics this Fall, I found very few online or textbook sources that visually explain what happens in a regression. This post attempts to just that using Galton's height data. This data comes from the paper that provided the world with the term regression. An econometrician might use this data to answer whether a father's height influences the height of his offspring. (The analysis below could just as easily be applied to the mother's height)

I would like to illustrate what happens in a regression using simple scatter plots. This first graph shows the data sorted by height, which is centered around 66.7 inches. 
Picture
A regression attempts to find statistical relationships between variables (like height). One way to think about this is putting the data into different bins to see whether there are any differences. The most obvious bins that jump to mind when thinking about height is gender. The graph below first sorts by gender and then by heights:
Picture
There is a clear difference between male and female heights. Almost half of the male height observations are above the tallest female observation. This should not be surprising, however it does provide the basic insight to how regressions work. The question a regression asks is, "if I observe someone to be a Female, what is their height?" The degree of statistical significance describes how distinct the bins are from one another. That is, the majority of the female observations are below the majority of the male observations. (the average for the women is 64.1 and 69.2)

Now take that basic idea of bins to something like the fathers height. First sort the data in a similar manner, tall vs short fathers (cutoff being 70 in.).
Picture
Here we see a very similar story, but not as stark as the difference between male's and female's heights. That means that the statistical significance of father's height is weaker, because we know that a person with a tall father should on average be taller, but additional variation still exists within these bins. If I increase the refinement of the filter to three bins we can find a break down like this:
Picture
Keep going in this way and we can increase the number of "bins" to include each reported height. This makes a graph that has many bins:
Picture
We can still see the general trend, taller fathers produce taller offspring, however there is a fair amount of variation within the different height bins, so much so that we cannot be very confident about how much paternal height influences offspring height.  Instead, let's look at the classic visualization of regression, the line of best fit:
Picture
At first glance, our question has been answered; a father's height has a marginal influence their offspring. The underlying statistics will tell us that we can only explain 6% of the variation and that a father who could magically be one inch taller would produce offspring approximately 0.389 inches taller. However, we've forgotten the knowledge we gained about the great differences between male and female heights. The graph below visualizes the impact of how father's height influences a child's height conditional ​on whether that child is male or female. ​
Picture
We can see a clear distinction between the males and females. In addition, we are able to explain much more of the variation (approximately 20%) and the influence of a father's height has increased to 0.43 inches per inch. The increase itself is not that important, however the confidence with which we can make that claim has increased (which is related to increase in explanatory power).

This illustrates the science and art of regression analysis. The technique looks for order between observations that exhibit variation. The art comes from understanding when (and what) it is appropriate to include or exclude from a regression. In this case, including an additional variable (gender) helped explain the variation and gave a better answer to our original question.
0 Comments

    Archives

    May 2018
    April 2018
    March 2018
    February 2018
    January 2018
    December 2017
    November 2017
    October 2017
    September 2017
    August 2017
    July 2017
    June 2017
    May 2017

    Categories

    All
    Airlines
    College
    Durable Goods
    Econometrics
    Elections
    Equities
    Euro
    Expectations
    Federal Funds Rate
    Galton
    Gdp Growth
    Gender
    Google Trends
    Housing
    Immagration
    Income
    Inflation
    Labor Market
    My Forecasts
    Nyse
    Options
    Payrolls
    Recession
    Taxes
    Unemployment
    Venture-capital
    Wages
    Wine
    WSJ Survey
    Yield Curves

    RSS Feed

LATEST UPDATE​: 9.7..2017

© COPYRIGHT 2017. ALL RIGHTS RESERVED.
  • Home
  • About Me
  • Teaching
  • Research
  • Blog
  • Get in Touch