Where have all the covariances gone?

Have you wondered ‘Where do the covariances of the predictor variables go in regression?’ or ‘How does regression analysis produce the unique effects of each predictor on the outcome variable?’ Awesome. You’re a normal psychologist* asking a really important question.

The simple answer is that the covariances among predictors are accounted for during parameter estimation (which is just a fancy way of saying ‘It just does’, and you can get away with that answer as a psychological researcher). Read on to learn more about this.

Consider a regression with two predictors (X_{1i} and X_{2i}) and an outcome variable (Y_i), such that

Equation 1: Y_{i} = b_0 + b_1X_{1i} + b_2X_{2i} + \epsilon_{i}

Within Equation 1, b_0 is the expected value of Y_{i} when X_{1i} and X_{2i} equal zero, b_1 represents the unique effect of X_{1i} on Y_{i}, and b_2 indicates the unique effect of X_{2i} on Y_{i}.

So, when fitting the model in Equation 1, how does regression analysis account for the covariance between X_{1i} and X_{2i}?

I’m going to simplify this blog post by only considering z-score versions of each of the variables.** Let

  • z_{X_{1i}} = \frac{X_{1i} - \bar{X}_{1}}{s_{X_1}}
  • z_{X_{2i}} = \frac{X_{2i} - \bar{X}_{2}}{s_{X_2}}
  • z_{Y_i} = \frac{Y_{i} - \bar{Y}}{s_{Y}}

Substituting these into Equation 1 gives

Equation 2: z_{Y_i} = b_1z_{X_{1i}} + b_2z_{X_{2i}} + \epsilon_{i}

Equation 2 does not include an intercept; when the predictors and outcome variable are z-scores, the intercept equals 0.
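If you want to see that vanishing intercept for yourself, here is a minimal numpy sketch (the data are simulated with arbitrary parameter values; the variable names are mine, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate two correlated predictors and an outcome (all values arbitrary).
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # the predictors covary by construction
y = 2.0 + 0.7 * x1 + 0.3 * x2 + rng.normal(size=n)

def zscore(v):
    """Convert a variable to z-scores using the sample (N - 1) standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

zx1, zx2, zy = zscore(x1), zscore(x2), zscore(y)

# Fit Equation 1 (with an intercept) to the z-scored variables.
X = np.column_stack([np.ones(n), zx1, zx2])
b0, b1, b2 = np.linalg.lstsq(X, zy, rcond=None)[0]
print(b0)  # ~0 up to floating-point error: the intercept drops out, as in Equation 2
```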

Regression analysis selects the estimates of b_1 and b_2 that minimize the residual variance (i.e., \sigma^2_{\epsilon}), which is defined as

Equation 3: \sigma^2_{\epsilon} =  \frac{1}{N-1}\sum_{i=1}^{N}\epsilon_{i}^2.

We can rewrite the residual variance of Equation 3 in terms of the regression equation from Equation 2 as

Equation 4: \frac{1}{N-1}\sum_{i=1}^{N}\epsilon_{i}^2 =\frac{1}{N-1}\sum_{i=1}^{N}(z_{Y_i} - b_1{z_{X_{1i}}} - b_2{z_{X_{2i}}})^2
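If you would like to watch this minimization happen, here is a continuation of the numpy sketch above (scipy’s generic optimizer is standing in for the closed-form machinery a regression routine actually uses):

```python
from scipy.optimize import minimize

def resid_var(b):
    """Equation 4: the residual variance as a function of (b1, b2)."""
    b1, b2 = b
    e = zy - b1 * zx1 - b2 * zx2
    return np.sum(e**2) / (n - 1)

# Let a generic optimizer hunt for the (b1, b2) that minimize Equation 4...
b_opt = minimize(resid_var, x0=[0.0, 0.0]).x

# ...and compare against the closed-form least-squares solution.
b_ls = np.linalg.lstsq(np.column_stack([zx1, zx2]), zy, rcond=None)[0]
print(b_opt, b_ls)  # the two agree up to optimizer tolerance
```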

Let’s understand Equation 4 a bit further. First, let’s expand the squared summand on its right-hand side.

Equation 5: \frac{1}{N-1}\sum_{i=1}^{N}\epsilon_{i}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(z_{Y_i}^2 + b_1^{2}z_{X_{1i}}^2 + b_2^{2}z_{X_{2i}}^2 - 2b_1z_{Y_i}z_{X_{1i}} - 2b_2z_{Y_i}z_{X_{2i}} + 2b_1b_2z_{X_{1i}}z_{X_{2i}}\right)

Distributing the summation operator in Equation 5 (and, to reduce clutter, writing y_i = z_{Y_i}, x_{1i} = z_{X_{1i}}, and x_{2i} = z_{X_{2i}} from here on) produces

Equation 6:\frac{1}{N-1}\sum_{i=1}^{N}\epsilon_{i}^2 = \frac{1}{N-1}\sum_{i=1}^{N}y_{i}^2+\frac{1}{N-1}\sum_{i=1}^{N}b_1^2x_{1i}^2 + \frac{1}{N-1}\sum_{i=1}^{N}b_2^2x_{2i}^2 -\frac{1}{N-1}\sum_{i=1}^{N}2b_1y_ix_{1i} -\frac{1}{N-1}\sum_{i=1}^{N}2b_2y_ix_{2i} +\frac{1}{N-1}\sum_{i=1}^{N}2b_1b_2x_{1i}x_{2i}

We can factor each of the regression coefficients out of the summands on the right-hand side of Equation 6 to obtain

Equation 7:\frac{1}{N-1}\sum_{i=1}^{N}\epsilon_{i}^2 = \frac{1}{N-1}\sum_{i=1}^{N}y_{i}^2+b_1^2\frac{1}{N-1}\sum_{i=1}^{N}x_{1i}^2 +b_2^2\frac{1}{N-1}\sum_{i=1}^{N}x_{2i}^2 -2b_1\frac{1}{N-1}\sum_{i=1}^{N}y_ix_{1i} -2b_2\frac{1}{N-1}\sum_{i=1}^{N}y_ix_{2i} +2b_1b_2\frac{1}{N-1}\sum_{i=1}^{N}x_{1i}x_{2i}

Many of the terms within Equation 7 can be simplified. Because all of the variables are z-scores, we know that

  • \frac{1}{N-1}\sum_{i=1}^{N}\epsilon_{i}^2 = \sigma^2_{\epsilon}
  • \frac{1}{N-1}\sum_{i=1}^{N}y_{i}^2 = \sigma^2_Y = 1
  • \frac{1}{N-1}\sum_{i=1}^{N}x_{1i}^2 = \sigma^2_{X_1} = 1
  • \frac{1}{N-1}\sum_{i=1}^{N}x_{2i}^2 = \sigma^2_{X_2} = 1
  • \frac{1}{N-1}\sum_{i=1}^{N}y_ix_{1i}= \sigma_{Y,X_1}
  • \frac{1}{N-1}\sum_{i=1}^{N}y_ix_{2i}=\sigma_{Y,X_2}
  • \frac{1}{N-1}\sum_{i=1}^{N}x_{1i}x_{2i}= \sigma_{X_1,X_2}

Therefore, we can simplify Equation 7 to

Equation 8: \sigma^2_{\epsilon} = \sigma^2_Y + b_1^2\sigma^2_{X_1} + b_2^2\sigma^2_{X_2} - 2b_1\sigma_{Y,X_1} - 2b_2\sigma_{Y,X_2} + 2b_1b_2\sigma_{X_1,X_2}

Thus, by expanding the square of the regression residuals (i.e., Equations 4-5), the covariances among the predictor variables (e.g., \sigma_{X_1,X_2} in Equation 8) end up directly in the quantity being minimized.***
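In fact, Equation 8 shows exactly where the predictor covariance lands: setting its partial derivatives with respect to b_1 and b_2 to zero yields the normal equations, whose coefficient matrix is the predictor covariance matrix. Here is a sketch, continuing the simulated example above (this illustrates the algebra, not the internals of any particular regression routine):

```python
# Setting the partial derivatives of Equation 8 with respect to b1 and b2 to
# zero yields the normal equations; the predictor covariance sits right in
# the coefficient matrix:
#
#   [ var(x1)       cov(x1, x2) ] [b1]   [ cov(y, x1) ]
#   [ cov(x1, x2)   var(x2)     ] [b2] = [ cov(y, x2) ]
S = np.cov(np.vstack([zx1, zx2]))         # predictor (co)variance matrix
s_xy = np.array([np.cov(zx1, zy)[0, 1],   # predictor-outcome covariances
                 np.cov(zx2, zy)[0, 1]])
print(np.linalg.solve(S, s_xy))           # the same b1, b2 as the fits above
```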

Note that this is all way, way clearer within an SEM framework****.

*Ok, it was just Simine.

**Mean-centering. No interactions. Mind. Blown.

***So far I’ve just been answering the fake question.

****I smell another blog post….


Plotting Intraindividual Variability

I often analyze intensive longitudinal data, or data with more than five repeated measures* for each of many different individuals. Each individual’s data can be summarized with three quantities: an intraindividual mean (iMean), an intraindividual variance (iVar), and an intraindividual covariance with time (iCov). For example, 30 days of cortisol measures may be quantified in terms of morning level (iMean), linear diurnal slope (iCov), and fluctuation around the slope (iVar). Of course, individuals are all different, and we would like to know if and how the iMeans, iVars, and iCovs relate to each other across individuals.
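For concreteness, here is a minimal sketch of how the three quantities might be computed, assuming (as in the cortisol example) that the iCov is the covariance of the repeated measure with time; the data are simulated stand-ins, not anything from a real study:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical intensive longitudinal data: 30 daily measures for each of
# 100 individuals, built from a person-specific level, a person-specific
# linear slope over days, and day-to-day noise.
n_people, n_days = 100, 30
days = np.arange(n_days)
data = (rng.normal(5.0, 1.0, n_people)[:, None]
        + rng.normal(-0.05, 0.03, n_people)[:, None] * days
        + rng.normal(0.0, 0.5, (n_people, n_days)))

i_mean = data.mean(axis=1)                  # iMean: intraindividual mean
i_var = data.var(axis=1, ddof=1)            # iVar: intraindividual variance
i_cov = np.array([np.cov(days, row)[0, 1]   # iCov: covariance with time
                  for row in data])
```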

I designed a new kind of plot that explores the relations among iMeans, iVars, and iCovs. After making the plot, I noticed that each individual’s representation looks like a TIE fighter**, complete with cockpit, wings, and flightpath.

[Figure: TIE fighter plot]

Some more description about the plot:

  1. The TIE fighter’s cockpit (small circle) represents the iMean, and is read in relation to the vertical y-axis. Cockpits near the bottom of the plot space have low iMeans. Cockpits near the top of the plot space have high iMeans.
  2. The TIE fighter’s wings represent the iVar, and are also read in relation to the vertical y-axis and, for an extra pop, are also mapped to color. Individuals who do not fluctuate very much have smaller wingspans and a deeper yellow coloring. In contrast, individuals who fluctuate a lot have larger wingspans and a lighter, whiter coloring.
  3. The TIE fighter’s flight path represents the iCov, and is read with respect to length (how far the fighter will fly over 1 unit of time) and angle relative to the vertical y-axis (what direction the fighter will fly). Individuals with strong positive iCovs will be flying upward rather quickly, while individuals with strong negative iCovs will be flying downward rather quickly. Individuals with near-zero iCovs will just hover.
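Here is a rough matplotlib approximation of the three mappings just described, built on the simulated iMeans, iVars, and iCovs from the sketch above (the x variable is a made-up interindividual difference measure, and this is not the original plotting code):

```python
import matplotlib.pyplot as plt

# A made-up interindividual difference variable for the x-axis, loosely
# correlated with the simulated iMeans.
x = i_mean + rng.normal(0.0, 0.5, n_people)

wing = np.sqrt(i_var)                     # wingspan scales with the intraindividual SD
norm = (wing - wing.min()) / np.ptp(wing)
colors = plt.cm.YlOrBr(1.0 - norm)        # smaller wings deeper, larger wings lighter

fig, ax = plt.subplots()
for xi, m, w, c, s in zip(x, i_mean, wing, colors, i_cov):
    ax.plot([xi, xi], [m - w, m + w], color=c, lw=2)      # wings: iVar
    ax.plot([xi, xi + 1.0], [m, m + s], color="gray")     # flight path: iCov over 1 time unit
    ax.plot(xi, m, "o", mfc="white", mec="black", ms=4)   # cockpit: iMean
ax.set_xlabel("interindividual difference variable")
ax.set_ylabel("repeated measure")
plt.show()
```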

With those three elements in place, we can proceed to look for patterns in the relations among the cockpits, wingspans, and flightpaths, and how those features align relative to another individual difference variable – the horizontal x-axis.

First, let’s look at the relations among the iMeans, iVars, and iCovs. We can see that the iMeans are related to the iVars. Individuals flying near the bottom of the plot space tend to have larger wingspans than individuals flying near the top of the plot space. Similarly, we can see that the iMeans are related to the iCovs, in that individuals at the bottom of the plot space tend to be speeding downward while individuals near the top of the plot space tend to be hovering level.

Finally, let’s look at how the TIE fighters are organized with respect to the x-axis. Cockpits are organized along a diagonal, indicating a strong positive relation between iMeans and the interindividual difference variable (r = .62, p < .01). The fighters on the right generally have smaller wingspans, indicating a negative relation between iVars and the interindividual difference variable (r = -.30, p < .01). Same with the flightpaths; faster downward paths become slower horizontal paths as one moves from the left side to the right side (r = -.36, p < .01).
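(Those correlations are straightforward to check in code, continuing the simulation above; note that the r values reported here come from the author’s data, so the simulated values will differ.)

```python
# Correlate each intraindividual summary with the x-axis variable.
for name, v in [("iMean", i_mean), ("iVar", i_var), ("iCov", i_cov)]:
    print(name, round(float(np.corrcoef(x, v)[0, 1]), 2))
```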

Altogether, the formation of the TIE fighter squadron portrays the relations among three aspects of individual functioning (iMeans, iVars, iCovs) and how they are related to another interindividual difference variable (e.g., individuals’ level of neuroticism, anxiety, or mastery of the force). Let the search for the droids continue!

*Five is a low estimate, but it’s what Bolger and Laurenceau use as their lower bound. And they’re probably right.

**Yep. Star Wars. Please judge me accordingly.

 
