R squared in Group Mean Centering

Example 1: The Effect of Math Achievement on Perceived Self-Efficacy

Read the data set from Lecture 7.

dat4 <- read.table("lecture7ex1.csv", sep=",", header=TRUE)

Calculate group means of math achievement by the ave function and get the group-mean-centered math achievement. The group means of math achievement is centered at 50. The centering at 50 is not implemented by I() but by subtracting the original group means and saving it as a new variable called aveach50. The reason of making a new variable because I() is not suppored in the r2mlm package which will be used to calculate \(R^2\).

dat4$aveach <- ave(dat4$ach, dat4$schoolid)
dat4$diffach <- dat4$ach - dat4$aveach
dat4$aveach50 <- dat4$aveach - 50

First, let’s run the cross-level interaction model where the within-school predictors are the deviation of math achievement within schools, diffach, and the cross-level interaction, diffach:aveach50. The between-school predictors are the school means of math achievement centered at 50, aveach50, and whether a school is private, private.

library(lme4)
out4m1 <- lmer(efficacy ~ 1 + diffach + aveach50 + private
               + diffach:aveach50
               + (1 + diffach|schoolid), data=dat4, REML=FALSE)
summary(out4m1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: efficacy ~ 1 + diffach + aveach50 + private + diffach:aveach50 +  
##     (1 + diffach | schoolid)
##    Data: dat4
## 
##      AIC      BIC   logLik deviance df.resid 
##   8888.7   8939.0  -4435.3   8870.7     1982 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1594 -0.6500  0.0071  0.6654  3.2921 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  schoolid (Intercept) 1.53574  1.2392        
##           diffach     0.02968  0.1723   -0.22
##  Residual             4.34903  2.0854        
## Number of obs: 1991, groups:  schoolid, 50
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)      35.602009   0.325448 109.394
## diffach           0.061923   0.027110   2.284
## aveach50          0.048671   0.021855   2.227
## private          -0.181827   0.420004  -0.433
## diffach:aveach50  0.009505   0.002548   3.730
## 
## Correlation of Fixed Effects:
##             (Intr) diffch avch50 privat
## diffach     -0.127                     
## aveach50    -0.622  0.071              
## private     -0.794  0.000  0.525       
## dffch:vch50  0.051 -0.397 -0.179  0.000

Run the r2mlm function to get the proportions of variance explained.

library(r2mlm)
r2mlm(out4m1)

## $Decompositions
##                      total    within   between
## fixed, within   0.17176369 0.2066938        NA
## fixed, between  0.02635712        NA 0.1559644
## slope variation 0.25530900 0.3072290        NA
## mean variation  0.14263735        NA 0.8440356
## sigma2          0.40393284 0.4860772        NA
## 
## $R2s
##          total    within   between
## f1  0.17176369 0.2066938        NA
## f2  0.02635712        NA 0.1559644
## v   0.25530900 0.3072290        NA
## m   0.14263735        NA 0.8440356
## f   0.19812081        NA        NA
## fv  0.45342981 0.5139228        NA
## fvm 0.59606716        NA        NA

R squared when a Predictor is not Centered

Example 2: Interviewees’ Rating Scores within Interviewers

Read the data set from Lecture 4.

dat5 <- read.table("lecture4ex2.csv", sep=",", header=TRUE)

First, the cross-level interaction model will be used to predict interviews’ ratings. The interviewee-level predictors are interviewee’s sex which male is the reference group, eesex, interviewee’s IQ centered at 100, and the cross-level interaction between interviewee’s and interviewer’s sex. The interviewer-level predictor is interviewer’s sex. However, the r2mlm_long_manual function in the r2mlm package is not easy to use as the r2mlm function. Users need to provide many arguments and the variable names cannot have :, *, or I(). Thus, the interaction or centering must be calcuated as new variables before running the model. Thus, the centered IQ and the interaction are calculated.

dat5$iqc <- dat5$iq - 100
dat5$int <- dat5$eesex * dat5$ersex

Next, analyze the cross-level interaction model.

out5 <- lmer(score ~ 1 + eesex + iqc + ersex + int 
                + (1 + eesex|erid), data=dat5, REML=FALSE)
summary(out5)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: score ~ 1 + eesex + iqc + ersex + int + (1 + eesex | erid)
##    Data: dat5
## 
##      AIC      BIC   logLik deviance df.resid 
##  61641.0  61705.9 -30811.5  61623.0     9991 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7413 -0.6367 -0.0048  0.6410  3.4320 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  erid     (Intercept) 20.1152  4.4850       
##           eesex        0.7979  0.8933   0.12
##  Residual             21.7645  4.6652       
## Number of obs: 10000, groups:  erid, 1000
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 75.538700   0.221217 341.469
## eesex       -0.413234   0.137868  -2.997
## iqc          0.029418   0.003299   8.918
## ersex        0.009887   0.312847   0.032
## int          1.073648   0.194981   5.506
## 
## Correlation of Fixed Effects:
##       (Intr) eesex  iqc    ersex 
## eesex -0.253                     
## iqc   -0.003  0.001              
## ersex -0.707  0.179  0.002       
## int    0.179 -0.707 -0.008 -0.253

Run the r2mlm_long_manual function to get the proportions of variance explained. Note that, covs is the list of predictors with fixed effects. random_covs is the list of predictors that have random slopes. gammas is the point estimates of the fixed effect. Tau is the covariance matrix of the random effects. sigma2 is the level-1 residual variance.

sumout5 <- summary(out5)
r2mlm_long_manual(data=dat5,
                  covs=c("eesex", "iqc", "ersex", "int"),
                  random_covs=c("eesex"),
                  gammas=coef(sumout5)[-1, "Estimate"],
                  clusterID="erid",
                  Tau=as.matrix(Matrix::bdiag(VarCorr(out5))),
                  sigma2=getME(out5, "sigma")^2)

## $Decompositions
##                                     total      within     between
## fixed slopes (within)         0.005822049 0.011301637          NA
## fixed slopes (between)        0.002170825          NA 0.004477321
## slope variation (within)      0.004626392 0.008980654          NA
## slope variation (between)     0.000000000          NA 0.000000000
## intercept variation (between) 0.482678196          NA 0.995522679
## residual (within)             0.504702537 0.979717710          NA
## 
## $R2s
##           total      within     between
## f1  0.005822049 0.011301637          NA
## f2  0.002170825          NA 0.004477321
## v1  0.004626392 0.008980654          NA
## v2  0.000000000          NA 0.000000000
## m   0.482678196          NA 0.995522679
## f   0.007992874          NA          NA
## fv  0.012619266 0.020282290 0.004477321
## fvm 0.495297463          NA          NA

Changes in R squared

Example 1: The Effect of Math Achievement on Perceived Self-Efficacy

Let’s check the increase in R squares when math achievement is added to predict self-efficacy in addition to private. First, make the likelihood ratio test to compare the models with and without math achievement.

out4m1 <- lmer(efficacy ~ 1 + diffach + aveach50 + private
               + diffach:aveach50
               + (1 + diffach|schoolid), data=dat4, REML=FALSE)
out4m2 <- lmer(efficacy ~ 1 + private + (1|schoolid), data=dat4, REML=FALSE)
anova(out4m2, out4m1)
## Data: dat4
## Models:
## out4m2: efficacy ~ 1 + private + (1 | schoolid)
## out4m1: efficacy ~ 1 + diffach + aveach50 + private + diffach:aveach50 + (1 + diffach | schoolid)
##        npar     AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## out4m2    4 10096.3 10119 -5044.2  10088.3                         
## out4m1    9  8888.7  8939 -4435.3   8870.7 1217.6  5  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Use the r2mlm_comp function to check the R squared changes when math achievement is added.

r2mlm_comp(out4m2, out4m1)

## $`Model A R2s`
##           total within    between
## f1  0.000000000      0         NA
## f2  0.007708657     NA 0.04747128
## v   0.000000000      0         NA
## m   0.154677036     NA 0.95252872
## f   0.007708657     NA         NA
## fv  0.007708657      0         NA
## fvm 0.162385693     NA         NA
## 
## $`Model B R2s`
##          total    within   between
## f1  0.17176369 0.2066938        NA
## f2  0.02635712        NA 0.1559644
## v   0.25530900 0.3072290        NA
## m   0.14263735        NA 0.8440356
## f   0.19812081        NA        NA
## fv  0.45342981 0.5139228        NA
## fvm 0.59606716        NA        NA
## 
## $`R2 differences, Model B - Model A`
##           total    within    between
## f1   0.17176369 0.2066938         NA
## f2   0.01864846        NA  0.1084931
## v    0.25530900 0.3072290         NA
## m   -0.01203969        NA -0.1084931
## f    0.19041216        NA         NA
## fv   0.44572115 0.5139228         NA
## fvm  0.43368146        NA         NA

Standardized Regression Coefficients

Example 3: Goldfish Food Consumption within Fish Tanks

Read the data set from Lecture 4.

dat1 <- read.table("lecture4ex1.csv", sep=",", header=TRUE)

To find standardized coefficients, all variables need to be standardized before running multilevel models. First, the lower-level variables are easily standardized by the scale function.

dat1$zconsume <- scale(dat1$consume)
dat1$zlength <- scale(dat1$length)

However, the upper-level variables are not easy to standardized. Each row in a data frame represents the lower-level observation. Thus, each upper-level unit will have duplicated values. The duplicated values must be removed. Then, the means and standard deviations are calculated. Finally, the original scores (with duplicated values) are simply subtracted by the calculated means and divided by the obtained standard deviations.

volume <- dat1[!duplicated(dat1$groupid), "volume"]
mvolume <- mean(volume)
sdvolume <- sd(volume)
dat1$zvolume <- (dat1$volume - mvolume) / sd(volume)

Use the standard scores to find group means and group-mean-centered standard scores.

dat1$avezlength <- ave(dat1$zlength, dat1$groupid)
dat1$diffzlength <- dat1$zlength - dat1$avezlength

Run the multilevel models with standard scores.

out1z23 <- lmer(zconsume ~ 1 + diffzlength + avezlength + zvolume 
                + diffzlength:zvolume + (1 + diffzlength|groupid), 
                data=dat1, REML=FALSE)
summary(out1z23)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## zconsume ~ 1 + diffzlength + avezlength + zvolume + diffzlength:zvolume +  
##     (1 + diffzlength | groupid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##    963.2   1005.5   -472.6    945.2      802 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.14847 -0.65742  0.02343  0.65483  2.93474 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  groupid  (Intercept) 0.319747 0.56546      
##           diffzlength 0.006452 0.08033  0.58
##  Residual             0.126301 0.35539      
## Number of obs: 811, groups:  groupid, 100
## 
## Fixed effects:
##                      Estimate Std. Error t value
## (Intercept)         -0.003562   0.058226  -0.061
## diffzlength          0.626756   0.016704  37.521
## avezlength           0.797201   0.130465   6.110
## zvolume              0.232775   0.062356   3.733
## diffzlength:zvolume  0.038541   0.016110   2.392
## 
## Correlation of Fixed Effects:
##             (Intr) dffzln avzlng zvolum
## diffzlength  0.274                     
## avezlength   0.068 -0.001              
## zvolume     -0.037 -0.001 -0.354       
## dffzlngth:z -0.003 -0.225 -0.023  0.277

To check whether the models with standardized and unstandardized variables are the same, log likelihood can be calculated. These models are compared:

  1. Standardized outcomes and standardized predictors (calculated above as out1z23)
  2. Standardized outcomes and unstandardized predictors (out1bz23)
  3. Unstandardized outcomes and standardized predictors (out1zb23)
  4. Unstandardized outcomes and unstandardized predictors (out1b23)

As shown below, Model 1 and Model 2, which both have standardized outcomes, have the same log likelihood values. Model 3 and Model 4, which both have unstandardized outcomes, have the same log likelihood values.

dat1$lengthx <- dat1$length/10
dat1$avelengthx <- ave(dat1$lengthx, dat1$groupid)
dat1$difflengthx <- dat1$lengthx - dat1$avelengthx
dat1$avegold <- ave(dat1$goldcolor, dat1$groupid)
dat1$diffgold <- dat1$goldcolor - dat1$avegold
out1zb23 <- lmer(consume ~ 1 + diffzlength + avezlength + zvolume 
                 + diffzlength:zvolume + (1 + diffzlength|groupid), 
                 data=dat1, REML=FALSE)
out1bz23 <- lmer(zconsume ~ 1 + difflengthx + avelengthx + volume 
                 + difflengthx:volume + (1 + difflengthx|groupid), 
                 data=dat1, REML=FALSE)
out1b23 <- lmer(consume ~ 1 + difflengthx + avelengthx + volume 
                + difflengthx:volume + (1 + difflengthx|groupid), 
                data=dat1, REML=FALSE)
logLik(out1z23)
## 'log Lik.' -472.61 (df=9)
logLik(out1bz23)
## 'log Lik.' -472.61 (df=9)
logLik(out1zb23)
## 'log Lik.' -2041.053 (df=9)
logLik(out1b23)
## 'log Lik.' -2041.053 (df=9)

Instead of standardized regression coefficients, r2mlm_comp can be used to find unique contributions of each predictor. This concept is the same as part regression in regular multiple regression coefficients.

out1b23_1 <- lmer(consume ~ 1 + avelengthx + volume 
                  + (1|groupid), 
                  data=dat1, REML=FALSE)
out1b23_2 <- lmer(consume ~ 1 + difflengthx + volume 
                  + difflengthx:volume + (1 + difflengthx|groupid), 
                  data=dat1, REML=FALSE)
out1b23_3 <- lmer(consume ~ 1 + difflengthx + avelengthx 
                  + difflengthx + (1 + difflengthx|groupid), 
                  data=dat1, REML=FALSE)
r2mlm_comp(out1b23_1, out1b23, bargraph = FALSE)
## $`Model A R2s`
##         total within   between
## f1  0.0000000      0        NA
## f2  0.2288923     NA 0.4678728
## v   0.0000000      0        NA
## m   0.2603268     NA 0.5321272
## f   0.2288923     NA        NA
## fv  0.2288923      0        NA
## fvm 0.4892191     NA        NA
## 
## $`Model B R2s`
##           total     within   between
## f1  0.329200078 0.71831315        NA
## f2  0.228033130         NA 0.4209553
## v   0.005193286 0.01133173        NA
## m   0.313670822         NA 0.5790447
## f   0.557233208         NA        NA
## fv  0.562426494 0.72964488        NA
## fvm 0.876097316         NA        NA
## 
## $`R2 differences, Model B - Model A`
##             total     within     between
## f1   0.3292000781 0.71831315          NA
## f2  -0.0008592057         NA -0.04691757
## v    0.0051932859 0.01133173          NA
## m    0.0533440414         NA  0.04691757
## f    0.3283408725         NA          NA
## fv   0.3335341584 0.72964488          NA
## fvm  0.3868781998         NA          NA
r2mlm_comp(out1b23_2, out1b23, bargraph = FALSE)
## $`Model A R2s`
##           total     within  between
## f1  0.318909555 0.71794802       NA
## f2  0.139199486         NA 0.250447
## v   0.005325128 0.01198824       NA
## m   0.416604632         NA 0.749553
## f   0.458109041         NA       NA
## fv  0.463434169 0.72993626       NA
## fvm 0.880038801         NA       NA
## 
## $`Model B R2s`
##           total     within   between
## f1  0.329200078 0.71831315        NA
## f2  0.228033130         NA 0.4209553
## v   0.005193286 0.01133173        NA
## m   0.313670822         NA 0.5790447
## f   0.557233208         NA        NA
## fv  0.562426494 0.72964488        NA
## fvm 0.876097316         NA        NA
## 
## $`R2 differences, Model B - Model A`
##            total        within    between
## f1   0.010290523  0.0003651276         NA
## f2   0.088833644            NA  0.1705082
## v   -0.000131842 -0.0006565149         NA
## m   -0.102933810            NA -0.1705082
## f    0.099124167            NA         NA
## fv   0.098992325 -0.0002913873         NA
## fvm -0.003941485            NA         NA
r2mlm_comp(out1b23_3, out1b23, bargraph = FALSE)
## $`Model A R2s`
##           total     within   between
## f1  0.330962135 0.71150617        NA
## f2  0.163142469         NA 0.3050288
## v   0.006895454 0.01482393        NA
## m   0.371700444         NA 0.6949712
## f   0.494104605         NA        NA
## fv  0.501000059 0.72633009        NA
## fvm 0.872700503         NA        NA
## 
## $`Model B R2s`
##           total     within   between
## f1  0.329200078 0.71831315        NA
## f2  0.228033130         NA 0.4209553
## v   0.005193286 0.01133173        NA
## m   0.313670822         NA 0.5790447
## f   0.557233208         NA        NA
## fv  0.562426494 0.72964488        NA
## fvm 0.876097316         NA        NA
## 
## $`R2 differences, Model B - Model A`
##            total       within    between
## f1  -0.001762057  0.006806984         NA
## f2   0.064890661           NA  0.1159265
## v   -0.001702168 -0.003492199         NA
## m   -0.058029622           NA -0.1159265
## f    0.063128603           NA         NA
## fv   0.061426435  0.003314785         NA
## fvm  0.003396813           NA         NA