Intro to Multilevel Model: Lecture 5 Example Code

Likelihood Ratio Test

Read the data set and run two models which are different in only the fixed effect of goldcolor. The restricted model, out1r, does not have the fixed effect of goldcolor and the full model, out1f, have the fixed effect of goldcolor.

dat1 <-read.table("lecture4ex1.csv", sep=",", header=TRUE)
library(lme4)

## Warning: package 'lme4' was built under R version 4.2.3

## Loading required package: Matrix

## Warning: package 'Matrix' was built under R version 4.2.3

out1r <- lmer(consume ~ 1 + I(length - 15) + (1|groupid), data=dat1, REML=FALSE)
out1f <- lmer(consume ~ 1 + I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE)

To run the likelihood ratio test, the anova function is used. If significant, the full model is preferred. If not significant, the restricted model is preferred.

anova(out1r, out1f)

## Data: dat1
## Models:
## out1r: consume ~ 1 + I(length - 15) + (1 | groupid)
## out1f: consume ~ 1 + I(length - 15) + goldcolor + (1 | groupid)
##       npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## out1r    4 4131.8 4150.6 -2061.9   4123.8                       
## out1f    5 4127.7 4151.2 -2058.8   4117.7 6.1466  1    0.01317 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random Slope Models

Example 1: Goldfish Food Consumption within Fish Tanks

Let’s start with the random analysis of covariance model (which is a submodel of the random intercept model). length and goldcolor are both goldfish-level predictors. In this model, the effects of both length and goldcolor are fixed across tanks. Note that control = lmerControl(optimizer ="Nelder_Mead") is used here to help the analysis to get a convergent result.

library(lme4)
out1m0 <- lmer(consume ~ 1 + I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))

Next, the effect of length is varied across tanks. (1 + I(length - 15)|groupid) means that both 1 and I(length - 15) effects are varied across groupid. 1 means that the intercepts are varied across groupid and I(length - 15) means that the slopes of this variable is varied across groupid. The anova function is used to run likelihood ratio test to check whether allowing random slopes is significant.

out1m0a <- lmer(consume ~ 1 + I(length - 15) + goldcolor + (1 + I(length - 15)|groupid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out1m0, out1m0a)

## Data: dat1
## Models:
## out1m0: consume ~ 1 + I(length - 15) + goldcolor + (1 | groupid)
## out1m0a: consume ~ 1 + I(length - 15) + goldcolor + (1 + I(length - 15) | groupid)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## out1m0     5 4127.7 4151.2 -2058.8   4117.7                         
## out1m0a    7 4109.4 4142.3 -2047.7   4095.4 22.299  2  1.438e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next, the effect of goldcolor is varied across tanks. The anova function is used to run likelihood ratio test to check whether allowing random slopes is significant.

out1m0b <- lmer(consume ~ 1 + I(length - 15) + goldcolor + (1 + goldcolor|groupid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out1m0, out1m0b)

## Data: dat1
## Models:
## out1m0: consume ~ 1 + I(length - 15) + goldcolor + (1 | groupid)
## out1m0b: consume ~ 1 + I(length - 15) + goldcolor + (1 + goldcolor | groupid)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## out1m0     5 4127.7 4151.2 -2058.8   4117.7                     
## out1m0b    7 4131.1 4164.0 -2058.6   4117.1 0.5479  2     0.7604

The random slope of length is significantly better than the fixed slope of length. However, the random slope of goldcolor was not significant. Thus, the model with only random slope of length is used.

summary(out1m0a)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: consume ~ 1 + I(length - 15) + goldcolor + (1 + I(length - 15) |  
##     groupid)
##    Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
## 
##      AIC      BIC   logLik deviance df.resid 
##   4109.4   4142.3  -2047.7   4095.4      804 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2693 -0.6408  0.0377  0.6230  2.8841 
## 
## Random effects:
##  Groups   Name           Variance Std.Dev. Corr
##  groupid  (Intercept)    19.36131 4.4001       
##           I(length - 15)  0.01636 0.1279   0.67
##  Residual                 5.97388 2.4442       
## Number of obs: 811, groups:  groupid, 100
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)    39.28800    0.46021  85.370
## I(length - 15)  0.87494    0.02321  37.694
## goldcolor      -0.46422    0.18375  -2.526
## 
## Correlation of Fixed Effects:
##             (Intr) I(-15)
## I(lngth-15)  0.394       
## goldcolor   -0.203  0.026

Check the scatterplot between random effects in this model by the ranef and plot functions. The random intercepts (on Y axis) is positively correlated with the random slopes of length (on X axis). That is, the stronger the effect of length in a tank, the more food fish eats in that tank.

ranef1m0a <- ranef(out1m0a)
plot(ranef1m0a)

## $groupid

Example 2: Interviewees’ Rating Scores within Interviewers

Again, the random ANCOVA model with two interviewee-level predictors is used as the baseline model to compare with the random slope models. Note that the iq is scaled because the model with original-scaled iq was not convergent.

dat2 <-read.table("lecture4ex2.csv", sep=",", header=TRUE)
out2m0 <- lmer(score ~ 1 + eesex + I(scale(iq)) + (1|erid), data=dat2, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))

Next, the effect of interviewees’ sex, eesex, is random. The anova function is used to run likelihood ratio test to check whether allowing random slopes is significant.

out2m0a <- lmer(score ~ 1 + eesex + I(scale(iq)) + (1 + eesex|erid), data=dat2, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out2m0, out2m0a)

## Data: dat2
## Models:
## out2m0: score ~ 1 + eesex + I(scale(iq)) + (1 | erid)
## out2m0a: score ~ 1 + eesex + I(scale(iq)) + (1 + eesex | erid)
##         npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)   
## out2m0     5 61676 61712 -30833    61666                        
## out2m0a    7 61669 61719 -30828    61655 11.183  2    0.00373 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next, the effect of interviewees’ IQ, iq, is random. The anova function is used to run likelihood ratio test to check whether allowing random slopes is significant.

out2m0b <- lmer(score ~ 1 + eesex + I(scale(iq)) + (1 + I(scale(iq))|erid), data=dat2, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out2m0, out2m0b)

## Data: dat2
## Models:
## out2m0: score ~ 1 + eesex + I(scale(iq)) + (1 | erid)
## out2m0b: score ~ 1 + eesex + I(scale(iq)) + (1 + I(scale(iq)) | erid)
##         npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## out2m0     5 61676 61712 -30833    61666                     
## out2m0b    7 61680 61730 -30833    61666 0.6003  2     0.7407

The random slope of interviewees’ sex was significant so this random slope is retained. Before checking the results by the summary function, analyzing iq without random slopes does not require the standard scores of iq and the control argument. Thus, the original score of iq centered at 100 is used and the control argument is removed.

out2m0a <- lmer(score ~ 1 + eesex + I(iq - 100) + (1 + eesex|erid), data=dat2, REML=FALSE)
summary(out2m0a)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: score ~ 1 + eesex + I(iq - 100) + (1 + eesex | erid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  61669.0  61719.4 -30827.5  61655.0     9993 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6718 -0.6331 -0.0005  0.6408  3.3843 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  erid     (Intercept) 20.115   4.485        
##           eesex        1.086   1.042    0.11
##  Residual             21.765   4.665        
## Number of obs: 10000, groups:  erid, 1000
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 75.543630   0.156424 482.943
## eesex        0.123562   0.098957   1.249
## I(iq - 100)  0.029552   0.003304   8.945
## 
## Correlation of Fixed Effects:
##             (Intr) eesex 
## eesex       -0.249       
## I(iq - 100) -0.002 -0.007

Plot the scatterplot between random intercepts and random slopes.

ranef2m0a <- ranef(out2m0a)
plot(ranef2m0a)

## $erid

From the random slope model, the interviewer-level predictor can be added to predict the ratings. This model is similar to the random intercept model from the previous chapter with the random slope of the effect of interviewees’ sex.

out2m0c <- lmer(score ~ 1 + eesex + I(iq - 100) + ersex + (1 + eesex|erid), data=dat2, REML=FALSE)
summary(out2m0c)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: score ~ 1 + eesex + I(iq - 100) + ersex + (1 + eesex | erid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  61668.9  61726.5 -30826.4  61652.9     9992 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6647 -0.6348 -0.0018  0.6427  3.3811 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  erid     (Intercept) 20.163   4.490        
##           eesex        1.086   1.042    0.08
##  Residual             21.765   4.665        
## Number of obs: 10000, groups:  erid, 1000
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 75.321033   0.217765 345.883
## eesex        0.123560   0.098956   1.249
## I(iq - 100)  0.029563   0.003304   8.948
## ersex        0.445192   0.302694   1.471
## 
## Correlation of Fixed Effects:
##             (Intr) eesex  I(-100
## eesex       -0.184              
## I(iq - 100) -0.002 -0.007       
## ersex       -0.695  0.000  0.000

Example 3: Customers Satisfaction within Each Table in Dining Services

The random ANCOVA model is used as the baseline model. Two customer-level predictors are used to test the random slope: age and sex.

dat3 <-read.table("lecture4ex3.csv", sep=",", header=TRUE)
out3m0 <- lmer(sat ~ 1 + I(age - 40) + female + (1|tableid), data=dat3, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))

Check whether the age effects on customer satisfaction are different across tables.

out3m0a <- lmer(sat ~ 1 + I(age - 40) + female + (1 + I(age - 40)|tableid), data=dat3, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out3m0, out3m0a)

## Data: dat3
## Models:
## out3m0: sat ~ 1 + I(age - 40) + female + (1 | tableid)
## out3m0a: sat ~ 1 + I(age - 40) + female + (1 + I(age - 40) | tableid)
##         npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)
## out3m0     5 16553 16582 -8271.5    16543                     
## out3m0a    7 16555 16595 -8270.3    16541 2.3274  2     0.3123

Check whether the sex effects on customer satisfaction are different across tables.

out3m0b <- lmer(sat ~ 1 + I(age - 40) + female + (1 + female|tableid), data=dat3, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out3m0, out3m0b)

## Data: dat3
## Models:
## out3m0: sat ~ 1 + I(age - 40) + female + (1 | tableid)
## out3m0b: sat ~ 1 + I(age - 40) + female + (1 + female | tableid)
##         npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)
## out3m0     5 16553 16582 -8271.5    16543                     
## out3m0b    7 16554 16594 -8269.9    16540 3.1026  2      0.212