Explore multilevel data

Read the data set

dat1 <- read.table("lecture4ex1.csv", sep=",", header=TRUE)

Find the descriptive statistics of the variables in the data set

library(psych)
## Warning: package 'psych' was built under R version 4.2.3
describe(dat1)
##           vars   n   mean     sd median trimmed    mad   min    max  range
## id           1 811 406.00 234.26 406.00  406.00 300.97  1.00 811.00 810.00
## groupid      2 811  49.08  28.45  49.00   48.72  37.06  1.00 100.00  99.00
## consume      3 811  38.31   6.92  38.00   38.21   7.41 20.00  60.00  40.00
## goldcolor    4 811   0.51   0.50   1.00    0.52   0.00  0.00   1.00   1.00
## length       5 811  14.07   5.01  14.12   14.10   6.49  3.17  24.33  21.16
## volume       6 811   0.78   0.29   0.73    0.76   0.28  0.04   1.60   1.56
## area         7 811   1.33   0.70   1.09    1.26   0.56  0.13   3.29   3.15
## height       8 811   0.66   0.20   0.67    0.66   0.25  0.30   0.98   0.68
## nfish        9 811   8.99   2.65   9.00    8.89   2.97  2.00  15.00  13.00
##            skew kurtosis   se
## id         0.00    -1.20 8.23
## groupid    0.08    -1.17 1.00
## consume    0.14    -0.32 0.24
## goldcolor -0.05    -2.00 0.02
## length    -0.05    -1.16 0.18
## volume     0.48     0.42 0.01
## area       0.85    -0.09 0.02
## height    -0.07    -1.24 0.01
## nfish      0.27    -0.17 0.09

Check the group means by the aggregate function using FUN=mean. To save spaces, only first ten group means are shown.

outgroupmean <- aggregate(consume ~ groupid, data=dat1, FUN=mean)
head(outgroupmean, 10)
##    groupid  consume
## 1        1 33.00000
## 2        2 30.00000
## 3        3 45.50000
## 4        4 38.50000
## 5        5 35.41667
## 6        6 42.83333
## 7        7 43.12500
## 8        8 31.12500
## 9        9 38.66667
## 10      10 35.45455

Check the ranges of consume within each group by the aggregate function using FUN=mean.

outgrouprange <- aggregate(consume ~ groupid, data=dat1, FUN=range)
head(outgrouprange, 10)
##    groupid consume.1 consume.2
## 1        1        29        38
## 2        2        23        37
## 3        3        39        51
## 4        4        35        44
## 5        5        27        41
## 6        6        35        50
## 7        7        35        50
## 8        8        26        35
## 9        9        33        43
## 10      10        28        43

stripchart is the graph that show data points within each group.

stripchart(consume ~ groupid, vertical = TRUE, data = dat1[dat1$groupid < 10,])

beeswarm is the more beautiful version of stripchart.

library(beeswarm)
dat1_1 <- dat1[dat1$groupid < 10,]
beeswarm(consume ~ groupid, data=dat1_1, col=rainbow(9))

Analyze Null Models

Example 1: Goldfish Food Consumption within Fish Tanks

Two main packages in R are used in multilevel models: lme4 and nlme. Let’s use lme4 package first. The main function for analyzing multilevel regression is lmer. The formula is a little more complicated than regular multiple regression. The predictors are 1 + (1|groupid). The first term, 1, means to estimate the intercept where the second term, (1|groupid), means to estimate group means. RMEL=FALSE means to use full information maximum likelihood.

library(lme4)
## Warning: package 'lme4' was built under R version 4.2.3
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.2.3
out1m0 <- lmer(consume ~ 1 + (1|groupid), data=dat1, REML=FALSE)
summary(out1m0)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: consume ~ 1 + (1 | groupid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##   5120.1   5134.2  -2557.0   5114.1      808 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.52068 -0.71755  0.03949  0.68318  3.05112 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  groupid  (Intercept) 24.04    4.903   
##  Residual             24.63    4.963   
## Number of obs: 811, groups:  groupid, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  38.1562     0.5238   72.84
Example 2: Interviewees’ Rating Scores within Interviewers

Read and examine the data.

dat2 <-read.table("lecture4ex2.csv", sep=",", header=TRUE)
describe(dat2)
##       vars     n    mean      sd  median trimmed     mad   min      max   range
## eeid     1 10000 5000.50 2886.90 5000.50 5000.50 3706.50  1.00 10000.00 9999.00
## erid     2 10000  500.50  288.69  500.50  500.50  370.65  1.00  1000.00  999.00
## score    3 10000   75.61    6.57   76.00   75.64    5.93 50.00   100.00   50.00
## eesex    4 10000    0.50    0.50    0.50    0.50    0.74  0.00     1.00    1.00
## ersex    5 10000    0.50    0.50    0.50    0.50    0.74  0.00     1.00    1.00
## iq       6 10000  100.20   14.88  100.28  100.17   14.88 50.82   157.57  106.76
##        skew kurtosis    se
## eeid   0.00    -1.20 28.87
## erid   0.00    -1.20  2.89
## score -0.05     0.07  0.07
## eesex  0.00    -2.00  0.01
## ersex  0.00    -2.00  0.01
## iq     0.03    -0.04  0.15

Run the null model.

out2m0 <- lmer(score ~ 1 + (1|erid), data=dat2, REML=FALSE)
summary(out2m0)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: score ~ 1 + (1 | erid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  61754.1  61775.8 -30874.1  61748.1     9997 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7662 -0.6329  0.0029  0.6438  3.4706 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  erid     (Intercept) 20.89    4.571   
##  Residual             22.26    4.718   
## Number of obs: 10000, groups:  erid, 1000
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  75.6114     0.1521   497.3
Example 3: Customers Satisfaction within Each Table in Dining Services

Read and examine the data.

dat3 <-read.table("lecture4ex3.csv", sep=",", header=TRUE)
describe(dat3)
##              vars    n    mean     sd median trimmed    mad min  max range
## personid        1 2262 1131.50 653.13 1131.5 1131.50 838.41   1 2262  2261
## tableid         2 2262  249.73 144.15  251.0  250.11 183.84   1  500   499
## sat             3 2262   64.12  11.66   64.0   64.25  11.86  20  100    80
## age             4 2262   36.04  14.23   36.0   35.91  16.31   1   75    74
## female          5 2262    0.51   0.50    1.0    0.52   0.00   0    1     1
## outdoor         6 2262    0.50   0.50    1.0    0.50   0.00   0    1     1
## payperperson    7 2262  325.44  83.30  326.0  325.82  80.06  90  590   500
## numperson       8 2262    5.46   2.02    6.0    5.56   2.97   2    8     6
##               skew kurtosis    se
## personid      0.00    -1.20 13.73
## tableid      -0.01    -1.20  3.03
## sat          -0.10    -0.10  0.25
## age           0.11    -0.20  0.30
## female       -0.05    -2.00  0.01
## outdoor      -0.01    -2.00  0.01
## payperperson -0.01     0.03  1.75
## numperson    -0.21    -1.27  0.04

Run the null model.

out3m0 <- lmer(sat ~ 1 + (1|tableid), data=dat3, REML=FALSE)
summary(out3m0)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: sat ~ 1 + (1 | tableid)
##    Data: dat3
## 
##      AIC      BIC   logLik deviance df.resid 
##  16952.7  16969.8  -8473.3  16946.7     2259 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2436 -0.6240 -0.0043  0.6077  2.8041 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  tableid  (Intercept) 60.34    7.768   
##  Residual             76.05    8.721   
## Number of obs: 2262, groups:  tableid, 500
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  64.2814     0.4005   160.5

Means-as-Outcomes Models

Example 1: Goldfish Food Consumption within Fish Tanks

In this goldfish example, two tank-level predictors are used to predict goldfish’s’ food consumption. nfish is the number of fish within each tank and volume is the tank volume. Technically, the averaged fish consumption within each tank is predicted by both tank-level predictor. Multilevel regression is used rather than aggregated regression which does not account for the reliability of group means. Note that 1 is not included as the predictor in the formula but it is included automatically as the intercept is shown in the output.

out1m1 <- lmer(consume ~ nfish + volume + (1|groupid), data=dat1, REML=FALSE)
summary(out1m1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: consume ~ nfish + volume + (1 | groupid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##   5073.4   5096.9  -2531.7   5063.4      806 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.61728 -0.69324  0.03748  0.67548  3.03298 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  groupid  (Intercept) 13.20    3.633   
##  Residual             24.62    4.962   
## Number of obs: 811, groups:  groupid, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  35.1096     1.3547  25.917
## nfish        -1.2111     0.2381  -5.087
## volume       18.0981     2.2787   7.942
## 
## Correlation of Fixed Effects:
##        (Intr) nfish 
## nfish  -0.533       
## volume -0.108 -0.760

Both nfish and volume can be centered at their grand means to make the intercept of the model meaningful.

out1m1_1 <- lmer(consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) + (1|groupid), data=dat1, REML=FALSE)
summary(out1m1_1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) +  
##     (1 | groupid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##   5073.4   5096.9  -2531.7   5063.4      806 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.61728 -0.69324  0.03748  0.67548  3.03298 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  groupid  (Intercept) 13.20    3.633   
##  Residual             24.62    4.962   
## Number of obs: 811, groups:  groupid, 100
## 
## Fixed effects:
##                          Estimate Std. Error t value
## (Intercept)               38.3144     0.4202  91.187
## I(nfish - mean(nfish))    -1.2111     0.2381  -5.087
## I(volume - mean(volume))  18.0981     2.2787   7.942
## 
## Correlation of Fixed Effects:
##             (Intr) I(-mn(n))
## I(-mn(nfs))  0.163          
## I(-mn(vlm))  0.000 -0.760

To get the exact p of the t values (assuming normal distribution or df = \(\infty\)), the t value must be extracted from the output. Thus, the output from the summary function is saved.

summary1m1_1 <- summary(out1m1_1)
coef1m1_1 <- coef(summary1m1_1)
coef1m1_1
##                           Estimate Std. Error   t value
## (Intercept)              38.314427  0.4201743 91.186985
## I(nfish - mean(nfish))   -1.211056  0.2380587 -5.087214
## I(volume - mean(volume)) 18.098092  2.2786698  7.942393

Next, extract only the t value column.

tvalue1m1_1 <- coef1m1_1[,"t value"]
tvalue1m1_1
##              (Intercept)   I(nfish - mean(nfish)) I(volume - mean(volume)) 
##                91.186985                -5.087214                 7.942393

Finally, calculate the p values using the pnorm function. The result is multiplied by 2 to get the two-tailed p values

pnorm(abs(tvalue1m1_1), lower.tail=FALSE) * 2
##              (Intercept)   I(nfish - mean(nfish)) I(volume - mean(volume)) 
##              0.00000e+00              3.63361e-07              1.98317e-15
Example 2: Interviewees’ Rating Scores within Interviewers

The rating scores are predicted by the interviewers’ sex.

out2m1 <- lmer(score ~ 1 + ersex + (1|erid), data=dat2, REML=FALSE)
summary(out2m1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: score ~ 1 + ersex + (1 | erid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  61752.9  61781.7 -30872.4  61744.9     9996 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7604 -0.6335  0.0010  0.6456  3.4647 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  erid     (Intercept) 20.82    4.563   
##  Residual             22.26    4.718   
## Number of obs: 10000, groups:  erid, 1000
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  75.3376     0.2147 350.915
## ersex         0.5476     0.3036   1.804
## 
## Correlation of Fixed Effects:
##       (Intr)
## ersex -0.707
Example 3: Customers Satisfaction within Each Table in Dining Services

The customer satisfaction scores are predicted by the number of persons in each table and whether the table is outdoor.

out3m1 <- lmer(sat ~ 1 + I(numperson - 4) + outdoor + (1|tableid), data=dat3, REML=FALSE)
summary(out3m1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: sat ~ 1 + I(numperson - 4) + outdoor + (1 | tableid)
##    Data: dat3
## 
##      AIC      BIC   logLik deviance df.resid 
##  16943.7  16972.4  -8466.9  16933.7     2257 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2689 -0.6341 -0.0019  0.6189  2.7590 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  tableid  (Intercept) 58.24    7.632   
##  Residual             76.06    8.721   
## Number of obs: 2262, groups:  tableid, 500
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)       63.1131     0.5768 109.420
## I(numperson - 4)  -0.2375     0.1920  -1.237
## outdoor            2.6954     0.7904   3.410
## 
## Correlation of Fixed Effects:
##             (Intr) I(n-4)
## I(nmprsn-4) -0.247       
## outdoor     -0.683 -0.008

Random Analysis of Covariance Models

Example 1: Goldfish Food Consumption within Fish Tanks

In this goldfish example, two goldfish-level predictors are used to predict goldfish’s’ food consumption. length is the length of each goldfish and goldcolor is whether a goldfish is gold or not. Multilevel regression is used rather than disaggregated regression which may inflate type I errors. Note that the length is centered at 15 cm.

out1m2 <- lmer(consume ~ I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE)
summary(out1m2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: consume ~ I(length - 15) + goldcolor + (1 | groupid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##   4127.7   4151.2  -2058.8   4117.7      806 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2524 -0.6184  0.0284  0.6508  3.1677 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  groupid  (Intercept) 18.644   4.318   
##  Residual              6.363   2.522   
## Number of obs: 811, groups:  groupid, 100
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)    39.34350    0.45266  86.916
## I(length - 15)  0.88283    0.01943  45.442
## goldcolor      -0.46446    0.18697  -2.484
## 
## Correlation of Fixed Effects:
##             (Intr) I(-15)
## I(lngth-15)  0.040       
## goldcolor   -0.211  0.027
Example 2: Interviewees’ Rating Scores within Interviewers

The rating scores are predicted by the interviewees’ sex and IQ. Note that the interviewee’s IQ is centered at 100.

out2m2 <- lmer(score ~ 1 + eesex + I(iq - 100) + (1|erid), data=dat2, REML=FALSE)
summary(out2m2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: score ~ 1 + eesex + I(iq - 100) + (1 | erid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  61676.1  61712.2 -30833.1  61666.1     9995 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7323 -0.6356 -0.0036  0.6447  3.3714 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  erid     (Intercept) 20.86    4.567   
##  Residual             22.07    4.697   
## Number of obs: 10000, groups:  erid, 1000
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 75.543619   0.158973 475.199
## eesex        0.123540   0.093952   1.315
## I(iq - 100)  0.029659   0.003307   8.970
## 
## Correlation of Fixed Effects:
##             (Intr) eesex 
## eesex       -0.295       
## I(iq - 100) -0.002 -0.007
Example 3: Customers Satisfaction within Each Table in Dining Services

The customer satisfaction scores are predicted by the customers’ sex and age. Note that age is centered at 40.

out3m2 <- lmer(sat ~ 1 + I(age - 40) + female + (1|tableid), data=dat3, REML=FALSE)
summary(out3m2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: sat ~ 1 + I(age - 40) + female + (1 | tableid)
##    Data: dat3
## 
##      AIC      BIC   logLik deviance df.resid 
##  16553.0  16581.6  -8271.5  16543.0     2257 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.95679 -0.59088 -0.00305  0.59996  2.94782 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  tableid  (Intercept) 68.43    8.272   
##  Residual             59.69    7.726   
## Number of obs: 2262, groups:  tableid, 500
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 66.44867    0.45496 146.055
## I(age - 40)  0.26085    0.01242  20.994
## female      -2.14516    0.36228  -5.921
## 
## Correlation of Fixed Effects:
##             (Intr) I(-40)
## I(age - 40)  0.140       
## female      -0.417 -0.073

Random Intercepts Models

Example 1: Goldfish Food Consumption within Fish Tanks

Put both two goldfish-level predictors and two tank-level predicts in the model.

out1m3 <- lmer(consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) + I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE)
summary(out1m3)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) +  
##     I(length - 15) + goldcolor + (1 | groupid)
##    Data: dat1
## 
##      AIC      BIC   logLik deviance df.resid 
##   4087.8   4120.7  -2036.9   4073.8      804 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2895 -0.6440  0.0324  0.6377  3.1624 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  groupid  (Intercept) 11.768   3.430   
##  Residual              6.358   2.522   
## Number of obs: 811, groups:  groupid, 100
## 
## Fixed effects:
##                          Estimate Std. Error t value
## (Intercept)              39.37890    0.38422 102.491
## I(nfish - mean(nfish))   -1.11583    0.20651  -5.403
## I(volume - mean(volume)) 14.76808    1.99877   7.389
## I(length - 15)            0.87926    0.01939  45.346
## goldcolor                -0.47843    0.18663  -2.563
## 
## Correlation of Fixed Effects:
##             (Intr) I(-mn(n)) I(-mn(v)) I(-15)
## I(-mn(nfs))  0.182                           
## I(-mn(vlm)) -0.001 -0.760                    
## I(lngth-15)  0.040  0.016    -0.042          
## goldcolor   -0.248  0.005    -0.005     0.027
Example 2: Interviewees’ Rating Scores within Interviewers

Put interviewers’ sex, interviewees’ sex, and interviewees’ IQ in the same model.

out2m3 <- lmer(score ~ 1 + ersex + eesex + I(iq - 100) + (1|erid), data=dat2, REML=FALSE)
summary(out2m3)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: score ~ 1 + ersex + eesex + I(iq - 100) + (1 | erid)
##    Data: dat2
## 
##      AIC      BIC   logLik deviance df.resid 
##  61674.9  61718.2 -30831.4  61662.9     9994 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7265 -0.6355 -0.0029  0.6452  3.3655 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  erid     (Intercept) 20.78    4.559   
##  Residual             22.07    4.697   
## Number of obs: 10000, groups:  erid, 1000
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 75.270267   0.219518 342.888
## ersex        0.546704   0.303254   1.803
## eesex        0.123540   0.093952   1.315
## I(iq - 100)  0.029658   0.003307   8.969
## 
## Correlation of Fixed Effects:
##             (Intr) ersex  eesex 
## ersex       -0.691              
## eesex       -0.214  0.000       
## I(iq - 100) -0.001  0.000 -0.007
Example 3: Customers Satisfaction within Each Table in Dining Services

In addition to put all customer-level and table-level predictors, the age averages within each table and the proportion of females within each table are also used. Let’s start with the age variable. First, the age averages within each group is calculated by the ave function.

dat3$aveage <- ave(dat3$age, dat3$tableid)

Next, both age and aveage are used to predict customer satisfaction. The regression coefficient of age represents the within-table effect. However, the regression coefficient of aveage represent the sum of within-table and between-table effects. Note that both predictors are centered at 40.

out3m3 <- lmer(sat ~ I(age - 40) + I(aveage - 40) + (1|tableid), data=dat3, REML=FALSE)
summary(out3m3)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: sat ~ I(age - 40) + I(aveage - 40) + (1 | tableid)
##    Data: dat3
## 
##      AIC      BIC   logLik deviance df.resid 
##  16562.7  16591.3  -8276.4  16552.7     2257 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1638 -0.6006 -0.0048  0.6144  3.0681 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  tableid  (Intercept) 64.30    8.019   
##  Residual             60.82    7.799   
## Number of obs: 2262, groups:  tableid, 500
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)    64.08614    0.47176 135.845
## I(age - 40)     0.26783    0.01276  20.992
## I(aveage - 40) -0.31818    0.06279  -5.068
## 
## Correlation of Fixed Effects:
##             (Intr) I(g-40)
## I(age - 40)  0.000        
## I(aveag-40)  0.516 -0.203

To calculate the between table effects, the regression coefficients are extracted from the summary output.

sumout3m3 <- summary(out3m3)
coef3m3 <- coef(sumout3m3)[,"Estimate"]
coef3m3
##    (Intercept)    I(age - 40) I(aveage - 40) 
##     64.0861426      0.2678348     -0.3181824

Calculate the between-table effect by summing the last two coefficients.

coef3m3[2] + coef3m3[3]
## I(age - 40) 
## -0.05034763

Next, visualize the within-table and between-table age effects. Because this data set has 500 tables. We will plot only 100 within-table regression lines. So dat3_1 is created to extract only table ID that is divisible by 5.

dat3_1 <- dat3[dat3$tableid%%5 == 0,]

The ggplot function in the ggplot2 package is used to create within-table regression lines. See that aes is used to make subsets of data for each table and geom_smooth is used to create regression lines within each subset of data.

Next, to create the between-table regression line, geom_abline is used to overlap the current plot. Note that btwintcept is added by btwslope*40 because the X-axis of the graph is the original-scaled age not the centered age at 40.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.3
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
out <- ggplot(dat3_1, aes(x=age, y=sat, group=tableid)) + geom_smooth(method=lm, se=FALSE)
btwslope <- coef3m3[2] + coef3m3[3]
btwintcept <- coef3m3[1]+(btwslope*40)
out + geom_abline(intercept=btwintcept, slope=btwslope, color="red")
## `geom_smooth()` using formula = 'y ~ x'

Similar to customers’ age, repeat the same process to ave to get the within- and between-table effects.

dat3$avefemale <- ave(dat3$female, dat3$tableid)
out3m4 <- lmer(sat ~ female + I(avefemale - 0.5) + (1|tableid), data=dat3, REML=FALSE)
summary(out3m4)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: sat ~ female + I(avefemale - 0.5) + (1 | tableid)
##    Data: dat3
## 
##      AIC      BIC   logLik deviance df.resid 
##  16940.3  16968.9  -8465.1  16930.3     2257 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.15838 -0.63360 -0.00586  0.61192  2.75216 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  tableid  (Intercept) 60.12    7.753   
##  Residual             75.45    8.686   
## Number of obs: 2262, groups:  tableid, 500
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)         65.0817     0.4513 144.209
## female              -1.5293     0.4175  -3.663
## I(avefemale - 0.5)  -1.1253     1.5879  -0.709
## 
## Correlation of Fixed Effects:
##             (Intr) female
## female      -0.463       
## I(vfml-0.5)  0.078 -0.263

Extract the regression coefficients.

sumout3m4 <- summary(out3m4)
coef3m4 <- coef(sumout3m4)[,"Estimate"]
coef3m4
##        (Intercept)             female I(avefemale - 0.5) 
##          65.081716          -1.529278          -1.125336

Visualize the within-table (blue) and between-table (red) effects of gender on customer satisfaction.

out <- ggplot(dat3_1, aes(x=female, y=sat, group=tableid)) + geom_smooth(method=lm, se=FALSE)
btwslope <- coef3m4[2] + coef3m4[3]
btwintcept <- coef3m4[1]+(btwslope*0.5)
out + geom_abline(intercept=btwintcept, slope=btwslope, color="red")
## `geom_smooth()` using formula = 'y ~ x'

Put all predictors in the model.

out3m5 <- lmer(sat ~ female + I(age - 40) + I(avefemale - 0.5) + I(aveage - 40) + I(numperson - 4) + outdoor + (1|tableid), data=dat3, REML=FALSE)
summary(out3m5)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: sat ~ female + I(age - 40) + I(avefemale - 0.5) + I(aveage -  
##     40) + I(numperson - 4) + outdoor + (1 | tableid)
##    Data: dat3
## 
##      AIC      BIC   logLik deviance df.resid 
##  16522.6  16574.1  -8252.3  16504.6     2253 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.07246 -0.60940  0.00108  0.60756  2.92122 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  tableid  (Intercept) 62.19    7.886   
##  Residual             59.70    7.726   
## Number of obs: 2262, groups:  tableid, 500
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)        64.11409    0.64109 100.008
## female             -2.13297    0.37243  -5.727
## I(age - 40)         0.27329    0.01268  21.560
## I(avefemale - 0.5) -0.57699    1.55230  -0.372
## I(aveage - 40)     -0.30041    0.06216  -4.832
## I(numperson - 4)   -0.23366    0.19197  -1.217
## outdoor             2.66961    0.79303   3.366
## 
## Correlation of Fixed Effects:
##             (Intr) female I(g-40) I(-0.5 I(v-40) I(n-4)
## female      -0.290                                     
## I(age - 40)  0.022 -0.075                              
## I(vfml-0.5)  0.032 -0.240  0.018                       
## I(aveag-40)  0.324  0.015 -0.204  -0.025               
## I(nmprsn-4) -0.220  0.000  0.000   0.007 -0.034        
## outdoor     -0.581  0.000  0.000  -0.005  0.089  -0.011