Read the new data set representing a randomized-control experiment into treatment and control groups. Stress are measured weekly for 14 weeks.
dat1 <- read.table("lecture12ex1.csv", sep=",", header=TRUE)
See descriptive statistics of all variables within the data set.
library(psych)
describe(dat1)
## vars n mean sd median trimmed mad min max range skew
## rowid 1 14000 7000.50 4041.60 7000.5 7000.50 5189.10 1 14000 13999 0.00
## pid 2 14000 500.50 288.69 500.5 500.50 370.65 1 1000 999 0.00
## time 3 14000 7.50 4.03 7.5 7.50 5.19 1 14 13 0.00
## stress 4 14000 50.21 9.85 51.0 50.39 10.38 11 83 72 -0.20
## treat 5 14000 0.50 0.50 0.5 0.50 0.74 0 1 1 0.00
## sleep 6 14000 423.58 58.69 423.0 423.53 60.79 226 635 409 0.01
## age 7 14000 36.64 6.20 36.0 36.58 5.93 18 60 42 0.11
## kurtosis se
## rowid -1.20 34.16
## pid -1.20 2.44
## time -1.21 0.03
## stress 0.10 0.08
## treat -2.00 0.00
## sleep -0.13 0.50
## age 0.00 0.05
In this example, the time will be separated into three periods: baseline, treatment, and follow-up. The treatment starts right after measuring Week 5. The treatments are implemented at Week 5, 6, 7, 8, 9, and 10. The measurements at Week 6, 7, 8, 9, 10, and 11 are the direct impacts of the treatment so these measurements are included in the treatment period. Thus, Week 12, 13, and 14 are the pure follow-up period.
Time | Centered.Time | Baseline | Treatment | Follow.up |
---|---|---|---|---|
1 | 0 | 0 | 0 | 0 |
2 | 1 | 1 | 0 | 0 |
3 | 2 | 2 | 0 | 0 |
4 | 3 | 3 | 0 | 0 |
5 | 4 | 4 | 0 | 0 |
6 | 5 | 4 | 1 | 0 |
7 | 6 | 4 | 2 | 0 |
8 | 7 | 4 | 3 | 0 |
9 | 8 | 4 | 4 | 0 |
10 | 9 | 4 | 5 | 0 |
11 | 10 | 4 | 6 | 0 |
12 | 11 | 4 | 6 | 1 |
13 | 12 | 4 | 6 | 2 |
14 | 13 | 4 | 6 | 3 |
dat1$timec <- dat1$time - 1
dat1$timebase <- dat1$timec
dat1[dat1$timebase > 4, "timebase"] <- 4
dat1$timetreat <- dat1$timec - dat1$timebase
dat1[dat1$timetreat > 6, "timetreat"] <- 6
dat1$timefollow <- dat1$timec - dat1$timebase - dat1$timetreat
Check the null model to investigate the intraclass correlation of stress.
library(lme4)
out1a <- lmer(stress ~ 1 + (1|pid), data=dat1, REML=FALSE)
summary(out1a)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + (1 | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 103315.8 103338.4 -51654.9 103309.8 13997
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7791 -0.6489 0.0212 0.6755 3.2755
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 9.076 3.013
## Residual 88.021 9.382
## Number of obs: 14000, groups: pid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 50.2054 0.1239 405
Use timec
to predict job satisfaction without random
slopes.
out1b <- lmer(stress ~ 1 + timec + (1|pid), data=dat1, REML=FALSE)
summary(out1b)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + timec + (1 | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 102801.3 102831.5 -51396.6 102793.3 13996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8198 -0.6463 0.0086 0.6626 3.2683
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 9.321 3.053
## Residual 84.592 9.197
## Number of obs: 14000, groups: pid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.08240 0.17628 301.13
## timec -0.44262 0.01928 -22.95
##
## Correlation of Fixed Effects:
## (Intr)
## timec -0.711
Let’s use three time variables, timebase
,
timetreat
, and timefollow
, to model the
changes in stress.
out1c <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1|pid), data=dat1, REML=FALSE)
anova(out1b, out1c)
## Data: dat1
## Models:
## out1b: stress ~ 1 + timec + (1 | pid)
## out1c: stress ~ 1 + timebase + timetreat + timefollow + (1 | pid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## out1b 4 102801 102831 -51397 102793
## out1c 6 102693 102738 -51341 102681 112.21 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model with three time variables was significantly better than the model with only one time variable. Check the output of the model with three time variables.
summary(out1c)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + timebase + timetreat + timefollow + (1 | pid)
## Data: dat1
##
## AIC BIC logLik deviance df.resid
## 102693.1 102738.3 -51340.5 102681.1 13994
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8092 -0.6436 0.0062 0.6596 3.3402
##
## Random effects:
## Groups Name Variance Std.Dev.
## pid (Intercept) 9.373 3.062
## Residual 83.865 9.158
## Number of obs: 14000, groups: pid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.44186 0.23946 218.998
## timebase -0.01506 0.07770 -0.194
## timetreat -0.87487 0.04711 -18.572
## timefollow 0.58362 0.10854 5.377
##
## Correlation of Fixed Effects:
## (Intr) timebs timtrt
## timebase -0.775
## timetreat 0.191 -0.588
## timefollow -0.053 0.163 -0.574
Let’s investigate the change in stress across times. First, make each line represent the raw stress level of each individual across time.
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
dat1_2 <- dat1[dat1$pid%%20 == 0,]
dat1_2$treat <- factor(dat1_2$treat, labels=c("Control", "Treatment"))
ggplot(data=dat1_2, aes(x=time, y=stress, group=pid, colour=treat)) + geom_line()
Second, in each individual, all three time variables are used to
predict stress levels. Then, the predicted values of stress are saved as
stress2
. Then, the predicted stress force a linear change
within individuals. See the spline change of stress level in each
client.
datl <- split(dat1, dat1$pid)
stresspred <- lapply(datl, function(x) { predict(lm(stress ~ timebase + timetreat + timefollow, data=x)) })
dat1$stress2 <- do.call(c, stresspred)
dat1_2 <- dat1[dat1$pid%%20 == 0,]
dat1_2$treat <- factor(dat1_2$treat, labels=c("Control", "Treatment"))
ggplot(data=dat1_2, aes(x=time, y=stress2, group=pid, colour=treat)) + geom_line()
Next, test the random slopes of each time variable. The model with
the random slope of timebase
was not convergent and the
fixed effects of timebase
was not significant too. Because
timebase
is not the crucial effect of the research,
timebase
is dropped from further analyses.
out1d <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1 + timebase|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
## boundary (singular) fit: see help('isSingular')
anova(out1c, out1d)
## Data: dat1
## Models:
## out1c: stress ~ 1 + timebase + timetreat + timefollow + (1 | pid)
## out1d: stress ~ 1 + timebase + timetreat + timefollow + (1 + timebase | pid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## out1c 6 102693 102738 -51341 102681
## out1d 8 102422 102483 -51203 102406 274.68 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Next, the random slopes of timetreat
was
significant.
out1e <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1 + timetreat|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out1c, out1e)
## Data: dat1
## Models:
## out1c: stress ~ 1 + timebase + timetreat + timefollow + (1 | pid)
## out1e: stress ~ 1 + timebase + timetreat + timefollow + (1 + timetreat | pid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## out1c 6 102693 102738 -51341 102681
## out1e 8 101783 101843 -50884 101767 913.99 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Next, the random slopes of timefollow
was
significant.
out1f <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1 + timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out1c, out1f)
## Data: dat1
## Models:
## out1c: stress ~ 1 + timebase + timetreat + timefollow + (1 | pid)
## out1f: stress ~ 1 + timebase + timetreat + timefollow + (1 + timefollow | pid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## out1c 6 102693 102738 -51341 102681
## out1f 8 102450 102510 -51217 102434 247.44 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In sum, two random effects were significant. Thus, both random
effects are analyzed in the same model and remove timebase
from the model.
out1g <- lmer(stress ~ 1 + timetreat + timefollow + (1 + timetreat+ timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(out1g)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + timetreat + timefollow + (1 + timetreat + timefollow |
## pid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 101766.7 101842.1 -50873.3 101746.7 13990
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0056 -0.6302 0.0125 0.6550 3.2871
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 0.3362 0.5799
## timetreat 1.3151 1.1468 0.20
## timefollow 1.8105 1.3455 -0.67 -0.31
## Residual 74.5464 8.6340
## Number of obs: 14000, groups: pid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.40588 0.11110 471.680
## timetreat -0.88024 0.05103 -17.248
## timefollow 0.58704 0.10957 5.358
##
## Correlation of Fixed Effects:
## (Intr) timtrt
## timetreat -0.445
## timefollow 0.096 -0.474
Let’s plot the scatterplot matrix of the random effects. The easiest
way is to generate data from the output of the random slope models. See
the Random effects:
section. The standard deviations and
the correlation matrix of the random effects are saved. Next, see the
means of each random effect in the Fixed effects:
section.
From means, standard deviations, and the correlation matrix, data are
simulated by the mvrnorm
function in the MASS
package. Then, the simulated data are used to create scatterplot matrix
using the pairs.panels
in the psych
package.
tempsd <- c(0.5799, 1.1468, 1.3455)
tempr <- matrix(c(1, 0.2, -0.67,
0.2, 1, -0.31,
-0.67, -0.31, 1), 3, 3)
library(lavaan)
tempcov <- cor2cov(tempr, tempsd)
library(MASS)
temp <- mvrnorm(1000, c(52.41, -0.88, 0.587), tempcov)
colnames(temp) <- c("intcept", "timetreat", "timefollow")
library(psych)
pairs.panels(temp, main = "Scatter Plot Matrix of the Random Effects")
Next, the scatterplots of each pair are created.
plot(temp[,1], temp[,2], xlab="Baseline Stress", ylab="Stress Change in Treatment Period")
abline(h = 0, col="red", lty=2)
abline(v = 0, col="red", lty=2)
plot(temp[,1], temp[,3], xlab="Baseline Stress", ylab="Stress Change in Follow-up Period")
abline(h = 0, col="red", lty=2)
abline(v = 0, col="red", lty=2)
plot(temp[,2], temp[,3], xlab="Stress Change in Treatment Period", ylab="Stress Change in Follow-up Period")
abline(h = 0, col="red", lty=2)
abline(v = 0, col="red", lty=2)
Next, all covariates are added in the model. sleephrs
is
the only time-varying covariate. It is separated into individual means
and individual-mean-centered variable.
dat1$sleephrs <- dat1$sleep / 60
dat1$avesleep <- ave(dat1$sleephrs, dat1$pid)
dat1$diffsleep <- dat1$sleephrs - dat1$avesleep
All time-invariant covariates, avesleep
,
treat
, and age
are grand-mean centered. Thus,
the grand means must be calculated by (a) making a new data set where
rows represent each individual and (b) calculating the means across
individuals.
dat1a <- dat1[!duplicated(dat1$pid),]
round(apply(dat1a, 2, mean), 3)
## rowid pid time stress treat sleep age
## 6994.000 500.500 1.000 52.543 0.500 424.507 36.638
## timec timebase timetreat timefollow stress2 sleephrs avesleep
## 0.000 0.000 0.000 0.000 52.442 7.075 7.060
## diffsleep
## 0.015
Next, all covariates are included in the models.
out1h <- lmer(stress ~ 1 + timetreat + timefollow + diffsleep + I(avesleep - 7.075) + I(treat - 0.5) + I(age - 36.638) + (1 + timetreat+ timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(out1h)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + timetreat + timefollow + diffsleep + I(avesleep -
## 7.075) + I(treat - 0.5) + I(age - 36.638) + (1 + timetreat +
## timefollow | pid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## 101443.5 101549.2 -50707.7 101415.5 13986
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0920 -0.6324 0.0072 0.6530 3.2911
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 2.446 1.564
## timetreat 1.306 1.143 -0.65
## timefollow 1.713 1.309 0.16 -0.30
## Residual 73.511 8.574
## Number of obs: 14000, groups: pid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.41799 0.11955 438.465
## timetreat -0.88473 0.05077 -17.427
## timefollow 0.59457 0.10847 5.481
## diffsleep -1.91626 0.14072 -13.617
## I(avesleep - 7.075) 0.18586 0.12980 1.432
## I(treat - 0.5) -2.82121 0.18266 -15.445
## I(age - 36.638) 0.01326 0.01724 0.769
##
## Correlation of Fixed Effects:
## (Intr) timtrt tmfllw dffslp I(-7.0 I(-0.5
## timetreat -0.625
## timefollow 0.153 -0.472
## diffsleep -0.006 0.006 -0.005
## I(vs-7.075) 0.017 0.000 0.000 0.001
## I(tret-0.5) 0.000 0.000 0.000 0.003 0.003
## I(g-36.638) 0.009 0.000 0.000 0.000 0.520 0.004
Next, all time-invariate covariates are centered at the calculated
grand means. Also, all cross-level interactions are added to predict the
random slopes of time
.
out1i <- lmer(stress ~ 1 + timetreat + timefollow + diffsleep + timetreat*I(avesleep - 7.075) + timetreat*I(treat - 0.5) + timetreat*I(age - 36.638) + timefollow*I(avesleep - 7.075) + timefollow*I(treat - 0.5) + timefollow*I(age - 36.638) + (1 + timetreat+ timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead", calc.derivs = FALSE))
summary(out1i)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + timetreat + timefollow + diffsleep + timetreat *
## I(avesleep - 7.075) + timetreat * I(treat - 0.5) + timetreat *
## I(age - 36.638) + timefollow * I(avesleep - 7.075) + timefollow *
## I(treat - 0.5) + timefollow * I(age - 36.638) + (1 + timetreat +
## timefollow | pid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead", calc.derivs = FALSE)
##
## AIC BIC logLik deviance df.resid
## 101023.0 101173.9 -50491.5 100983.0 13980
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0674 -0.6351 0.0053 0.6621 3.3091
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 0.3553 0.5960
## timetreat 0.4344 0.6591 0.46
## timefollow 1.3450 1.1598 -0.79 0.14
## Residual 73.5184 8.5743
## Number of obs: 14000, groups: pid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.418891 0.110471 474.503
## timetreat -0.885219 0.041316 -21.426
## timefollow 0.593959 0.106793 5.562
## diffsleep -1.892950 0.139462 -13.573
## I(avesleep - 7.075) 0.251976 0.156970 1.605
## I(treat - 0.5) 0.059906 0.220894 0.271
## I(age - 36.638) 0.013452 0.020844 0.645
## timetreat:I(avesleep - 7.075) -0.035437 0.058707 -0.604
## timetreat:I(treat - 0.5) -1.864007 0.082616 -22.562
## timetreat:I(age - 36.638) 0.002091 0.007796 0.268
## timefollow:I(avesleep - 7.075) -0.033808 0.151753 -0.223
## timefollow:I(treat - 0.5) 1.188626 0.213551 5.566
## timefollow:I(age - 36.638) -0.018400 0.020154 -0.913
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
Plot the expected changes in the stress level in both treatment and
control groups. First, the data from Individual 1 are selected to get
the values of time
, timetreat
, and
timefollow
at each time point. Next, the regression
equation of expect means is obtained from the fixed effects results
involving treat
, timetreat
, and
timefollow
. Next, the expected means of each treatment
condition and each time point are calculated and plotted.
dat1g1 <- dat1[dat1$pid == 1,]
w <- 1
ytreat <- 52.42 + 0.06*(w - 0.5) - 0.88*dat1g1$timetreat - 1.864*(w-0.5)*dat1g1$timetreat + 0.59*dat1g1$timefollow + 1.189*(w-0.5)*dat1g1$timefollow
w <- 0
yctrl <- 52.42 + 0.06*(w - 0.5) - 0.88*dat1g1$timetreat - 1.864*(w-0.5)*dat1g1$timetreat + 0.59*dat1g1$timefollow + 1.189*(w-0.5)*dat1g1$timefollow
plot(dat1g1$time, ytreat, type="n", xlab="time", ylab="stress", ylim=c(40,55))
lines(dat1g1$time, ytreat, col="red")
lines(dat1g1$time, yctrl, col="green")
legend(1, 43, c("Treatment", "Control"), lty=1, col=c("red", "green"))
treat
significantly predicted the random slopes of
timetreat
and timefollow
. Thus, both
cross-level interactions are probed by the interactions
package. Before using the package, all variables must be centered by
creating new variables.
dat1$avesleepc <- dat1$avesleep - 7.075
dat1$treatc <- dat1$treat - 0.5
dat1$agec <- dat1$age - 36.638
out1in <- lmer(stress ~ 1 + timetreat + timefollow + diffsleep + timetreat*avesleepc + timetreat*treatc + timetreat*agec + timefollow*avesleepc + timefollow*treatc + timefollow*agec + (1 + timetreat+ timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead", calc.derivs = FALSE))
summary(out1in)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + timetreat + timefollow + diffsleep + timetreat *
## avesleepc + timetreat * treatc + timetreat * agec + timefollow *
## avesleepc + timefollow * treatc + timefollow * agec + (1 +
## timetreat + timefollow | pid)
## Data: dat1
## Control: lmerControl(optimizer = "Nelder_Mead", calc.derivs = FALSE)
##
## AIC BIC logLik deviance df.resid
## 101023.0 101173.9 -50491.5 100983.0 13980
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0674 -0.6351 0.0053 0.6621 3.3091
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 0.3553 0.5960
## timetreat 0.4344 0.6591 0.46
## timefollow 1.3450 1.1598 -0.79 0.14
## Residual 73.5184 8.5743
## Number of obs: 14000, groups: pid, 1000
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 52.418891 0.110471 474.503
## timetreat -0.885219 0.041316 -21.426
## timefollow 0.593959 0.106793 5.562
## diffsleep -1.892950 0.139462 -13.573
## avesleepc 0.251976 0.156970 1.605
## treatc 0.059906 0.220894 0.271
## agec 0.013452 0.020844 0.645
## timetreat:avesleepc -0.035437 0.058707 -0.604
## timetreat:treatc -1.864007 0.082616 -22.562
## timetreat:agec 0.002091 0.007796 0.268
## timefollow:avesleepc -0.033808 0.151753 -0.223
## timefollow:treatc 1.188626 0.213551 5.566
## timefollow:agec -0.018400 0.020154 -0.913
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
First, the simple slopes of timetreat
at each treatment
condition is calculated.
treatcval <- c(0, 1) - 0.5
library(interactions)
ss11 <- sim_slopes(model=out1in, pred=timetreat, modx=treatc, modx.values=treatcval)
ss11
## JOHNSON-NEYMAN INTERVAL
##
## When treatc is OUTSIDE the interval [-0.54, -0.42], the slope of timetreat
## is p < .05.
##
## Note: The range of observed values of treatc is [-0.50, 0.50]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of timetreat when treatc = -0.50:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.05 0.06 0.81 0.42
##
## Slope of timetreat when treatc = 0.50:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.82 0.06 -31.06 0.00
Plot the interaction where X-axis is timetreat
and each
line represents each value of treat
.
interact_plot(model=out1in, pred=timetreat, modx=treatc, modx.values=treatcval)
Second, the simple slopes of timefollow
at each
treatment condition is calculated.
ss12 <- sim_slopes(model=out1in, pred=timefollow, modx=treatc, modx.values=treatcval)
ss12
## JOHNSON-NEYMAN INTERVAL
##
## When treatc is OUTSIDE the interval [-0.85, -0.30], the slope of timefollow
## is p < .05.
##
## Note: The range of observed values of treatc is [-0.50, 0.50]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of timefollow when treatc = -0.50:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.00 0.15 0.00 1.00
##
## Slope of timefollow when treatc = 0.50:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.19 0.15 7.85 0.00
Plot the interaction where X-axis is timefollow
and each
line represents each value of treat
.
interact_plot(model=out1in, pred=timefollow, modx=treatc, modx.values=treatcval)
Third, the expected differences between treatment conditions at each
time is calculated. However, the time variables are separated into
timetreat
and timefollow
. Thus,
timetreat
is treated as the first moderator and
timefollow
is treated as the second moderator. To test the
expected differences at all time points, all possible data points of
timetreat
and timefollow
are included. Note
that the time used for this probing interaction is more than one
hour.
timetreatval <- c(0, 1, 2, 3, 4, 5, 6)
timefollowval <- c(0, 1, 2, 3)
ss13 <- sim_slopes(model=out1in, pred=treatc, modx=timetreat, mod2=timefollow, modx.values=timetreatval, mod2.values=timefollowval)
ss13
## ████████████████████ While timefollow (2nd moderator) = 3.00 ███████████████████
##
## JOHNSON-NEYMAN INTERVAL
##
## When timetreat is OUTSIDE the interval [1.27, 2.57], the slope of treatc is
## p < .05.
##
## Note: The range of observed values of timetreat is [0.00, 6.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of treatc when timetreat = 6.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -7.56 0.57 -13.32 0.00
##
## Slope of treatc when timetreat = 5.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -5.69 0.56 -10.14 0.00
##
## Slope of treatc when timetreat = 4.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -3.83 0.57 -6.75 0.00
##
## Slope of treatc when timetreat = 3.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.97 0.59 -3.35 0.00
##
## Slope of treatc when timetreat = 2.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.10 0.61 -0.17 0.87
##
## Slope of treatc when timetreat = 1.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.76 0.65 2.70 0.01
##
## Slope of treatc when timetreat = 0.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 3.63 0.70 5.19 0.00
##
## ████████████████████ While timefollow (2nd moderator) = 2.00 ███████████████████
##
## JOHNSON-NEYMAN INTERVAL
##
## When timetreat is OUTSIDE the interval [0.82, 1.75], the slope of treatc is
## p < .05.
##
## Note: The range of observed values of timetreat is [0.00, 6.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of treatc when timetreat = 6.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -8.75 0.42 -20.60 0.00
##
## Slope of treatc when timetreat = 5.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -6.88 0.40 -17.24 0.00
##
## Slope of treatc when timetreat = 4.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -5.02 0.38 -13.16 0.00
##
## Slope of treatc when timetreat = 3.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -3.15 0.39 -8.02 0.00
##
## Slope of treatc when timetreat = 2.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.29 0.42 -3.11 0.00
##
## Slope of treatc when timetreat = 1.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.57 0.45 1.27 0.21
##
## Slope of treatc when timetreat = 0.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 2.44 0.49 4.98 0.00
##
## ████████████████████ While timefollow (2nd moderator) = 1.00 ███████████████████
##
## JOHNSON-NEYMAN INTERVAL
##
## When timetreat is OUTSIDE the interval [0.35, 0.96], the slope of treatc is
## p < .05.
##
## Note: The range of observed values of timetreat is [0.00, 6.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of treatc when timetreat = 6.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -9.94 0.37 -27.11 0.00
##
## Slope of treatc when timetreat = 5.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -8.07 0.31 -26.19 0.00
##
## Slope of treatc when timetreat = 4.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -6.21 0.26 -23.56 0.00
##
## Slope of treatc when timetreat = 3.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -4.34 0.24 -18.14 0.00
##
## Slope of treatc when timetreat = 2.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -2.48 0.24 -10.19 0.00
##
## Slope of treatc when timetreat = 1.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.62 0.27 -2.27 0.02
##
## Slope of treatc when timetreat = 0.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 1.25 0.32 3.87 0.00
##
## ████████████████████ While timefollow (2nd moderator) = 0.00 ███████████████████
##
## JOHNSON-NEYMAN INTERVAL
##
## When timetreat is OUTSIDE the interval [-0.21, 0.25], the slope of treatc
## is p < .05.
##
## Note: The range of observed values of timetreat is [0.00, 6.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of treatc when timetreat = 6.00:
##
## Est. S.E. t val. p
## -------- ------ -------- ------
## -11.12 0.42 -26.37 0.00
##
## Slope of treatc when timetreat = 5.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -9.26 0.35 -26.49 0.00
##
## Slope of treatc when timetreat = 4.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -7.40 0.28 -26.13 0.00
##
## Slope of treatc when timetreat = 3.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -5.53 0.23 -24.31 0.00
##
## Slope of treatc when timetreat = 2.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -3.67 0.16 -22.45 0.00
##
## Slope of treatc when timetreat = 1.00:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -1.80 0.19 -9.49 0.00
##
## Slope of treatc when timetreat = 0.00:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.06 0.22 0.27 0.78
Read the data set from Lecture 11.
dat2 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
Run the model where time is centered at the first year and stress is separated into Level-1 and Level-2 covariates by group mean centering and predicts the sense of belonging.
dat2$timec <- dat2$time - 1
dat2$avestress <- ave(dat2$stress, dat2$pid)
dat2$diffstress <- dat2$stress - dat2$avestress
library(lme4)
out2 <- lmer(mem ~ 1 + timec + diffstress + avestress + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: mem ~ 1 + timec + diffstress + avestress + (1 + timec | pid)
## Data: dat2
##
## AIC BIC logLik deviance df.resid
## 25552.9 25602.0 -12768.5 25536.9 3392
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.10411 -0.48077 0.00681 0.49606 2.61723
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pid (Intercept) 234.68 15.319
## timec 35.29 5.941 -0.15
## Residual 25.92 5.091
## Number of obs: 3400, groups: pid, 850
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 56.78964 2.57724 22.035
## timec -1.88457 0.21823 -8.636
## diffstress 0.40878 0.01212 33.730
## avestress 0.10050 0.05011 2.006
##
## Correlation of Fixed Effects:
## (Intr) timec dffstr
## timec -0.045
## diffstress -0.004 0.011
## avestress -0.977 0.000 0.003
The lme4
package does not have features for
heteroscedastic models. Rather the nlme
package is used
instead. Let’s run the previous model with nlme
.
library(nlme)
out2a <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit)
summary(out2a)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat2
## AIC BIC logLik
## 25552.95 25602 -12768.47
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 15.319143 (Intr)
## timec 5.940565 -0.152
## Residual 5.090837
##
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.78964 2.5787499 2548 22.02216 0.0000
## timec -1.88457 0.2183537 2548 -8.63082 0.0000
## diffstress 0.40878 0.0121265 2548 33.70962 0.0000
## avestress 0.10050 0.0501375 848 2.00458 0.0453
## Correlation:
## (Intr) timec dffstr
## timec -0.045
## diffstress -0.004 0.011
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.104097747 -0.480769826 0.006814801 0.496061463 2.617219474
##
## Number of Observations: 3400
## Number of Groups: 850
Next, run the models where the residual variances are different
across four time points. The weights
argument sets the
variance structure. In this case, the variance is identical at each
value of timec
.
out2b <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec))
summary(out2b)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat2
## AIC BIC logLik
## 25510.7 25578.15 -12744.35
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 15.332924 (Intr)
## timec 5.888768 -0.154
## Residual 6.527777
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | timec
## Parameter estimates:
## 0 1 2 3
## 1.0000000 0.6696634 0.9035280 0.3313754
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.91609 2.5920715 2548 21.95776 0.0000
## timec -1.78621 0.2110654 2548 -8.46281 0.0000
## diffstress 0.40682 0.0119806 2548 33.95677 0.0000
## avestress 0.09579 0.0503951 848 1.90079 0.0577
## Correlation:
## (Intr) timec dffstr
## timec -0.046
## diffstress -0.005 0.014
## avestress -0.977 0.000 0.004
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.559353311 -0.410386627 0.009440652 0.406160670 2.627260285
##
## Number of Observations: 3400
## Number of Groups: 850
Compare the models with equal and unequal variances across time. The unequal residual variances were significant.
anova(out2a, out2b)
## Model df AIC BIC logLik Test L.Ratio p-value
## out2a 1 8 25552.95 25602.00 -12768.47
## out2b 2 11 25510.70 25578.15 -12744.35 1 vs 2 48.2462 <.0001
Let’s calculate residual standard deviations at each time point.
out2bsigma <- as.numeric(VarCorr(out2b)[3, 2])
out2bsigmatime <- out2bsigma * coef(out2b$modelStruct$varStruct, uncons = FALSE)
c(out2bsigma, out2bsigmatime)
## 1 2 3
## 6.527777 4.371413 5.898029 2.163145
Next, run the models where the residual variances have the
exponential changes. The weights
argument is specified that
the time
variable predicts the residual variance by the
exponential function.
out2c <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varExp(form = ~time))
summary(out2c)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat2
## AIC BIC logLik
## 25552.81 25607.99 -12767.4
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 15.271240 (Intr)
## timec 5.899544 -0.144
## Residual 5.560063
##
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~time
## Parameter estimates:
## expon
## -0.03465414
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.81090 2.5799359 2548 22.02028 0.000
## timec -1.87412 0.2170987 2548 -8.63256 0.000
## diffstress 0.40759 0.0121412 2548 33.57090 0.000
## avestress 0.09976 0.0501617 848 1.98877 0.047
## Correlation:
## (Intr) timec dffstr
## timec -0.045
## diffstress -0.004 0.012
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.20408953 -0.48006748 0.01228221 0.49351786 2.65221045
##
## Number of Observations: 3400
## Number of Groups: 850
The exponential change in residual variances is not significantly better than the equal variances model.
anova(out2a, out2c)
## Model df AIC BIC logLik Test L.Ratio p-value
## out2a 1 8 25552.95 25602.00 -12768.47
## out2c 2 9 25552.81 25607.99 -12767.40 1 vs 2 2.14082 0.1434
Let’s calculate residual standard deviations at each time point.
out2csigma <- as.numeric(VarCorr(out2c)[3, 2])
expvalue <- coef(out2c$modelStruct$varStruct, uncons = FALSE)
out2csigma * exp(expvalue * (1:4))
## [1] 5.370684 5.187756 5.011058 4.840378
Next, run the models where the residual variances is the power
function of time
.
out2d <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varPower(form = ~time))
summary(out2d)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat2
## AIC BIC logLik
## 25551.99 25607.17 -12766.99
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 15.252205 (Intr)
## timec 5.883418 -0.141
## Residual 5.495255
##
## Variance function:
## Structure: Power of variance covariate
## Formula: ~time
## Parameter estimates:
## power
## -0.08991349
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.82431 2.5804668 2548 22.02094 0.000
## timec -1.87242 0.2168923 2548 -8.63293 0.000
## diffstress 0.40734 0.0121381 2548 33.55850 0.000
## avestress 0.09938 0.0501726 848 1.98067 0.048
## Correlation:
## (Intr) timec dffstr
## timec -0.045
## diffstress -0.004 0.012
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.205885624 -0.476766976 0.009982646 0.491381290 2.649537461
##
## Number of Observations: 3400
## Number of Groups: 850
The power change in residual variances is not significantly better than the equal variances model.
anova(out2a, out2d)
## Model df AIC BIC logLik Test L.Ratio p-value
## out2a 1 8 25552.95 25602.00 -12768.47
## out2d 2 9 25551.99 25607.17 -12767.00 1 vs 2 2.958496 0.0854
Let’s calculate residual standard deviations at each time point.
out2dsigma <- as.numeric(VarCorr(out2d)[3, 2])
powervalue <- coef(out2d$modelStruct$varStruct, uncons = FALSE)
out2dsigma * (1:4)^powervalue
## [1] 5.495255 5.163227 4.978382 4.851260
Next, run the models where the residual variances is the power
function of time
with added constant.
out2e <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varConstPower(form = ~time), control = lmeControl(maxIter=200, msMaxIter = 200))
summary(out2e)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat2
## AIC BIC logLik
## 25547.19 25608.51 -12763.6
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 15.192096 (Intr)
## timec 5.813162 -0.126
## Residual 1.157254
##
## Variance function:
## Structure: Constant plus power of variance covariate
## Formula: ~time
## Parameter estimates:
## const power
## 4.246851 -23.276544
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.88875 2.5840961 2548 22.01495 0.0000
## timec -1.86365 0.2160190 2548 -8.62726 0.0000
## diffstress 0.40658 0.0121021 2548 33.59568 0.0000
## avestress 0.09755 0.0502450 848 1.94158 0.0525
## Correlation:
## (Intr) timec dffstr
## timec -0.044
## diffstress -0.005 0.012
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.18681037 -0.48089878 0.01053416 0.49459340 2.62536918
##
## Number of Observations: 3400
## Number of Groups: 850
The constant power change in residual variances is significantly better than the equal variances model.
anova(out2a, out2e)
## Model df AIC BIC logLik Test L.Ratio p-value
## out2a 1 8 25552.95 25602.00 -12768.47
## out2e 2 10 25547.19 25608.51 -12763.60 1 vs 2 9.753754 0.0076
Let’s calculate residual standard deviations at each time point.
out2esigma <- as.numeric(VarCorr(out2e)[3, 2])
paramvar <- coef(out2e$modelStruct$varStruct, uncons = FALSE)
out2esigma * (paramvar[1] + (1:4)^paramvar[2])
## [1] 6.071939 4.914686 4.914685 4.914685
The model with different variances across times and the model where the variances are predicted by the constant power function are compared. The latter model is the restricted model. The test was significant so the model with different variances across times is selected.
anova(out2e, out2b)
## Model df AIC BIC logLik Test L.Ratio p-value
## out2e 1 10 25547.19 25608.51 -12763.60
## out2b 2 11 25510.70 25578.15 -12744.35 1 vs 2 38.49245 <.0001
Read the data set from Lecture 4.
dat3 <- read.table("lecture4ex2.csv", sep=",", header=TRUE)
Run the model predicting interviewee’s ratings by the interaction between interviewer’s and interviewee’s sex. In this model, the residual variances are equal across all combinations of interviewers and interviewees.
out3a <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit)
summary(out3a)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat3
## AIC BIC logLik
## 61640.99 61705.88 -30811.5
##
## Random effects:
## Formula: ~1 + eesex | erid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.4849848 (Intr)
## eesex 0.8932304 0.125
## Residual 4.6652507
##
## Fixed effects: score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100)
## Value Std.Error DF t-value p-value
## (Intercept) 75.53870 0.22127122 8997 341.3851 0.0000
## ersex 0.00989 0.31292405 998 0.0316 0.9748
## eesex -0.41323 0.13790177 8997 -2.9966 0.0027
## I(iq - 100) 0.02942 0.00329961 8997 8.9155 0.0000
## ersex:eesex 1.07365 0.19502911 8997 5.5051 0.0000
## Correlation:
## (Intr) ersex eesex I(-100
## ersex -0.707
## eesex -0.253 0.179
## I(iq - 100) -0.003 0.002 0.001
## ersex:eesex 0.179 -0.253 -0.707 -0.008
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.741278287 -0.636668210 -0.004775941 0.640990961 3.432036414
##
## Number of Observations: 10000
## Number of Groups: 1000
Next, estimate different residual variances across interviewee’s sex.
out3b <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit, weights=varIdent(form = ~1|eesex))
summary(out3b)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat3
## AIC BIC logLik
## 61642.39 61714.49 -30811.19
##
## Random effects:
## Formula: ~1 + eesex | erid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.4790343 (Intr)
## eesex 0.8932344 0.138
## Residual 4.6937412
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | eesex
## Parameter estimates:
## 0 1
## 1.0000000 0.9878226
## Fixed effects: score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100)
## Value Std.Error DF t-value p-value
## (Intercept) 75.53870 0.22127116 8997 341.3852 0.0000
## ersex 0.00989 0.31292398 998 0.0316 0.9748
## eesex -0.41323 0.13790180 8997 -2.9966 0.0027
## I(iq - 100) 0.02941 0.00329918 8997 8.9143 0.0000
## ersex:eesex 1.07365 0.19502915 8997 5.5051 0.0000
## Correlation:
## (Intr) ersex eesex I(-100
## ersex -0.707
## eesex -0.253 0.179
## I(iq - 100) -0.003 0.002 0.001
## ersex:eesex 0.179 -0.253 -0.707 -0.008
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.728448706 -0.635616701 -0.004869759 0.641990770 3.409480268
##
## Number of Observations: 10000
## Number of Groups: 1000
The likelihood ratio test indicated that residual variances are not significantly different across interviewee’s sex.
anova(out3a, out3b)
## Model df AIC BIC logLik Test L.Ratio p-value
## out3a 1 9 61640.99 61705.88 -30811.49
## out3b 2 10 61642.39 61714.49 -30811.19 1 vs 2 0.6004416 0.4384
Next, estimate different residual variances across interviewer’s sex.
out3c <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit, weights=varIdent(form = ~1|ersex))
summary(out3c)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat3
## AIC BIC logLik
## 61642.85 61714.95 -30811.43
##
## Random effects:
## Formula: ~1 + eesex | erid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.4850481 (Intr)
## eesex 0.8929552 0.125
## Residual 4.6783279
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | ersex
## Parameter estimates:
## 0 1
## 1.0000000 0.9944001
## Fixed effects: score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100)
## Value Std.Error DF t-value p-value
## (Intercept) 75.53871 0.22138425 8997 341.2108 0.0000
## ersex 0.00988 0.31292764 998 0.0316 0.9748
## eesex -0.41324 0.13825236 8997 -2.9890 0.0028
## I(iq - 100) 0.02939 0.00329958 8997 8.9061 0.0000
## ersex:eesex 1.07366 0.19502394 8997 5.5053 0.0000
## Correlation:
## (Intr) ersex eesex I(-100
## ersex -0.707
## eesex -0.253 0.179
## I(iq - 100) -0.003 0.002 0.001
## ersex:eesex 0.180 -0.253 -0.709 -0.008
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.731018675 -0.635825449 -0.004824803 0.640120756 3.442342623
##
## Number of Observations: 10000
## Number of Groups: 1000
The likelihood ratio test indicated that residual variances are not significantly different across interviewer’s sex.
anova(out3a, out3c)
## Model df AIC BIC logLik Test L.Ratio p-value
## out3a 1 9 61640.99 61705.88 -30811.49
## out3c 2 10 61642.85 61714.95 -30811.42 1 vs 2 0.1395039 0.7088
Finally, estimate different residual variances across both interviewee’s and interviewer’s sex.
out3d <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit, weights=varIdent(form = ~1|eesex * ersex))
summary(out3d)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat3
## AIC BIC logLik
## 61646.21 61732.73 -30811.11
##
## Random effects:
## Formula: ~1 + eesex | erid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.4791314 (Intr)
## eesex 0.8928173 0.139
## Residual 4.7142122
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | eesex * ersex
## Parameter estimates:
## 0*0 1*0 0*1 1*1
## 1.0000000 0.9847232 0.9913074 0.9823369
## Fixed effects: score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100)
## Value Std.Error DF t-value p-value
## (Intercept) 75.53871 0.22144919 8997 341.1108 0.0000
## ersex 0.00988 0.31292985 998 0.0316 0.9748
## eesex -0.41324 0.13825094 8997 -2.9890 0.0028
## I(iq - 100) 0.02938 0.00329901 8997 8.9058 0.0000
## ersex:eesex 1.07367 0.19502180 8997 5.5054 0.0000
## Correlation:
## (Intr) ersex eesex I(-100
## ersex -0.708
## eesex -0.254 0.180
## I(iq - 100) -0.003 0.002 0.001
## ersex:eesex 0.180 -0.253 -0.709 -0.008
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.714797353 -0.636329830 -0.004603456 0.641754174 3.425222190
##
## Number of Observations: 10000
## Number of Groups: 1000
The likelihood ratio test indicated that residual variances are not significantly different across interviewer’s sex.
anova(out3a, out3d)
## Model df AIC BIC logLik Test L.Ratio p-value
## out3a 1 9 61640.99 61705.88 -30811.49
## out3d 2 12 61646.21 61732.73 -30811.10 1 vs 2 0.7796672 0.8543
Read the data set from Lecture 11.
dat4 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
Again, the nlme
package is needed. Let’s make the model
predicting sense of belonging where (a) time is centered at Year 1, (b)
stress is separated into individual means and deviations from individual
means, and (3) time has random slopes.
dat4$timec <- dat4$time - 1
dat4$avestress <- ave(dat4$stress, dat4$pid)
dat4$diffstress <- dat4$stress - dat4$avestress
library(nlme)
out4a <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit)
summary(out4a)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat4
## AIC BIC logLik
## 25552.95 25602 -12768.47
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 15.319143 (Intr)
## timec 5.940565 -0.152
## Residual 5.090837
##
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.78964 2.5787499 2548 22.02216 0.0000
## timec -1.88457 0.2183537 2548 -8.63082 0.0000
## diffstress 0.40878 0.0121265 2548 33.70962 0.0000
## avestress 0.10050 0.0501375 848 2.00458 0.0453
## Correlation:
## (Intr) timec dffstr
## timec -0.045
## diffstress -0.004 0.011
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.104097747 -0.480769826 0.006814801 0.496061463 2.617219474
##
## Number of Observations: 3400
## Number of Groups: 850
The first-order autoregressive model (AR1) is used to predict the
error relationship. The correlation
argument is used. The
corARMA
function is used to indicate that the
autoregressive model is used. p = 1
means to use the
first-order model.
out4e <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit, correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
summary(out4e)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat4
## AIC BIC logLik
## 25546.58 25601.76 -12764.29
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 15.040700 (Intr)
## timec 5.761745 -0.133
## Residual 5.762395
##
## Correlation Structure: AR(1)
## Formula: ~1 + timec | pid
## Parameter estimate(s):
## Phi
## 0.2107787
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.73589 2.5786531 2548 22.00214 0.0000
## timec -1.87752 0.2177654 2548 -8.62177 0.0000
## diffstress 0.40889 0.0120720 2548 33.87091 0.0000
## avestress 0.10181 0.0501331 848 2.03083 0.0426
## Correlation:
## (Intr) timec dffstr
## timec -0.046
## diffstress -0.004 0.011
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.959984993 -0.435218926 0.005818563 0.435684614 2.496634314
##
## Number of Observations: 3400
## Number of Groups: 850
The AR1 is significant.
anova(out4a, out4e)
## Model df AIC BIC logLik Test L.Ratio p-value
## out4a 1 8 25552.95 25602.00 -12768.47
## out4e 2 9 25546.58 25601.76 -12764.29 1 vs 2 8.370098 0.0038
Plot the error structure.
set.seed(123321)
ar1 <- coef(out4e$modelStruct$corStruct, uncons = FALSE)
out4esigma <- as.numeric(VarCorr(out4e)["Residual", "StdDev"])
e1 <- rnorm(20, 0, out4esigma/sqrt(1 - ar1^2))
e2 <- ar1*e1 + rnorm(20, 0, out4esigma)
e3 <- ar1*e2 + rnorm(20, 0, out4esigma)
e4 <- ar1*e3 + rnorm(20, 0, out4esigma)
mye <- c(e1, e2, e3, e4)
myid <- rep(1:20, 4)
myt <- rep(1:4, each=20)
daterror5 <- data.frame(t=myt, y=mye, id=myid)
library(ggplot2)
g <- ggplot(data=daterror5, aes(x=t, y=y, group=id))
g + geom_line(aes(color=factor(id)), size=1)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
The second-order autoregressive model (AR2) is used to predict the
error relationship and compared with the AR1 model. p = 2
means to use the second-order model in the corARMA
function.
out4f <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit, correlation=corARMA(form = ~ 1 + timec | pid, p = 2))
summary(out4f)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat4
## AIC BIC logLik
## 25547.67 25608.98 -12763.83
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 14.017434 (Intr)
## timec 5.522055 -0.107
## Residual 7.838567
##
## Correlation Structure: ARMA(2,0)
## Formula: ~1 + timec | pid
## Parameter estimate(s):
## Phi1 Phi2
## 0.4761749 0.1355516
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.64618 2.5773895 2548 21.97812 0.0000
## timec -1.87812 0.2178019 2548 -8.62307 0.0000
## diffstress 0.40870 0.0120715 2548 33.85683 0.0000
## avestress 0.10435 0.0501044 848 2.08275 0.0376
## Correlation:
## (Intr) timec dffstr
## timec -0.048
## diffstress -0.004 0.011
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.495698943 -0.393252700 -0.007488328 0.368454214 1.939632824
##
## Number of Observations: 3400
## Number of Groups: 850
The AR2 model was not significantly better than the AR1 model.
anova(out4e, out4f)
## Model df AIC BIC logLik Test L.Ratio p-value
## out4e 1 9 25546.58 25601.76 -12764.29
## out4f 2 10 25547.67 25608.99 -12763.83 1 vs 2 0.9078822 0.3407
Plot the error structure.
set.seed(123321)
ar <- coef(out4f$modelStruct$corStruct, uncons = FALSE)
out4fsigma <- as.numeric(VarCorr(out4f)["Residual", "StdDev"])
out4fsigma <- out4fsigma/sqrt(1 - ar[1]^2 - ar[2]^2 - ((ar[1]^2)*ar[2]/(1 - ar[2])))
e1 <- rnorm(20, 0, out4fsigma)
e2 <- ar[1]*e1 + rnorm(20, 0, out4fsigma)
e3 <- ar[1]*e2 + ar[2]*e1 + rnorm(20, 0, out4fsigma)
e4 <- ar[1]*e3 + ar[2]*e2 + rnorm(20, 0, out4fsigma)
mye <- c(e1, e2, e3, e4)
myid <- rep(1:20, 4)
myt <- rep(1:4, each=20)
daterror6 <- data.frame(t=myt, y=mye, id=myid)
library(ggplot2)
g <- ggplot(data=daterror6, aes(x=t, y=y, group=id))
g + geom_line(aes(color=factor(id)), size=1)
The AR1 model with different residual variances across time is used.
The weights
argument is added in the AR1 model.
out4g <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec), correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
summary(out4g)
## Linear mixed-effects model fit by maximum likelihood
## Data: dat4
## AIC BIC logLik
## 25493.77 25567.35 -12734.89
##
## Random effects:
## Formula: ~1 + timec | pid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 14.711750 (Intr)
## timec 5.794567 -0.108
## Residual 7.054760
##
## Correlation Structure: AR(1)
## Formula: ~1 + timec | pid
## Parameter estimate(s):
## Phi
## 0.2803926
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | timec
## Parameter estimates:
## 0 1 2 3
## 1.0000000 0.7982827 0.9267105 0.1845166
## Fixed effects: mem ~ 1 + timec + diffstress + avestress
## Value Std.Error DF t-value p-value
## (Intercept) 56.74832 2.5775886 2548 22.01605 0.0000
## timec -1.79050 0.2109160 2548 -8.48917 0.0000
## diffstress 0.40719 0.0119572 2548 34.05388 0.0000
## avestress 0.10043 0.0501128 848 2.00404 0.0454
## Correlation:
## (Intr) timec dffstr
## timec -0.046
## diffstress -0.005 0.013
## avestress -0.977 0.000 0.003
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.6979843 -0.3670609 -0.0138252 0.3270481 2.4832686
##
## Number of Observations: 3400
## Number of Groups: 850
The heteroscedastic AR1 model is compared with the homoscedastic AR1 model.
anova(out4e, out4g)
## Model df AIC BIC logLik Test L.Ratio p-value
## out4e 1 9 25546.58 25601.76 -12764.29
## out4g 2 12 25493.77 25567.35 -12734.89 1 vs 2 58.80574 <.0001
Plot the error structure.
set.seed(123321)
ar1 <- coef(out4g$modelStruct$corStruct, uncons = FALSE)
sdratio <- coef(out4g$modelStruct$varStruct, uncons = FALSE)
out4gsigma <- as.numeric(VarCorr(out4g)["Residual", "StdDev"])
e1 <- rnorm(20, 0, out4gsigma/sqrt(1 - ar1^2))
e2 <- ar1*e1 + rnorm(20, 0, out4gsigma * sdratio[1])
e3 <- ar1*e2 + rnorm(20, 0, out4gsigma * sdratio[2])
e4 <- ar1*e3 + rnorm(20, 0, out4gsigma * sdratio[3])
mye <- c(e1, e2, e3, e4)
myid <- rep(1:20, 4)
myt <- rep(1:4, each=20)
daterror5 <- data.frame(t=myt, y=mye, id=myid)
library(ggplot2)
g <- ggplot(data=daterror5, aes(x=t, y=y, group=id))
g + geom_line(aes(color=factor(id)), size=1)
Read the data set from Lecture 11.
dat5 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
Run the model with equal residual variances and without autocorrelation.
dat5$timec <- dat5$time - 1
dat5$avestress <- ave(dat5$stress, dat5$pid)
dat5$diffstress <- dat5$stress - dat5$avestress
library(nlme)
out5a <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit)
sumout5a <- summary(out5a)
See that the output from the summary
function is saved
as an object. The output will be used in the
r2mlm_long_manual
function as mentioned in Lecture 8. \(R^2\) can be calculated as follows:
library(r2mlm)
##
## Please cite as:
## Shaw, M., Rights, J.D., Sterba, S.S., Flake, J.K. (2023). r2mlm: An R package calculating R-squared measures for multilevel models. Behavior Research Methods, 55(4), 1942-1964. doi:10.3758/s13428-022-01841-4
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5a)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5a), sigma2=out5a$sigma^2, bargraph=FALSE)
## $Decompositions
## total within between
## fixed slopes (within) 0.047108859 0.1952116 NA
## fixed slopes (between) 0.003166966 NA 0.004174322
## slope variation (within) 0.122351796 0.5070063 NA
## slope variation (between) 0.000000000 NA 0.000000000
## intercept variation (between) 0.755511013 NA 0.995825678
## residual (within) 0.071861365 0.2977820 NA
##
## $R2s
## total within between
## f1 0.047108859 0.1952116 NA
## f2 0.003166966 NA 0.004174322
## v1 0.122351796 0.5070063 NA
## v2 0.000000000 NA 0.000000000
## m 0.755511013 NA 0.995825678
## f 0.050275825 NA NA
## fv 0.172627621 0.7022180 0.004174322
## fvm 0.928138635 NA NA
Next, the model with equal residual variances and AR1 is analyzed.
out5e <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit, correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
sumout5e <- summary(out5e)
In the AR1 model, the residual variance in the summary
function cannot be used directly in the sigma2 argument. However,
residual variance can be calculated by \(\sigma^2/(1 - \rho^2)\)
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5e)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5e), sigma2=out5e$sigma^2/(1 - ar1^2), bargraph=FALSE)
## $Decompositions
## total within between
## fixed slopes (within) 0.04684901 0.1794703 NA
## fixed slopes (between) 0.00323730 NA 0.00438089
## slope variation (within) 0.11465107 0.4392079 NA
## slope variation (between) 0.00000000 NA 0.00000000
## intercept variation (between) 0.73572216 NA 0.99561911
## residual (within) 0.09954046 0.3813218 NA
##
## $R2s
## total within between
## f1 0.04684901 0.1794703 NA
## f2 0.00323730 NA 0.00438089
## v1 0.11465107 0.4392079 NA
## v2 0.00000000 NA 0.00000000
## m 0.73572216 NA 0.99561911
## f 0.05008631 NA NA
## fv 0.16473738 0.6186782 0.00438089
## fvm 0.90045954 NA NA
Next, the model with unequal residual variances and without autocorrelation is analyzed.
out5b <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec))
sumout5b <- summary(out5b)
In the heteroscedastic model, the residual variances are unequal across time. Thus, residual variances are averaged across all data points. Then, put the average in the formula.
out5bsigma <- out5b$sigma
out5bsigmatime <- out5bsigma * coef(out5b$modelStruct$varStruct, uncons = FALSE)
out5berrvar <- c(out5bsigma, out5bsigmatime)^2
out5berrvar <- cbind(1:4, out5berrvar)
matchindex5 <- match(dat5$time, out5berrvar[,1])
dat5$vare <- out5berrvar[matchindex5, 2]
aveerrvar5b <- mean(dat5$vare)
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5b)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5b), sigma2=aveerrvar5b, bargraph=FALSE)
## $Decompositions
## total within between
## fixed slopes (within) 0.045863872 0.1928090 NA
## fixed slopes (between) 0.002901599 NA 0.003807234
## slope variation (within) 0.121261997 0.5097782 NA
## slope variation (between) 0.000000000 NA 0.000000000
## intercept variation (between) 0.759226336 NA 0.996192766
## residual (within) 0.070746196 0.2974128 NA
##
## $R2s
## total within between
## f1 0.045863872 0.1928090 NA
## f2 0.002901599 NA 0.003807234
## v1 0.121261997 0.5097782 NA
## v2 0.000000000 NA 0.000000000
## m 0.759226336 NA 0.996192766
## f 0.048765471 NA NA
## fv 0.170027468 0.7025872 0.003807234
## fvm 0.929253804 NA NA
Finally, the model with unequal variances and AR1 is calculated.
out5g <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec), correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
sumout5g <- summary(out5g)
Then, the calculation is quite complicated. The residual variance in the first time point is assumed to be equal to the variance of the white noise. Then, the residual variances of the following time points are calculated based on the impact of the AR1 function. After that, the error variances across time points are averaged and put in the formula.
ar1 <- coef(out5g$modelStruct$corStruct, uncons = FALSE)
sdratio <- coef(out5g$modelStruct$varStruct, uncons = FALSE)
out5gsigmatime <- out5g$sigma * sdratio
out5gerrvar <- c(out5g$sigma, out5gsigmatime)^2
errvar1g <- out5gerrvar[1] / (1 - ar1^2)
errvar2g <- (ar1^2)*errvar1g + out5gerrvar[2]
errvar3g <- (ar1^2)*errvar2g + out5gerrvar[3]
errvar4g <- (ar1^2)*errvar3g + out5gerrvar[4]
out5gerrvar <- cbind(1:4, c(errvar1g, errvar2g, errvar3g, errvar4g))
matchindex5 <- match(dat5$time, out5gerrvar[,1])
dat5$varg <- out5gerrvar[matchindex5, 2]
aveerrvar5g <- mean(dat5$varg)
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5g)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5g), sigma2=aveerrvar5g, bargraph=FALSE)
## $Decompositions
## total within between
## fixed slopes (within) 0.045780213 0.1755998 NA
## fixed slopes (between) 0.003175372 NA 0.004295151
## slope variation (within) 0.116899033 0.4483911 NA
## slope variation (between) 0.000000000 NA 0.000000000
## intercept variation (between) 0.736116906 NA 0.995704849
## residual (within) 0.098028476 0.3760091 NA
##
## $R2s
## total within between
## f1 0.045780213 0.1755998 NA
## f2 0.003175372 NA 0.004295151
## v1 0.116899033 0.4483911 NA
## v2 0.000000000 NA 0.000000000
## m 0.736116906 NA 0.995704849
## f 0.048955585 NA NA
## fv 0.165854618 0.6239909 0.004295151
## fvm 0.901971524 NA NA