---
title: "Intro to Multilevel Model: Lecture 12 Example Code"
author: "Sunthud Pornprasertmanit"
output: html_document
date: "`r format(Sys.time(), '%d/%m/%Y')`"
---

```{css style settings, echo = FALSE}
blockquote {
    padding: 10px 20px;
    margin: 0 0 20px;
    font-size: 14px;
    border-left: 5px solid #eee;
    font-family: "Courier New";
}
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r xx, include=FALSE}
library(lme4)
library(nlme)
library(psych)
library(interactions)
library(lavaan)
library(MASS)
out <- readr::read_rds("lecture12saveresults.rds")
ss11 <- out[[1]]
ss12 <- out[[2]]
ss13 <- out[[3]]
treatcval <- c(0, 1) - 0.5
```

## Spline Model

##### Example 1: Stress Treatment Effects 

Read the new data set representing a randomized-control experiment into treatment and control groups. Stress are measured weekly for 14 weeks. 

```{r readdata1}
dat1 <- read.table("lecture12ex1.csv", sep=",", header=TRUE)
```

See descriptive statistics of all variables within the data set.

```{r describe1}
library(psych)
describe(dat1)
```

In this example, the time will be separated into three periods: baseline, treatment, and follow-up. The treatment starts right after measuring Week 5. The treatments are implemented at Week 5, 6, 7, 8, 9, and 10. The measurements at Week 6, 7, 8, 9, 10, and 11 are the direct impacts of the treatment so these measurements are included in the treatment period. Thus, Week 12, 13, and 14 are the pure follow-up period. 

```{r echo=FALSE, results = 'asis'}
library(knitr)
dd <- data.frame(Time = 1:14, 'Centered-Time' = 0:13, 'Baseline' = c(0:4, rep(4, 9)), 'Treatment' = c(rep(0, 4), 0:6, rep(6, 3)), 'Follow-up'=c(rep(0, 11), 1:3))
kable(dd, caption="Separate Time Variables into Three Variables")
```

```{r timetransformation1}
dat1$timec <- dat1$time - 1
dat1$timebase <- dat1$timec
dat1[dat1$timebase > 4, "timebase"] <- 4
dat1$timetreat <- dat1$timec - dat1$timebase
dat1[dat1$timetreat > 6, "timetreat"] <- 6
dat1$timefollow <- dat1$timec - dat1$timebase - dat1$timetreat
```

Check the null model to investigate the intraclass correlation of stress.

```{r nullmodel1}
library(lme4)
out1a <- lmer(stress ~ 1 + (1|pid), data=dat1, REML=FALSE)
summary(out1a)
```

Use `timec` to predict job satisfaction without random slopes.

```{r fixedtime1}
out1b <- lmer(stress ~ 1 + timec + (1|pid), data=dat1, REML=FALSE)
summary(out1b)
```

Let's use three time variables, `timebase`, `timetreat`, and `timefollow`, to model the changes in stress.

```{r fixedtime12}
out1c <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1|pid), data=dat1, REML=FALSE)
anova(out1b, out1c)
```

The model with three time variables was significantly better than the model with only one time variable. Check the output of the model with three time variables.

```{r fixedtime13}
summary(out1c)
```

Let's investigate the change in stress across times. First, make each line represent the raw stress level of each individual across time.

```{r changeplot1}
library(ggplot2)
dat1_2 <- dat1[dat1$pid%%20 == 0,]
dat1_2$treat <- factor(dat1_2$treat, labels=c("Control", "Treatment"))
ggplot(data=dat1_2, aes(x=time, y=stress, group=pid, colour=treat)) + geom_line()
```

Second, in each individual, all three time variables are used to predict stress levels. Then, the predicted values of stress are saved as `stress2`. Then, the predicted stress force a linear change within individuals. See the spline change of stress level in each client. 

```{r changeplot2}
datl <- split(dat1, dat1$pid)
stresspred <- lapply(datl, function(x) { predict(lm(stress ~ timebase + timetreat + timefollow, data=x)) })
dat1$stress2 <- do.call(c, stresspred)
dat1_2 <- dat1[dat1$pid%%20 == 0,]
dat1_2$treat <- factor(dat1_2$treat, labels=c("Control", "Treatment"))
ggplot(data=dat1_2, aes(x=time, y=stress2, group=pid, colour=treat)) + geom_line()
```

Next, test the random slopes of each time variable. The model with the random slope of `timebase` was not convergent and the fixed effects of `timebase` was not significant too. Because `timebase` is not the crucial effect of the research, `timebase` is dropped from further analyses.

```{r randomslope1}
out1d <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1 + timebase|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out1c, out1d)
```

Next, the random slopes of `timetreat` was significant.

```{r randomslope12}
out1e <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1 + timetreat|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out1c, out1e)
```

Next, the random slopes of `timefollow` was significant.

```{r randomslope13}
out1f <- lmer(stress ~ 1 + timebase + timetreat + timefollow + (1 + timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
anova(out1c, out1f)
```

In sum, two random effects were significant. Thus, both random effects are analyzed in the same model and remove `timebase` from the model.

```{r randomslope14}
out1g <- lmer(stress ~ 1 + timetreat + timefollow + (1 + timetreat+ timefollow|pid),  data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(out1g)
```

Let's plot the scatterplot matrix of the random effects. The easiest way is to generate data from the output of the random slope models. See the `Random effects:` section. The standard deviations and the correlation matrix of the random effects are saved. Next, see the means of each random effect in the `Fixed effects:` section. From means, standard deviations, and the correlation matrix, data are simulated by the `mvrnorm` function in the `MASS` package. Then, the simulated data are used to create scatterplot matrix using the `pairs.panels` in the `psych` package. 

```{r randomeffectplot1}
tempsd <- c(0.5799, 1.1468, 1.3455)
tempr <- matrix(c(1, 0.2, -0.67,
           0.2, 1, -0.31,
           -0.67, -0.31, 1), 3, 3)
library(lavaan)
tempcov <- cor2cov(tempr, tempsd)
library(MASS)
temp <- mvrnorm(1000, c(52.41, -0.88, 0.587), tempcov)
colnames(temp) <- c("intcept", "timetreat", "timefollow")
library(psych)
pairs.panels(temp, main = "Scatter Plot Matrix of the Random Effects") 
```

Next, the scatterplots of each pair are created.

```{r randomeffectplot2}
plot(temp[,1], temp[,2], xlab="Baseline Stress", ylab="Stress Change in Treatment Period")
abline(h = 0, col="red", lty=2)
abline(v = 0, col="red", lty=2)
```

```{r randomeffectplot3}
plot(temp[,1], temp[,3], xlab="Baseline Stress", ylab="Stress Change in Follow-up Period")
abline(h = 0, col="red", lty=2)
abline(v = 0, col="red", lty=2)
```

```{r randomeffectplot4}
plot(temp[,2], temp[,3], xlab="Stress Change in Treatment Period", ylab="Stress Change in Follow-up Period")
abline(h = 0, col="red", lty=2)
abline(v = 0, col="red", lty=2)
```

Next, all covariates are added in the model. `sleephrs` is the only time-varying covariate. It is separated into individual means and individual-mean-centered variable.

```{r timevarying1}
dat1$sleephrs <- dat1$sleep / 60
dat1$avesleep <- ave(dat1$sleephrs, dat1$pid)
dat1$diffsleep <- dat1$sleephrs - dat1$avesleep
```

All time-invariant covariates, `avesleep`, `treat`, and `age` are grand-mean centered. Thus, the grand means must be calculated by (a) making a new data set where rows represent each individual and (b) calculating the means across individuals. 

```{r grandmean1}
dat1a <- dat1[!duplicated(dat1$pid),]
round(apply(dat1a, 2, mean), 3)
```

Next, all covariates are included in the models. 

```{r timeinvariant1}
out1h <- lmer(stress ~ 1 + timetreat + timefollow + diffsleep + I(avesleep - 7.075) + I(treat - 0.5) + I(age - 36.638) + (1 + timetreat+ timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead"))
summary(out1h)
```

Next, all time-invariate covariates are centered at the calculated grand means. Also, all cross-level interactions are added to predict the random slopes of `time`. 

```{r crosslevel1}
out1i <- lmer(stress ~ 1 + timetreat + timefollow + diffsleep + timetreat*I(avesleep - 7.075) + timetreat*I(treat - 0.5) + timetreat*I(age - 36.638) + timefollow*I(avesleep - 7.075) + timefollow*I(treat - 0.5) + timefollow*I(age - 36.638) + (1 + timetreat+ timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead", calc.derivs = FALSE))
summary(out1i)
```

Plot the expected changes in the stress level in both treatment and control groups. First, the data from Individual 1 are selected to get the values of `time`, `timetreat`, and `timefollow` at each time point. Next, the regression equation of expect means is obtained from the fixed effects results involving `treat`, `timetreat`, and `timefollow`. Next, the expected means of each treatment condition and each time point are calculated and plotted.

```{r expectedmeanplot1}
dat1g1 <- dat1[dat1$pid == 1,]
w <- 1
ytreat <- 52.42 + 0.06*(w - 0.5) - 0.88*dat1g1$timetreat - 1.864*(w-0.5)*dat1g1$timetreat + 0.59*dat1g1$timefollow + 1.189*(w-0.5)*dat1g1$timefollow
w <- 0
yctrl <- 52.42 + 0.06*(w - 0.5) - 0.88*dat1g1$timetreat - 1.864*(w-0.5)*dat1g1$timetreat + 0.59*dat1g1$timefollow + 1.189*(w-0.5)*dat1g1$timefollow
plot(dat1g1$time, ytreat, type="n", xlab="time", ylab="stress", ylim=c(40,55))
lines(dat1g1$time, ytreat, col="red")
lines(dat1g1$time, yctrl, col="green")
legend(1, 43, c("Treatment", "Control"), lty=1, col=c("red", "green"))
```

`treat` significantly predicted the random slopes of `timetreat` and `timefollow`. Thus, both cross-level interactions are probed by the `interactions` package. Before using the package, all variables must be centered by creating new variables. 

```{r avoidI1}
dat1$avesleepc <- dat1$avesleep - 7.075
dat1$treatc <- dat1$treat - 0.5
dat1$agec <- dat1$age - 36.638
out1in <- lmer(stress ~ 1 + timetreat + timefollow + diffsleep + timetreat*avesleepc + timetreat*treatc + timetreat*agec + timefollow*avesleepc + timefollow*treatc + timefollow*agec + (1 + timetreat+ timefollow|pid), data=dat1, REML=FALSE, control = lmerControl(optimizer ="Nelder_Mead", calc.derivs = FALSE))
summary(out1in)
```

First, the simple slopes of `timetreat` at each treatment condition is calculated.

```{r simpleslope11, eval=FALSE}
treatcval <- c(0, 1) - 0.5
library(interactions)
ss11 <- sim_slopes(model=out1in, pred=timetreat, modx=treatc, modx.values=treatcval)
```
```{r simpleslope111}
ss11
```

Plot the interaction where X-axis is `timetreat` and each line represents each value of `treat`.

```{r interactionplot11}
interact_plot(model=out1in, pred=timetreat, modx=treatc, modx.values=treatcval)
```

Second, the simple slopes of `timefollow` at each treatment condition is calculated.

```{r simpleslope12, eval=FALSE}
ss12 <- sim_slopes(model=out1in, pred=timefollow, modx=treatc, modx.values=treatcval)
```
```{r simpleslope121}
ss12
```

Plot the interaction where X-axis is `timefollow` and each line represents each value of `treat`.

```{r interactionplot12}
interact_plot(model=out1in, pred=timefollow, modx=treatc, modx.values=treatcval)
```

Third, the expected differences between treatment conditions at each time is calculated. However, the time variables are separated into `timetreat` and `timefollow`. Thus, `timetreat` is treated as the first moderator and `timefollow` is treated as the second moderator. To test the expected differences at all time points, all possible data points of `timetreat` and `timefollow` are included. Note that the time used for this probing interaction is more than one hour.

```{r simpleslope13, eval=FALSE}
timetreatval <- c(0, 1, 2, 3, 4, 5, 6)
timefollowval <- c(0, 1, 2, 3)
ss13 <- sim_slopes(model=out1in, pred=treatc, modx=timetreat, mod2=timefollow, modx.values=timetreatval, mod2.values=timefollowval)
```
```{r simpleslope131}
ss13
```

## Heteroscedastic Models

##### Example 2: Changes in Sense of Belonging in Undergraduate Students

Read the data set from Lecture 11. 

```{r readdata2}
dat2 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
```

Run the model where time is centered at the first year and stress is separated into Level-1 and Level-2 covariates by group mean centering and predicts the sense of belonging.

```{r lme4used2}
dat2$timec <- dat2$time - 1
dat2$avestress <- ave(dat2$stress, dat2$pid)
dat2$diffstress <- dat2$stress - dat2$avestress
library(lme4)
out2 <- lmer(mem ~ 1 + timec + diffstress + avestress + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2)
```

The `lme4` package does not have features for heteroscedastic models. Rather the `nlme` package is used instead. Let's run the previous model with `nlme`.

```{r nlmeused2}
library(nlme)
out2a <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit)
summary(out2a)
```

Next, run the models where the residual variances are different across four time points. The `weights` argument sets the variance structure. In this case, the variance is identical at each value of `timec`.

```{r differentvariancestime2}
out2b <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec))
summary(out2b)
```

Compare the models with equal and unequal variances across time. The unequal residual variances were significant. 

```{r differentvariancestime22}
anova(out2a, out2b)
```

Let's calculate residual standard deviations at each time point. 

```{r differentvariancestime23}
out2bsigma <- as.numeric(VarCorr(out2b)[3, 2])
out2bsigmatime <- out2bsigma * coef(out2b$modelStruct$varStruct, uncons = FALSE)
c(out2bsigma, out2bsigmatime)
```

Next, run the models where the residual variances have the exponential changes. The `weights` argument is specified that the `time` variable predicts the residual variance by the exponential function. 

```{r expvariances2}
out2c <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varExp(form = ~time))
summary(out2c)
```

The exponential change in residual variances is not significantly better than the equal variances model. 

```{r expvariances22}
anova(out2a, out2c)
```

Let's calculate residual standard deviations at each time point. 

```{r expvariances23}
out2csigma <- as.numeric(VarCorr(out2c)[3, 2])
expvalue <- coef(out2c$modelStruct$varStruct, uncons = FALSE)
out2csigma * exp(expvalue * (1:4))
```

Next, run the models where the residual variances is the power function of `time`. 

```{r powvariances2}
out2d <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varPower(form = ~time))
summary(out2d)
```

The power change in residual variances is not significantly better than the equal variances model. 

```{r powvariances22}
anova(out2a, out2d)
```

Let's calculate residual standard deviations at each time point. 

```{r powvariances23}
out2dsigma <- as.numeric(VarCorr(out2d)[3, 2])
powervalue <- coef(out2d$modelStruct$varStruct, uncons = FALSE)
out2dsigma * (1:4)^powervalue
```

Next, run the models where the residual variances is the power function of `time` with added constant. 

```{r powconstantvariances2}
out2e <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat2, method="ML", na.action=na.omit, weights=varConstPower(form = ~time), control = lmeControl(maxIter=200, msMaxIter = 200))
summary(out2e)
```

The constant power change in residual variances is significantly better than the equal variances model. 

```{r powconstantvariances22}
anova(out2a, out2e)
```

Let's calculate residual standard deviations at each time point. 

```{r powconstantvariances23}
out2esigma <- as.numeric(VarCorr(out2e)[3, 2])
paramvar <- coef(out2e$modelStruct$varStruct, uncons = FALSE)
out2esigma * (paramvar[1] + (1:4)^paramvar[2])
```

The model with different variances across times and the model where the variances are predicted by the constant power function are compared. The latter model is the restricted model. The test was significant so the model with different variances across times is selected. 

```{r comparevariancemodel2}
anova(out2e, out2b)
```

##### Example 3: Interviewee's Ratings based on the Interaction between Interviewer's and Interviewee's Sex

Read the data set from Lecture 4. 

```{r readdata3}
dat3 <- read.table("lecture4ex2.csv", sep=",", header=TRUE)
```

Run the model predicting interviewee's ratings by the interaction between interviewer's and interviewee's sex. In this model, the residual variances are equal across all combinations of interviewers and interviewees.

```{r equalvariances3}
out3a <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit)
summary(out3a)
```

Next, estimate different residual variances across interviewee's sex.

```{r intervieweesex3}
out3b <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit, weights=varIdent(form = ~1|eesex))
summary(out3b)
```

The likelihood ratio test indicated that residual variances are not significantly different across interviewee's sex.

```{r intervieweesex32}
anova(out3a, out3b)
```

Next, estimate different residual variances across interviewer's sex.

```{r interviewersex3}
out3c <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit, weights=varIdent(form = ~1|ersex))
summary(out3c)
```

The likelihood ratio test indicated that residual variances are not significantly different across interviewer's sex.

```{r interviewersex32}
anova(out3a, out3c)
```

Finally, estimate different residual variances across both interviewee's and interviewer's sex.

```{r bothsex3}
out3d <- lme(score ~ 1 + ersex + eesex + eesex:ersex + I(iq - 100), random = ~1 + eesex|erid, data=dat3, method="ML", na.action=na.omit, weights=varIdent(form = ~1|eesex * ersex))
summary(out3d)
```

The likelihood ratio test indicated that residual variances are not significantly different across interviewer's sex.

```{r bothsex32}
anova(out3a, out3d)
```

## Autocorrelation

##### Example 4: Changes in Sense of Belonging in Undergraduate Students

Read the data set from Lecture 11. 

```{r readdata4}
dat4 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
```

Again, the `nlme` package is needed. Let's make the model predicting sense of belonging where (a) time is centered at Year 1, (b) stress is separated into individual means and deviations from individual means, and (3) time has random slopes. 

```{r startmodel4}
dat4$timec <- dat4$time - 1
dat4$avestress <- ave(dat4$stress, dat4$pid)
dat4$diffstress <- dat4$stress - dat4$avestress
library(nlme)
out4a <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit)
summary(out4a)
```

The first-order autoregressive model (AR1) is used to predict the error relationship. The `correlation` argument is used. The `corARMA` function is used to indicate that the autoregressive model is used. `p = 1` means to use the first-order model. 

```{r arfirst4}
out4e <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit, correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
summary(out4e)
```

The AR1 is significant. 

```{r arfirst42}
anova(out4a, out4e)
```

Plot the error structure. 

```{r errorstructure4}
set.seed(123321)
ar1 <- coef(out4e$modelStruct$corStruct, uncons = FALSE)
out4esigma <- as.numeric(VarCorr(out4e)["Residual", "StdDev"])
e1 <- rnorm(20, 0, out4esigma/sqrt(1 - ar1^2)) 
e2 <- ar1*e1 + rnorm(20, 0, out4esigma)
e3 <- ar1*e2 + rnorm(20, 0, out4esigma)
e4 <- ar1*e3 + rnorm(20, 0, out4esigma)

mye <- c(e1, e2, e3, e4)
myid <- rep(1:20, 4)
myt <- rep(1:4, each=20)
daterror5 <- data.frame(t=myt, y=mye, id=myid)
library(ggplot2)
g <- ggplot(data=daterror5, aes(x=t, y=y, group=id)) 
g + geom_line(aes(color=factor(id)), size=1)
```

The second-order autoregressive model (AR2) is used to predict the error relationship and compared with the AR1 model. `p = 2` means to use the second-order model in the `corARMA` function. 

```{r arsecond4}
out4f <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit, correlation=corARMA(form = ~ 1 + timec | pid, p = 2))
summary(out4f)
```

The AR2 model was not significantly better than the AR1 model. 

```{r arsecond42}
anova(out4e, out4f)
```

Plot the error structure. 

```{r errorstructuresecond4}
set.seed(123321)
ar <- coef(out4f$modelStruct$corStruct, uncons = FALSE)
out4fsigma <- as.numeric(VarCorr(out4f)["Residual", "StdDev"])
out4fsigma <- out4fsigma/sqrt(1 - ar[1]^2 - ar[2]^2 - ((ar[1]^2)*ar[2]/(1 - ar[2])))
e1 <- rnorm(20, 0, out4fsigma)
e2 <- ar[1]*e1 + rnorm(20, 0, out4fsigma)
e3 <- ar[1]*e2 + ar[2]*e1 + rnorm(20, 0, out4fsigma)
e4 <- ar[1]*e3 + ar[2]*e2 + rnorm(20, 0, out4fsigma)
mye <- c(e1, e2, e3, e4)
myid <- rep(1:20, 4)
myt <- rep(1:4, each=20)
daterror6 <- data.frame(t=myt, y=mye, id=myid)
library(ggplot2)
g <- ggplot(data=daterror6, aes(x=t, y=y, group=id)) 
g + geom_line(aes(color=factor(id)), size=1)
```

The AR1 model with different residual variances across time is used. The `weights` argument is added in the AR1 model. 

```{r arfirsthetero4}
out4g <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat4, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec), correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
summary(out4g)
```

The heteroscedastic AR1 model is compared with the homoscedastic AR1 model. 

```{r arfirsthetero42}
anova(out4e, out4g)
```

Plot the error structure. 

```{r errorstructurefirsthetero4}
set.seed(123321)
ar1 <- coef(out4g$modelStruct$corStruct, uncons = FALSE)
sdratio <- coef(out4g$modelStruct$varStruct, uncons = FALSE)
out4gsigma <- as.numeric(VarCorr(out4g)["Residual", "StdDev"])
e1 <- rnorm(20, 0, out4gsigma/sqrt(1 - ar1^2))
e2 <- ar1*e1 + rnorm(20, 0, out4gsigma * sdratio[1])
e3 <- ar1*e2 + rnorm(20, 0, out4gsigma * sdratio[2])
e4 <- ar1*e3 + rnorm(20, 0, out4gsigma * sdratio[3])
mye <- c(e1, e2, e3, e4)
myid <- rep(1:20, 4)
myt <- rep(1:4, each=20)
daterror5 <- data.frame(t=myt, y=mye, id=myid)
library(ggplot2)
g <- ggplot(data=daterror5, aes(x=t, y=y, group=id)) 
g + geom_line(aes(color=factor(id)), size=1)
```

## R-squared in Growth Curve Model

##### Example 5: Changes in Sense of Belonging in Undergraduate Students


Read the data set from Lecture 11. 

```{r readdata5}
dat5 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
```

Run the model with equal residual variances and without autocorrelation. 

```{r homonoar5}
dat5$timec <- dat5$time - 1
dat5$avestress <- ave(dat5$stress, dat5$pid)
dat5$diffstress <- dat5$stress - dat5$avestress
library(nlme)
out5a <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit)
sumout5a <- summary(out5a)
```

See that the output from the `summary` function is saved as an object. The output will be used in the  `r2mlm_long_manual` function as mentioned in Lecture 8. $R^2$ can be calculated as follows:

```{r r2homonoar5}
library(r2mlm)
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5a)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5a), sigma2=out5a$sigma^2, bargraph=FALSE)
```

Next, the model with equal residual variances and AR1 is analyzed.

```{r homoar5}
out5e <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit, correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
sumout5e <- summary(out5e)
```

In the AR1 model, the residual variance in the `summary` function cannot be used directly in the sigma2 argument. However, residual variance can be calculated by $\sigma^2/(1 - \rho^2)$

```{r r2homoar5}
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5e)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5e), sigma2=out5e$sigma^2/(1 - ar1^2), bargraph=FALSE)
```

Next, the model with unequal residual variances and without autocorrelation is analyzed.

```{r heteronoar5}
out5b <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec))
sumout5b <- summary(out5b)
```

In the heteroscedastic model, the residual variances are unequal across time. Thus, residual variances are averaged across all data points. Then, put the average in the formula.

```{r r2heteronoar5}
out5bsigma <- out5b$sigma
out5bsigmatime <- out5bsigma * coef(out5b$modelStruct$varStruct, uncons = FALSE)
out5berrvar <- c(out5bsigma, out5bsigmatime)^2
out5berrvar <- cbind(1:4, out5berrvar)
matchindex5 <- match(dat5$time, out5berrvar[,1])
dat5$vare <- out5berrvar[matchindex5, 2]
aveerrvar5b <- mean(dat5$vare)
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5b)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5b), sigma2=aveerrvar5b, bargraph=FALSE)
```

Finally, the model with unequal variances and AR1 is calculated.

```{r heteroar5}
out5g <- lme(mem ~ 1 + timec + diffstress + avestress, random = ~1 + timec|pid, data=dat5, method="ML", na.action=na.omit, weights=varIdent(form = ~1|timec), correlation=corARMA(form = ~ 1 + timec | pid, p = 1))
sumout5g <- summary(out5g)
```

Then, the calculation is quite complicated. The residual variance in the first time point is assumed to be equal to the variance of the white noise. Then, the residual variances of the following time points are calculated based on the impact of the AR1 function. After that, the error variances across time points are averaged and put in the formula.

```{r r2heteroar5}
ar1 <- coef(out5g$modelStruct$corStruct, uncons = FALSE)
sdratio <- coef(out5g$modelStruct$varStruct, uncons = FALSE)
out5gsigmatime <- out5g$sigma * sdratio
out5gerrvar <- c(out5g$sigma, out5gsigmatime)^2
errvar1g <- out5gerrvar[1] / (1 - ar1^2)
errvar2g <- (ar1^2)*errvar1g + out5gerrvar[2]
errvar3g <- (ar1^2)*errvar2g + out5gerrvar[3]
errvar4g <- (ar1^2)*errvar3g + out5gerrvar[4]
out5gerrvar <- cbind(1:4, c(errvar1g, errvar2g, errvar3g, errvar4g))
matchindex5 <- match(dat5$time, out5gerrvar[,1])
dat5$varg <- out5gerrvar[matchindex5, 2]
aveerrvar5g <- mean(dat5$varg)
r2mlm_long_manual(data=dat5, covs=c("timec", "diffstress", "avestress"), random_covs=c("timec"), gammas=coef(sumout5g)[-1, "Value"], clusterID="pid", Tau=getVarCov(out5g), sigma2=aveerrvar5g, bargraph=FALSE)
```