---
title: "Intro to Multilevel Model: Lecture 11 Example Code"
author: "Sunthud Pornprasertmanit"
output: html_document
date: "`r format(Sys.time(), '%d/%m/%Y')`"
---

```{css style settings, echo = FALSE}
blockquote {
    padding: 10px 20px;
    margin: 0 0 20px;
    font-size: 14px;
    border-left: 5px solid #eee;
    font-family: "Courier New";
}
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r xx, include=FALSE}
library(lme4)
library(psych)
library(interactions)
out <- readr::read_rds("lecture11saveresults.rds")
ss11 <- out[[1]]
ss12 <- out[[2]]
ss13 <- out[[3]]
ss14 <- out[[4]]
ss21 <- out[[5]]
ss22 <- out[[6]]
ss23 <- out[[7]]
ss24 <- out[[8]]
ss25 <- out[[9]]
ss26 <- out[[10]]
```

## Growth Curve Model

##### Example 1: Changes in Job Satisfaction

Read the new data set representing a longitudinal study which employees data are collected in four time points. 

```{r readdata1}
dat1 <- read.table("lecture11ex1.csv", sep=",", header=TRUE)
```

See descriptive statistics of all variables within the data set.

```{r describe1}
library(psych)
describe(dat1)
```

Let's run the null model of job satisfaction to check the intraclass correlation.

```{r nullmodel1}
library(lme4)
out1a <- lmer(jsat ~ 1 + (1|pid), data=dat1, REML=FALSE)
summary(out1a)
```

Use `time` to predict job satisfaction without random slopes.

```{r fixedtime1}
out1b <- lmer(jsat ~ 1 + time + (1|pid), data=dat1, REML=FALSE)
summary(out1b)
```

Next, add the random slope of `time` in the model. That is, every employee has different changes in job satisfaction. The likelihood ratio test was significant so the random slope is retained. 

```{r randomslope1}
out1c <- lmer(jsat ~ 1 + time + (1 + time|pid), data=dat1, REML=FALSE)
anova(out1b, out1c)
```

Check the output of the random slope model.

```{r randomslope12}
summary(out1c)
```

Let's investigate individual changes in job satisfaction across three years. The `ggplot2` package is used to group each person data. Then, the regression lines are plotted for each person. Note that only every 10th person will be used in the plot so the plot does not have too many regression lines.

```{r individualplot1}
library(ggplot2)
dat1_2 <- dat1[dat1$pid%%10 == 0,]
ggplot(dat1_2, aes(x=time, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)
```

Next, the X axis is changed to `age` which is the sum of the age at the beginning, `age0`, and the time point, `time`. 

```{r ageplot1}
dat1_2 <- dat1[dat1$pid%%3 == 0,]
dat1_2$age <- dat1_2$age0 + dat1_2$time
ggplot(dat1_2, aes(x=age, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)
```

Finally, the X axis is changed to `tenure` which is the sum of the tenure at the beginning, `tenure0`, and the time point, `time`. 

```{r tenureplot1}
dat1_2 <- dat1[dat1$pid%%5 == 0,]
dat1_2$tenure <- dat1_2$tenure0 + dat1_2$time
ggplot(dat1_2, aes(x=tenure, y=jsat, group=pid)) + geom_smooth(method=lm, se=FALSE)
```

Let's recenter the `time` variable. Previously, `time == 0` represents the beginning of the study. The `time` variable is subtracted by 3 called `timel` so `timel == 0` represents the last time point (at Year 3). Check the result of this centering in the random slope model.

```{r centerlast1}
dat1$timel <- dat1$time - 3
out1c1 <- lmer(jsat ~ 1 + timel + (1 + timel|pid), data=dat1, REML=FALSE)
summary(out1c1)
```

Let's add time-varying covariates in the model. In this example, whether an employee is married is only time-varying covariate. First, this variable is added without centering. 

```{r timevarying1}
out1d <- lmer(jsat ~ 1 + time + married + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1d)
```

As discussed in Lecture 7, if the level-1 variables are added in the model without centering, the level-1 and level-2 effects are equally constrained, which should not be assumed without testing. Thus, `married` is separated into employee means, `avemarried` and within-employee deviations, `diffmarried`. Then, both variables are put in the model in addition to `time`.  

```{r timevarying12}
dat1$avemarried <- ave(dat1$married, dat1$pid)
dat1$diffmarried <- dat1$married - dat1$avemarried
out1e <- lmer(jsat ~ 1 + time + diffmarried + avemarried + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1e)
```

Because `diffmarried` is not significant, it is dropped from the model. Only `avemarried` is retained, which is time invariant. Other time-invariant covariates are added in the model.

```{r timeinvariant1}
out1h <- lmer(jsat ~ 1 + time + avemarried + I(age0-25) + I(tenure0-5) + fftj + officejob + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1h)
```

To increase the interpretability of the regression coefficients, all time-invariate covariates are grand-mean centered. Thus, the grand means must be calculated by (a) making a new data set where rows represent each individual and (b) calculating the means across individuals. 

```{r grandmean1}
dat1a <- dat1[!duplicated(dat1$pid),]
apply(dat1a, 2, mean)
```

Next, all time-invariate covariates are centered at the calculated grand means. Also, all cross-level interactions are added to predict the random slopes of `time`. 

```{r crosslevel1}
out1n <- lmer(jsat ~ 1 + time + time*I(avemarried-0.3725) + time*I(age0-28.375) + time*I(tenure0-2.054) + time*I(fftj-0.4625) + time*I(officejob-0.33) + (1 + time|pid), data=dat1, REML=FALSE)
summary(out1n)
```

`age0` and `tenure0` significantly predicted the random slopes of `time`. Thus, both cross-level interactions are probed by the `interactions` package. Before using the package, all variables must be centered by creating new variables. 

```{r avoidI1}
dat1$age0c <- dat1$age0-28.375
dat1$avemarriedc <- dat1$avemarried - 0.3725
dat1$tenure0c <- dat1$tenure0 - 2.054
dat1$fftjc <- dat1$fftj - 0.4625
dat1$officejobc <- dat1$officejob - 0.33
out1nn <- lmer(jsat ~ 1 + time + time*avemarriedc + time*age0c + time*tenure0c + time*fftjc + time*officejobc + (1 + time|pid), data=dat1, REML=FALSE)
```

Next, find values of `time`, `age0`, and `tenure0` for probing interactions. Don't forget to delete values from the grand means because all predictors are centered by creating new variables.

```{r probeval1}
age0cval <- c(25, 30, 35) - 28.375
tenure0cval <- c(0, 2, 4) - 2.054
timeval <- c(0, 1, 2, 3)
```

First, the simple slopes of `time` at each level of `age0` is calculated.

```{r simpleslope11, eval=FALSE}
library(interactions)
ss11 <- sim_slopes(model=out1nn, pred=time, modx=age0c, modx.values=age0cval)
```
```{r simpleslope111}
ss11
```

Plot the interaction where X-axis is `time` and each line represents each value of `age0`.

```{r interactionplot11}
interact_plot(model=out1nn, pred=time, modx=age0c, modx.values=age0cval)
```

Second, the simple slopes of `age0` at each level of `time` is calculated.

```{r simpleslope12, eval=FALSE}
ss12 <- sim_slopes(model=out1nn, pred=age0c, modx=time, modx.values=timeval)
```
```{r simpleslope121}
ss12
```

Plot the interaction where X-axis is `age0` and each line represents each value of `time`.

```{r interactionplot12}
interact_plot(model=out1nn, pred=age0c, modx=time, modx.values=timeval)
```

Third, the simple slopes of `time` at each level of `tenure0` is calculated.

```{r simpleslope13, eval=FALSE}
ss13 <- sim_slopes(model=out1nn, pred=time, modx=tenure0c, modx.values=tenure0cval)
```
```{r simpleslope131}
ss13
```

Plot the interaction where X-axis is `time` and each line represents each value of `tenure0`.

```{r interactionplot13}
interact_plot(model=out1nn, pred=time, modx=tenure0c, modx.values=tenure0cval)
```

Finally, the simple slopes of `tenure0` at each level of `time` is calculated.

```{r simpleslope14, eval=FALSE}
ss14 <- sim_slopes(model=out1nn, pred=tenure0c, modx=time, modx.values=timeval)
```
```{r simpleslope141}
ss14
```

Plot the interaction where X-axis is `tenure0` and each line represents each value of `time`.

```{r interactionplot14}
interact_plot(model=out1nn, pred=tenure0c, modx=time, modx.values=timeval)
```

##### Example 2: Changes in Senses of Belonging in Undergraduates

Read the new data set representing a longitudinal study which undergraduate students data are collected in four school years. 

```{r readdata2}
dat2 <- read.table("lecture11ex2.csv", sep=",", header=TRUE)
```

See descriptive statistics of all variables within the data set.

```{r describe2}
describe(dat2)
```

Let's run the null model of senses of belonging to check the intraclass correlation.

```{r nullmodel2}
out2a <- lmer(mem ~ 1 + (1|pid), data=dat2, REML=FALSE)
summary(out2a)
```

Use `time` to predict senses of belonging without random slopes. Note that `time` in this example starts with 1 so this variable is subtracted by 1 to make the intercept represent the sense of belonging at freshman year.

```{r fixedtime2}
dat2$timec <- dat2$time - 1
out2b <- lmer(mem ~ 1 + timec + (1|pid), data=dat2, REML=FALSE)
summary(out2b)
```

As another method, calendar year is used as the predictor. Note that this example has two cohorts: B.E. 2550 (2007) (`cohort == 1`) and B.E. 2551 (2008) (`cohort == 0`). If B.E. 2552 (2009) is centered at 0, then four years in the B.E. 2550 cohort should be -2, -1, 0, 1 and those in the B.E. 2551 cohort should be -1, 0, 1, 2. 

```{r fixedtime22}
dat2$calyear <- dat2$time - 3
dat2[dat2$cohort == 0, "calyear"] <- dat2[dat2$cohort == 0, "calyear"] + 1
out2b1 <- lmer(mem ~ 1 + calyear + (1|pid), data=dat2, REML=FALSE)
summary(out2b1)
```

Let's keep using `time`. Next, the curvilinear changes are investigated but the changes were not significant. 

```{r curvilinear2}
out2c <- lmer(mem ~ 1 + timec + I(timec^2) + (1|pid), data=dat2, REML=FALSE)
summary(out2c)
```

Plot the nonsignificant curvilinear changes and see that the curvature is not strong.

```{r curvilinear22}
mytime <- 0:3
mypred <- 62.1311 - 2.4583*mytime + 0.1632*mytime^2
plot(mytime, mypred, type="l")
```

Next, add the random slope of `timec` in the model. That is, every employee has different changes in job satisfaction. The likelihood ratio test was significant so the random slope is retained. 

```{r randomslope2}
out2d <- lmer(mem ~ 1 + timec + (1 + timec|pid), data=dat2, REML=FALSE)
anova(out2b, out2d)
```

Check the output of the random slope model.

```{r randomslope22}
summary(out2d)
```

Let's investigate individual changes in senses of belonging across four years for both cohorts. 

```{r individualplot2}
dat2_2 <- dat2[dat2$pid%%5 == 0,]
dat2_2$cohort <- factor(dat2_2$cohort, labels=c("2551", "2550"))
ggplot(dat2_2, aes(x=time, y=mem, group=pid, colour=cohort)) + geom_smooth(method=lm, se=FALSE, linewidth=0.5)
```

Next, the X axis is changed to `calcyear` which is the calendar year of each data point. 

```{r calcyearplot2}
dat2_2 <- dat2[dat2$pid%%5 == 0,]
dat2_2$cohort <- factor(dat2_2$cohort, labels=c("2551", "2550"))
dat2_2$calyear <- dat2_2$calyear + 2552
ggplot(dat2_2, aes(x=calyear, y=mem, group=pid, colour=cohort)) + geom_smooth(method=lm, se=FALSE, linewidth=0.5)
```

Let's add time-varying covariates in the model. In this example, perceived stress measured in each year is the only time-varying covariate. Let's make the grand means, `avestress`, and group-mean-centered variables, `diffstress`.

```{r timevarying2}
dat2$avestress <- ave(dat2$stress, dat2$pid)
dat2$diffstress <- dat2$stress - dat2$avestress
out2e <- lmer(mem ~ 1 + timec + diffstress + avestress + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2e)
```

Both stress variables were retained because they were significant. Next, time-invariant covariates are added. To increase the interpretability of the regression coefficients, all time-invariate covariates are grand-mean centered. Thus, the grand means must be calculated by (a) making a new data set where rows represent each individual and (b) calculating the means across individuals. 

```{r grandmean2}
dat2a <- dat2[!duplicated(dat2$pid),]
apply(dat2a, 2, mean)
```

Next, all time-invariate covariates are centered at the calculated grand means and added to the model. 

```{r timeinvariant2}
out2g <- lmer(mem ~ 1 + timec + diffstress + I(avestress - 50.473) + I(cohort - 0.501) + I(female - 0.495) + I(act - 50.219) + I(ext - 49.534) + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2g)
```

Also, all cross-level interactions are added to predict the random slopes of `timec`. 

```{r crosslevel2}
out2h <- lmer(mem ~ 1 + timec + diffstress + timec*I(avestress - 50.473) + timec*I(cohort - 0.501) + timec*I(female - 0.495) + timec*I(act - 50.219) + timec*I(ext - 49.534) + (1 + timec|pid), data=dat2, REML=FALSE)
summary(out2h)
```

`cohort`, `act`, and `ext` significantly predicted the random slopes of `timec`. Thus, these three cross-level interactions are probed by the `interactions` package. Before using the package, all variables must be centered by creating new variables. 

```{r avoidI2}
dat2$avestressc <- dat2$avestress - 50.473
dat2$cohortc <- dat2$cohort - 0.501
dat2$femalec <- dat2$female - 0.495
dat2$actc <- dat2$act - 50.219
dat2$extc <- dat2$ext - 49.534
out2hn <- lmer(mem ~ 1 + timec + diffstress + timec*avestressc + timec*cohortc + timec*femalec + timec*actc + timec*extc + (1 + timec|pid), data=dat2, REML=FALSE)
```

Next, find values of `time`, `cohort`, `act`, and `ext` for probing interactions. Don't forget to delete values from the grand means because all predictors are centered by creating new variables.

```{r probeval2}
cohortcval <- c(0, 1) - 0.501
actcval <- c(40, 50, 60) - 50.219
extcval <- c(40, 50, 60) - 49.534
timecval <- c(0, 1, 2, 3)
```

First, the simple slopes of `timec` at each level of `cohort` is calculated.

```{r simpleslope21, eval=FALSE}
ss21 <- sim_slopes(model=out2hn, pred=timec, modx=cohortc, modx.values=cohortcval)
```
```{r simpleslope211}
ss21
```

Plot the interaction where X-axis is `timec` and each line represents each value of `cohort`.

```{r interactionplot21}
interact_plot(model=out2hn, pred=timec, modx=cohortc, modx.values=cohortcval)
```

Second, the simple slopes of `cohort` at each level of `timec` is calculated.

```{r simpleslope22, eval=FALSE}
ss22 <- sim_slopes(model=out2hn, pred=cohortc, modx=timec, modx.values=timecval)
```
```{r simpleslope221}
ss22
```

Plot the interaction where X-axis is `cohort` and each line represents each value of `timec`.

```{r interactionplot22}
interact_plot(model=out2hn, pred=cohortc, modx=timec, modx.values=timecval)
```

Third, the simple slopes of `timec` at each level of `act` is calculated.

```{r simpleslope23, eval=FALSE}
ss23 <- sim_slopes(model=out2hn, pred=timec, modx=actc, modx.values=actcval)
```
```{r simpleslope231}
ss23
```

Plot the interaction where X-axis is `timec` and each line represents each value of `act`.

```{r interactionplot23}
interact_plot(model=out2hn, pred=timec, modx=actc, modx.values=actcval)
```

Fourth, the simple slopes of `act` at each level of `timec` is calculated.

```{r simpleslope24, eval=FALSE}
ss24 <- sim_slopes(model=out2hn, pred=actc, modx=timec, modx.values=timecval)
```
```{r simpleslope241}
ss24
```

Plot the interaction where X-axis is `act` and each line represents each value of `timec`.

```{r interactionplot24}
interact_plot(model=out2hn, pred=actc, modx=timec, modx.values=timecval)
```

Fifth, the simple slopes of `timec` at each level of `ext` is calculated.

```{r simpleslope25, eval=FALSE}
ss25 <- sim_slopes(model=out2hn, pred=timec, modx=extc, modx.values=extcval)
```
```{r simpleslope251}
ss25
```

Plot the interaction where X-axis is `timec` and each line represents each value of `ext`.

```{r interactionplot25}
interact_plot(model=out2hn, pred=timec, modx=extc, modx.values=extcval)
```

Finally, the simple slopes of `ext` at each level of `timec` is calculated.

```{r simpleslope26, eval=FALSE}
ss26 <- sim_slopes(model=out2hn, pred=extc, modx=timec, modx.values=timecval)
```
```{r simpleslope261}
ss26
```

Plot the interaction where X-axis is `ext` and each line represents each value of `timec`.

```{r interactionplot26}
interact_plot(model=out2hn, pred=extc, modx=timec, modx.values=timecval)
```