---
title: "Intro to Multilevel Model: Lecture 8 Example Code"
author: "Sunthud Pornprasertmanit"
output: html_document
date: "`r format(Sys.time(), '%d/%m/%Y')`"
---

```{css style settings, echo = FALSE}
blockquote {
    padding: 10px 20px;
    margin: 0 0 20px;
    font-size: 14px;
    border-left: 5px solid #eee;
    font-family: "Courier New";
}
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r xx, include=FALSE}
library(lme4)
library(effects)
library(interactions)
library(ggplot2)
library(psych)
library(r2mlm)
# out <- readr::read_rds("lecture7saveresults.rds")
# ss41 <- out[[1]]
# ss42 <- out[[2]]
# ss31 <- out[[3]]
```

## R squared in Group Mean Centering

##### Example 1: The Effect of Math Achievement on Perceived Self-Efficacy

Read the data set from Lecture 7.

```{r readdata1}
dat4 <- read.table("lecture7ex1.csv", sep=",", header=TRUE)
```

Calculate group means of math achievement by the `ave` function and get the group-mean-centered math achievement. The group means of math achievement is centered at 50. The centering at 50 is not implemented by `I()` but by subtracting the original group means and saving it as a new variable called `aveach50`. The reason of making a new variable because `I()` is not suppored in the `r2mlm` package which will be used to calculate $R^2$.

```{r calcgroupmean1}
dat4$aveach <- ave(dat4$ach, dat4$schoolid)
dat4$diffach <- dat4$ach - dat4$aveach
dat4$aveach50 <- dat4$aveach - 50
```

First, let's run the cross-level interaction model where the within-school predictors are the deviation of math achievement within schools, `diffach`, and the cross-level interaction, `diffach:aveach50`. The between-school predictors are the school means of math achievement centered at 50, `aveach50`, and whether a school is private, `private`.

```{r crosslevel1}
library(lme4)
out4m1 <- lmer(efficacy ~ 1 + diffach + aveach50 + private
               + diffach:aveach50
               + (1 + diffach|schoolid), data=dat4, REML=FALSE)
summary(out4m1)
```

Run the `r2mlm` function to get the proportions of variance explained.

```{r r2mlm1}
library(r2mlm)
r2mlm(out4m1)
```

## R squared when a Predictor is not Centered

##### Example 2: Interviewees' Rating Scores within Interviewers

Read the data set from Lecture 4.

```{r readdata2}
dat5 <- read.table("lecture4ex2.csv", sep=",", header=TRUE)
```

First, the cross-level interaction model will be used to predict interviews' ratings. The interviewee-level predictors are interviewee's sex which male is the reference group, eesex, interviewee's IQ centered at 100, and the cross-level interaction between interviewee's and interviewer's sex. The interviewer-level predictor is interviewer's sex. However, the `r2mlm_long_manual` function in the `r2mlm` package is not easy to use as the `r2mlm` function. Users need to provide many arguments and the variable names cannot have `:`, `*`, or `I()`. Thus, the interaction or centering must be calcuated as new variables before running the model. Thus, the centered IQ and the interaction are calculated. 

```{r centering2}
dat5$iqc <- dat5$iq - 100
dat5$int <- dat5$eesex * dat5$ersex
```

Next, analyze the cross-level interaction model.

```{r model2}
out5 <- lmer(score ~ 1 + eesex + iqc + ersex + int 
                + (1 + eesex|erid), data=dat5, REML=FALSE)
summary(out5)
```

Run the `r2mlm_long_manual` function to get the proportions of variance explained. Note that, `covs` is the list of predictors with fixed effects. `random_covs` is the list of predictors that have random slopes. `gammas` is the point estimates of the fixed effect. `Tau` is the covariance matrix of the random effects. `sigma2` is the level-1 residual variance. 

```{r r2mlm2}
sumout5 <- summary(out5)
r2mlm_long_manual(data=dat5,
                  covs=c("eesex", "iqc", "ersex", "int"),
                  random_covs=c("eesex"),
                  gammas=coef(sumout5)[-1, "Estimate"],
                  clusterID="erid",
                  Tau=as.matrix(Matrix::bdiag(VarCorr(out5))),
                  sigma2=getME(out5, "sigma")^2)
```

## Changes in R squared

##### Example 1: The Effect of Math Achievement on Perceived Self-Efficacy

Let's check the increase in R squares when math achievement is added to predict self-efficacy in addition to `private`. First, make the likelihood ratio test to compare the models with and without math achievement.

```{r mathadded1}
out4m1 <- lmer(efficacy ~ 1 + diffach + aveach50 + private
               + diffach:aveach50
               + (1 + diffach|schoolid), data=dat4, REML=FALSE)
out4m2 <- lmer(efficacy ~ 1 + private + (1|schoolid), data=dat4, REML=FALSE)
anova(out4m2, out4m1)
```

Use the `r2mlm_comp` function to check the R squared changes when math achievement is added.

```{r r2mlmcomp1}
r2mlm_comp(out4m2, out4m1)
```

## Standardized Regression Coefficients

##### Example 3: Goldfish Food Consumption within Fish Tanks

Read the data set from Lecture 4.

```{r readdata3}
dat1 <- read.table("lecture4ex1.csv", sep=",", header=TRUE)
```

To find standardized coefficients, all variables need to be standardized before running multilevel models. First, the lower-level variables are easily standardized by the `scale` function.

```{r standardize3}
dat1$zconsume <- scale(dat1$consume)
dat1$zlength <- scale(dat1$length)
```

However, the upper-level variables are not easy to standardized. Each row in a data frame represents the lower-level observation. Thus, each upper-level unit will have duplicated values. The duplicated values must be removed. Then, the means and standard deviations are calculated. Finally, the original scores (with duplicated values) are simply subtracted by the calculated means and divided by the obtained standard deviations. 

```{r standardize32}
volume <- dat1[!duplicated(dat1$groupid), "volume"]
mvolume <- mean(volume)
sdvolume <- sd(volume)
dat1$zvolume <- (dat1$volume - mvolume) / sd(volume)
```

Use the standard scores to find group means and group-mean-centered standard scores. 

```{r centerstandard3}
dat1$avezlength <- ave(dat1$zlength, dat1$groupid)
dat1$diffzlength <- dat1$zlength - dat1$avezlength
```

Run the multilevel models with standard scores. 

```{r standardreg3}
out1z23 <- lmer(zconsume ~ 1 + diffzlength + avezlength + zvolume 
                + diffzlength:zvolume + (1 + diffzlength|groupid), 
                data=dat1, REML=FALSE)
summary(out1z23)
```

To check whether the models with standardized and unstandardized variables are the same, log likelihood can be calculated. These models are compared:

1. Standardized outcomes and standardized predictors (calculated above as `out1z23`)
2. Standardized outcomes and unstandardized predictors (`out1bz23`)
3. Unstandardized outcomes and standardized predictors (`out1zb23`)
4. Unstandardized outcomes and unstandardized predictors (`out1b23`)

As shown below, Model 1 and Model 2, which both have standardized outcomes, have the same log likelihood values. Model 3 and Model 4, which both have unstandardized outcomes, have the same log likelihood values. 

```{r testsamemodels3}
dat1$lengthx <- dat1$length/10
dat1$avelengthx <- ave(dat1$lengthx, dat1$groupid)
dat1$difflengthx <- dat1$lengthx - dat1$avelengthx
dat1$avegold <- ave(dat1$goldcolor, dat1$groupid)
dat1$diffgold <- dat1$goldcolor - dat1$avegold
out1zb23 <- lmer(consume ~ 1 + diffzlength + avezlength + zvolume 
                 + diffzlength:zvolume + (1 + diffzlength|groupid), 
                 data=dat1, REML=FALSE)
out1bz23 <- lmer(zconsume ~ 1 + difflengthx + avelengthx + volume 
                 + difflengthx:volume + (1 + difflengthx|groupid), 
                 data=dat1, REML=FALSE)
out1b23 <- lmer(consume ~ 1 + difflengthx + avelengthx + volume 
                + difflengthx:volume + (1 + difflengthx|groupid), 
                data=dat1, REML=FALSE)
logLik(out1z23)
logLik(out1bz23)
logLik(out1zb23)
logLik(out1b23)
```

Instead of standardized regression coefficients, `r2mlm_comp` can be used to find unique contributions of each predictor. This concept is the same as part regression in regular multiple regression coefficients. 

```{r partregression3}
out1b23_1 <- lmer(consume ~ 1 + avelengthx + volume 
                  + (1|groupid), 
                  data=dat1, REML=FALSE)
out1b23_2 <- lmer(consume ~ 1 + difflengthx + volume 
                  + difflengthx:volume + (1 + difflengthx|groupid), 
                  data=dat1, REML=FALSE)
out1b23_3 <- lmer(consume ~ 1 + difflengthx + avelengthx 
                  + difflengthx + (1 + difflengthx|groupid), 
                  data=dat1, REML=FALSE)
r2mlm_comp(out1b23_1, out1b23, bargraph = FALSE)
r2mlm_comp(out1b23_2, out1b23, bargraph = FALSE)
r2mlm_comp(out1b23_3, out1b23, bargraph = FALSE)
```