---
title: "Intro to Multilevel Model: Lecture 15 Example Code"
author: "Sunthud Pornprasertmanit"
output: html_document
date: "`r format(Sys.time(), '%d/%m/%Y')`"
---

```{css style settings, echo = FALSE}
blockquote {
    padding: 10px 20px;
    margin: 0 0 20px;
    font-size: 14px;
    border-left: 5px solid #eee;
    font-family: "Courier New";
}
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r xx, include=FALSE}
library(lme4)
library(psych)
library(nlme)
library(r2mlm)
library(broom.mixed)
library(lavaan)
library(multcomp)
```

## Intraclass Correlation

##### Example 1: The Impact of Couples' Neuroticism on Distress

Read the data set investigating male-female couples. Males' distress and females' distress are predicted by males' neuroticism, females' neuroticism, and length of relationship.   

```{r readdata1}
dat1 <- read.table("lecture15ex1.csv", sep=",", header=TRUE)
```

Check the descriptive statistics of the variables inside the data set.

```{r describe1}
library(psych)
describe(dat1)
```

Intraclass correlations (ICC) of the distinguishable dyads are simply the correlation of the same variable between roles.

```{r missingpattern1}
with(dat1, cor(menneuro, womenneuro))
with(dat1, cor(mendistress, womendistress))
```

##### Example 2: Personality between best friends

Read the data set investigating best-friend dyads. In this case, two people cannot be separated by roles. Extraversion and neuroticism are measured from both people in the dyads.

```{r readdata2}
dat2 <- read.table("lecture15ex2.csv", sep=",", header=TRUE)
```

Check the descriptive statistics of the variables inside the data set.

```{r describe2}
describe(dat2)
```

Use the null multilevel models to investigate ICC in indistinguishable dyads.

```{r regularmlm2}
library(lme4)
out2a <- lmer(extro ~ 1 + (1|coupleid), data=dat2, REML=FALSE)
summary(out2a) 
out2b <- lmer(neuro ~ 1 + (1|coupleid), data=dat2, REML=FALSE)
summary(out2b) 
```

However, when ICC is negative, the regular MLM is inappropriate because ICC is measured via variance. The lower bound is 0. Instead, use the `gls` function in the `nlme` package. The `gls` function is similar to the `lm` function but it allows correlated residuals. To model the indistinguishable dyads, people from the same dyads are allowed to have correlated residuals. The correlation value is ICC, which can be negative.

```{r gls2}
library(nlme)
out2c <- gls(extro ~ 1, data = dat2, correlation = corCompSymm(form = ~1|coupleid))
summary(out2c)
out2d <- gls(neuro ~ 1, data = dat2, correlation = corCompSymm(form = ~1|coupleid))
summary(out2d)
```

Another method is to change the long data set to pairwise data set. Then, correlate the variable between actor and partner. To make the pairwise data set, first, make two copies of the data. The first copy has the increasing order of the dyad ID and the increasing order of the subject ID. The second copy has the increasing order of the dyad ID but the decreasing order of the subject ID.

```{r pairwise2}
dat2a <- dat2[order(dat2$coupleid, dat2$subjectid),]
head(dat2a)
dat2b <- dat2[order(dat2$coupleid, -dat2$subjectid),]
head(dat2b)
```

Then, the level-1 variables in the second copy are attached to the first copy with different names. 

```{r pairwise22}
dat2a$extropartner <- dat2b$extro
dat2a$neuropartner <- dat2b$neuro
head(dat2a)
```

`dat2a` is currently in the pairwise format. Find the correlation between the same variables to get the ICC.

```{r pairwise23}
with(dat2a, cor(extro, extropartner))
with(dat2a, cor(neuro, neuropartner))
```

## Test of Distinguishability

##### Example 1: The Impact of Couples' Neuroticism on Distress

Let $X$ and $Y$ be level-1 variables and $A$ and $B$ be different roles. Test of distinguishability is the combination of the following tests.

- $H_0:\mu_{X_A}=\mu_{X_B}$
- $H_0:\mu_{Y_A}=\mu_{Y_B}$
- $H_0:\sigma^2_{X_A}=\sigma^2_{X_B}$
- $H_0:\sigma^2_{Y_A}=\sigma^2_{Y_B}$
- $H_0:\rho_{X_A,Y_A}=\rho_{X_B,Y_B}$
- $H_0:\rho_{X_A,Y_B}=\rho_{X_B,Y_A}$

If any tests in this set is rejected, the variable used to separate the roles in distinguishable dyads is good. If all null hypotheses are retained, the role variable can be dropped and these dyads can be treated as indistinguishable dyads.

The `lavaan` package is used to test the distinguishability. In `lavaan` script, all parameters are listed including $\mu_{X_A},\mu_{X_B},\mu_{Y_A},\mu_{Y_B},\sigma^2_{X_A},\sigma^2_{X_B},\sigma^2_{Y_A},\sigma^2_{Y_B},\rho_{X_A,Y_A},\rho_{X_B,Y_B},\rho_{X_A,Y_B},\rho_{X_B,Y_A}$. Let's show the example in the men and women distress data. Four variables are involved in the analysis: `mendistress`, `womendistress`, `menneuro`, and `womenneuro`.

First, the means of each variable are listed in the script.

```{r unequalmean1}
scriptmean <- '
mendistress ~ 1
womendistress ~ 1
menneuro ~ 1
womenneuro ~ 1
'
```

Next, the variances of each variable are listed in the script.

```{r unequalvariance1}
scriptvariance <- '
mendistress ~~ mendistress
womendistress ~~ womendistress
menneuro ~~ menneuro
womenneuro ~~ womenneuro
'
```

Next, the tested covariances of each variable are listed in the script. Note that, if variances are equal, the equality in covariances is the same as the equality in correlation.

```{r unequalcovariance1}
scriptcovariance <- '
mendistress ~~ menneuro 
womendistress ~~ womenneuro
mendistress ~~ womenneuro
womendistress ~~ menneuro
'
```

Only four covariances have been listed. The other two covariances, $\rho_{X_A,X_B},\rho_{Y_A,Y_B}$ are attached in the script. These two covariances can be standardized which represent intraclass correlations in distinguishable dyads. Thus, if the test of distinguishability is not rejected, these two covariances cannot be interpreted as ICC.

```{r unequalextracov1}
scriptextra <- '
mendistress ~~ womendistress 
menneuro ~~ womenneuro
'
```

All four scripts are attached into the same script.

```{r unequal1}
scriptparent <- paste(scriptmean, scriptvariance, scriptcovariance, scriptextra)
```

Then, use the `sem` function to run the script on the data. Users may use the `summary` function to check the output now (we will show it later). The output is not shown here to save space. You will see that all means, variances, and covariances are not equal. This is the unconstrained (or parent) model.

```{r unequalrun1}
library(lavaan)
outparent <- sem(scriptparent, data=dat1)
```

Next, make the script of the constrained model where means, variances, and covariances have some equality constraints. In `lavaan`, users can name each parameter called labels. If any parameters have the same labels, it means that equality constraints are applied.

First, the means of each variable are listed and constrained in the script.

```{r equalmean1}
scriptmean2 <- '
mendistress ~ m1*1
womendistress ~ m1*1
menneuro ~ m2*1
womenneuro ~ m2*1
'
```

The means of `mendistress` and `womendistress` are equally constrained by putting the same label as `m1`. The means of `menneuro` and `womenneuro` are equally constrained by putting the same label as `m2`. Next, the variances of each variable are listed and constrained in the script.

```{r equalvariance1}
scriptvariance2 <- '
mendistress ~~ v1*mendistress
womendistress ~~ v1*womendistress
menneuro ~~ v2*menneuro
womenneuro ~~ v2*womenneuro
'
```

Next, the tested covariances of each variable are listed and constrained in the script. 

```{r equalcovariance1}
scriptcovariance2 <- '
mendistress ~~ same*menneuro 
womendistress ~~ same*womenneuro
mendistress ~~ diff*womenneuro
womendistress ~~ diff*menneuro 
'
```

Finally, two additional covariances are attached with labels but without equality constraints. 

```{r equalextracov1}
scriptextra2 <- '
mendistress ~~ dis*womendistress 
menneuro ~~ neuro*womenneuro
'
```

All four scripts are attached into the same script.

```{r equal1}
scriptnested <- paste(scriptmean2, scriptvariance2, scriptcovariance2, scriptextra2)
```

Then, use the `sem` function to run the script on the data. Users may use the `summary` function to check the output now. The output is not shown here to save space. You will see that the equality constraints are indeed applied. This is the constrained (or nested) model.

```{r equalrun1}
outnested <- sem(scriptnested, data=dat1)
```

Run the likelihood ratio test. The test was significant so the test of distinguishability is rejected. The role of men and women are valid in distinguish types of people in the dyads.

```{r anova1}
anova(outnested, outparent)
```

See the output of the parent model. The standardized columns show the standardized parameters, which you can see the correlation between variables.

```{r summaryparent1}
summary(outparent, standardize=TRUE)
```

## Actor-Partner Interdependence Model (APIM)

##### Example 1: The Impact of Couples' Neuroticism on Distress

For distinguishable dyads, path analysis is used to run APIM. Again, the `lavaan` package is used. In this example, psychological distress is predicted by both male's and female's neuroticism and length of relationships. The actor effect is `mendistress ~ menneuro` and `womendistress ~ womenneuro`. The partner effect is `mendistress ~ womenneuro` and `womendistress ~ menneuro`.

```{r apim1}
library(lavaan)
script1 <- '
mendistress ~ menneuro + womenneuro + lengthrela
womendistress ~ menneuro + womenneuro + lengthrela
mendistress ~~ womendistress
'
out1 <- sem(script1, data=dat1)
summary(out1, standardize=TRUE, rsquare=TRUE)
```

`lavaan` can define additional parameters. Thus, users can add or subtract parameters to test some hypotheses. For example, users may have a hypothesis that the actor effects of men is less than the actor effect of women. Users can make separate labels for the actor effects of men (`am`) and women (`aw`). Then, `diffa` is defined as `am - aw`. If `diffa` is significant and less than 0, the user's hypothesis is supported. See the lecture for the details of all additional parameters shown in the script. 

```{r apim12}
script2 <- '
mendistress ~ am*menneuro + pmw*womenneuro + cm*lengthrela
womendistress ~ pwm*menneuro + aw*womenneuro + cw*lengthrela
mendistress ~~ womendistress

# Define New Parameters
apmen := am - pmw                         # Compare actor and partner on mendistress
apwomen := aw - pwm                       # Compare actor and partner on womendistress
diffa := am - aw                          # Compare actor effects on each role
diffp := pmw - pwm                        # Compare partner effects on each role
diffmenneuro := am - pwm                  # Compare the effects on menneuro on men and women distress
diffwomenneuro := pmw - aw                # Compare the effects on womenneuro on men and women distress
difflengthrela := cm - cw                 # Compare the effects on lengthrela on men and women distress
aveactor := (am + aw) / 2                 # Average actor effects
avepartner := (pmw + pwm) / 2             # Average partner effects
diffactorpartner := aveactor - avepartner # Compare actor and partner effects
'
out2 <- sem(script2, data=dat1)
summary(out2)
```

##### Example 3: The Effect of Roommates' Housework Hours on Satisfaction

For indistinguishable dyads, the `gls` function with a pairwise data set is used to analyze APIM. Let's read the new data set of duo roommates. This data analyze the actor and partner effects of housework each roommate spent time on the roommates' satisfaction. 

```{r readdata3}
dat3 <- read.table("lecture15ex3.csv", sep=",", header=TRUE)
```

Check the descriptive statistics. 

```{r describe3}
describe(dat3)
```

Make the pairwise data set. `house` is the amount of time each person spending on housework. `housepartner` is the amount of time each partner spending on housework.

```{r pairwise3}
dat3a <- dat3[order(dat3$coupleid, dat3$subjectid),]
dat3b <- dat3[order(dat3$coupleid, -dat3$subjectid),]
dat3a$housepartner <- dat3b$house
```

Run the `gls` function to investigate the actor (`house`) and the partner (`housepartner`) effects.

```{r apim3}
library(nlme)
out3a <- gls(sat ~ 1 + house + housepartner,
            data = dat3a, correlation = corCompSymm(form = ~1|coupleid))
summary(out3a)
```

Let's say that analysts want to check whether the actor and partner effects are different. In this case, the difference between the regression coefficients of actor effect and partner effect is tested. Thus, `house - housepartner` is needed. The contrast coeficients for `(Intercept)`, `house`, and `housepartner` would be $\begin{bmatrix}0&1&-1\end{bmatrix}$. The `multcomp` package can be used to test the contrast.

```{r contrast3}
library(multcomp)
diffhouse <- matrix(c(0, 1, -1), nrow=1, ncol=3)
ct <- glht(out3a, linfct = diffhouse)
summary(ct)
```

Let's make another model where the actor and partner effects have the interaction predicting satisfaction.

```{r apim32}
out3b <- gls(sat ~ 1 + house*housepartner,
             data = dat3a, correlation = corCompSymm(form = ~1|coupleid))
summary(out3b)
```

The interaction was significant so predictor values for probing interaction are needed. Check the mean and standard deviation of the hours spent on housework. The values of 2.25, 3.25, and 4.25 are used for probing interactions.

```{r probevalues3}
mean(dat3$house)
sd(dat3$house)
housevalue <- c(2.25, 3.25, 4.25)
```

Unfortunately, the `interactions` package does not support the `gls` function. Thus, the function I wrote to analyze simple slopes (mentioned in Lecture 13) is used here. Let's check the effect of `house` at each level of `housepartner`.

```{r simslopes3}
source("quicksimslopeslme4.R")
quick_sim_slopes(out3b, pred="house", modx="housepartner", modx.values=housevalue)
```

The resulting coefficients of simple intercepts and simple slopes are used to plot the interaction with the help of the `abline` function.

```{r plotslopes3}
cond_house <- quick_sim_slopes(out3b, pred="house", modx="housepartner", modx.values=housevalue)
cond_house_int <- cond_house[[1]][,"est"]
cond_house_slope <- cond_house[[2]][,"est"]
plot(dat3a$house, dat3a$sat, type="n", xlab="Self Housework Hours per Month", 
     ylab="Relationship Satisfaction", main="Actor Effect")
abline(a = cond_house_int[1], b = cond_house_slope[1], col="red")
abline(a = cond_house_int[2], b = cond_house_slope[2], col="green")
abline(a = cond_house_int[3], b = cond_house_slope[3], col="blue")
legend(0.1, 40, c("Low (2.25)", "Medium (3.25)", "High (4.25)"), 
       col=c("red", "green", "blue"), lty=1, title="Partner Housework Hours per Month")
```

Alternatively let's check the effect of `housepartner` at each level of `house`.

```{r simslopes32}
quick_sim_slopes(out3b, pred="housepartner", modx="house", modx.values=housevalue)
```

Plot the effect of `housepartner` at each level of `house`.

```{r plotslopes32}
cond_housep <- quick_sim_slopes(out3b, pred="housepartner", modx="house", modx.values=housevalue)
cond_housep_int <- cond_housep[[1]][,"est"]
cond_housep_slope <- cond_housep[[2]][,"est"]
plot(dat3a$house, dat3a$sat, type="n", xlab="Partner Housework Hours per Month", 
     ylab="Relationship Satisfaction", main="Partner Effect")
abline(a = cond_housep_int[1], b = cond_housep_slope[1], col="red")
abline(a = cond_housep_int[2], b = cond_housep_slope[2], col="green")
abline(a = cond_housep_int[3], b = cond_housep_slope[3], col="blue")
legend(0.1, 90, c("Low (2.25)", "Medium (3.25)", "High (4.25)"), 
       col=c("red", "green", "blue"), lty=1, title="Self Housework Hours per Month")
```

## Biases in Actor Effect Estimation

From the `house` and `housepartner` effect without interaction, the regression coefficient of house is -2.20. First, pretend that dyadic data is not available. Researchers can collect only one person in each room. In this case, the data will not have the nested data structure. The `lm` function is used.

```{r bias3}
dat3c <- dat3[!duplicated(dat3$coupleid),]
summary(lm(sat ~ house, data=dat3c))
```

When the dyadic data is not available, the regression coefficient represents the actor effect without controlling the partner effect, which is -4.302. This value is quite different from APIM.

Assume that the dyadic data is available. Now, use only house to predict the dependent variable without `housepartner`. In this case, the regression coefficient will also represents the actor effect but this model assumes that the partner effect is 0. You will see that the actor effect from the model is also quite different from APIM, which is -7.59.

```{r bias32}
out3bias <- gls(sat ~ 1 + house,
             data = dat3a, correlation = corCompSymm(form = ~1|coupleid))
summary(out3bias)
```

# R-squared

In distinguishable dyads, users can use the `sem` function in `lavaan` and add `rsquare=TRUE` when they call the `summary` function. The proportion of variance explained in the dependent variables of each role can be seen. 

In indistinguishable dyads, the $R^2$ is not automatically provided. There are two ways to get the $R^2$. First, users can use the the `predict` function to get the predicted values. Then the squared correlation between the predicted values and actual values is $R^2$.

```{r rsquared1}
cor(predict(out3b), dat3$sat)^2
```

Alternatively, if ICC is positive, the `lme` or `lmer` can be used. The `r2mlm` function can extract $R^2$ from the output.

```{r rsquared2}
out3d <- lme(sat ~ 1 + house*housepartner, random = ~1 |coupleid, data = dat3a)
library(r2mlm)
r2mlm(out3d)
```