---
title: "Intro to Multilevel Model: Lecture 4 Example Code"
author: "Sunthud Pornprasertmanit"
output: html_document
date: "`r format(Sys.time(), '%d/%m/%Y')`"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Explore multilevel data

Read the data set

```{r explore1}
dat1 <- read.table("lecture4ex1.csv", sep=",", header=TRUE)
```

Find the descriptive statistics of the variables in the data set

```{r explore2}
library(psych)
describe(dat1)
```

Check the group means by the `aggregate` function using `FUN=mean`. To save spaces, only first ten group means are shown.

```{r explore3}
outgroupmean <- aggregate(consume ~ groupid, data=dat1, FUN=mean)
head(outgroupmean, 10)
```

Check the ranges of `consume` within each group by the `aggregate` function using `FUN=mean`.

```{r explore4}
outgrouprange <- aggregate(consume ~ groupid, data=dat1, FUN=range)
head(outgrouprange, 10)
```

`stripchart` is the graph that show data points within each group.

```{r explore5}
stripchart(consume ~ groupid, vertical = TRUE, data = dat1[dat1$groupid < 10,])
```

`beeswarm` is the more beautiful version of `stripchart`.

```{r explore6}
library(beeswarm)
dat1_1 <- dat1[dat1$groupid < 10,]
beeswarm(consume ~ groupid, data=dat1_1, col=rainbow(9))
```

## Analyze Null Models

##### Example 1: Goldfish Food Consumption within Fish Tanks

Two main packages in R are used in multilevel models: `lme4` and `nlme`. Let's use `lme4` package first. The main function for analyzing multilevel regression is `lmer`. The formula is a little more complicated than regular multiple regression. The predictors are `1 + (1|groupid)`. The first term, `1`, means to estimate the intercept where the second term, `(1|groupid)`, means to estimate group means. `RMEL=FALSE` means to use full information maximum likelihood.

```{r null1}
library(lme4)
out1m0 <- lmer(consume ~ 1 + (1|groupid), data=dat1, REML=FALSE)
summary(out1m0)
```

##### Example 2: Interviewees' Rating Scores within Interviewers

Read and examine the data.

```{r null2a}
dat2 <-read.table("lecture4ex2.csv", sep=",", header=TRUE)
describe(dat2)
```

Run the null model.

```{r null2b}
out2m0 <- lmer(score ~ 1 + (1|erid), data=dat2, REML=FALSE)
summary(out2m0)
```

##### Example 3: Customers Satisfaction within Each Table in Dining Services

Read and examine the data.

```{r null3a}
dat3 <-read.table("lecture4ex3.csv", sep=",", header=TRUE)
describe(dat3)
```

Run the null model.

```{r null3b}
out3m0 <- lmer(sat ~ 1 + (1|tableid), data=dat3, REML=FALSE)
summary(out3m0)
```

## Means-as-Outcomes Models

##### Example 1: Goldfish Food Consumption within Fish Tanks

In this goldfish example, two tank-level predictors are used to predict goldfish's' food consumption. `nfish` is the number of fish within each tank and `volume` is the tank volume. Technically, the averaged fish consumption within each tank is predicted by both tank-level predictor. Multilevel regression is used rather than aggregated regression which does not account for the reliability of group means. Note that `1` is not included as the predictor in the formula but it is included automatically as the intercept is shown in the output.

```{r mao1a}
out1m1 <- lmer(consume ~ nfish + volume + (1|groupid), data=dat1, REML=FALSE)
summary(out1m1)
```

Both `nfish` and `volume` can be centered at their grand means to make the intercept of the model meaningful. 

```{r mao1b}
out1m1_1 <- lmer(consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) + (1|groupid), data=dat1, REML=FALSE)
summary(out1m1_1)
```

To get the exact p of the t values (assuming normal distribution or df = $\infty$), the t value must be extracted from the output. Thus, the output from the `summary` function is saved.

```{r mao1c}
summary1m1_1 <- summary(out1m1_1)
coef1m1_1 <- coef(summary1m1_1)
coef1m1_1
```

Next, extract only the `t value` column.

```{r mao1d}
tvalue1m1_1 <- coef1m1_1[,"t value"]
tvalue1m1_1
```

Finally, calculate the *p* values using the `pnorm` function. The result is multiplied by 2 to get the two-tailed *p* values

```{r mao1e}
pnorm(abs(tvalue1m1_1), lower.tail=FALSE) * 2
```

##### Example 2: Interviewees' Rating Scores within Interviewers

The rating scores are predicted by the interviewers' sex.

```{r mao2a}
out2m1 <- lmer(score ~ 1 + ersex + (1|erid), data=dat2, REML=FALSE)
summary(out2m1)
```

##### Example 3: Customers Satisfaction within Each Table in Dining Services

The customer satisfaction scores are predicted by the number of persons in each table and whether the table is outdoor.

```{r mao3a}
out3m1 <- lmer(sat ~ 1 + I(numperson - 4) + outdoor + (1|tableid), data=dat3, REML=FALSE)
summary(out3m1)
```

## Random Analysis of Covariance Models

##### Example 1: Goldfish Food Consumption within Fish Tanks

In this goldfish example, two goldfish-level predictors are used to predict goldfish's' food consumption. `length` is the length of each goldfish and `goldcolor` is whether a goldfish is gold or not. Multilevel regression is used rather than disaggregated regression which may inflate type I errors. Note that the length is centered at 15 cm.

```{r rancova1a}
out1m2 <- lmer(consume ~ I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE)
summary(out1m2)
```

##### Example 2: Interviewees' Rating Scores within Interviewers

The rating scores are predicted by the interviewees' sex and IQ. Note that the interviewee's IQ is centered at 100.

```{r rancova2a}
out2m2 <- lmer(score ~ 1 + eesex + I(iq - 100) + (1|erid), data=dat2, REML=FALSE)
summary(out2m2)
```

##### Example 3: Customers Satisfaction within Each Table in Dining Services

The customer satisfaction scores are predicted by the customers' sex and age. Note that age is centered at 40. 

```{r rancova3a}
out3m2 <- lmer(sat ~ 1 + I(age - 40) + female + (1|tableid), data=dat3, REML=FALSE)
summary(out3m2)
```

## Random Intercepts Models

##### Example 1: Goldfish Food Consumption within Fish Tanks

Put both two goldfish-level predictors and two tank-level predicts in the model. 

```{r ranint1a}
out1m3 <- lmer(consume ~ I(nfish - mean(nfish)) + I(volume - mean(volume)) + I(length - 15) + goldcolor + (1|groupid), data=dat1, REML=FALSE)
summary(out1m3)
```

##### Example 2: Interviewees' Rating Scores within Interviewers

Put interviewers' sex, interviewees' sex, and interviewees' IQ in the same model. 

```{r ranint2a}
out2m3 <- lmer(score ~ 1 + ersex + eesex + I(iq - 100) + (1|erid), data=dat2, REML=FALSE)
summary(out2m3)
```

##### Example 3: Customers Satisfaction within Each Table in Dining Services

In addition to put all customer-level and table-level predictors, the age averages within each table and the proportion of females within each table are also used. Let's start with the age variable. First, the age averages within each group is calculated by the `ave` function. 

```{r ranint3a}
dat3$aveage <- ave(dat3$age, dat3$tableid)
```

Next, both `age` and `aveage` are used to predict customer satisfaction. The regression coefficient of `age` represents the within-table effect. However, the regression coefficient of  `aveage` represent the sum of within-table and between-table effects. Note that both predictors are centered at 40. 

```{r ranint3b}
out3m3 <- lmer(sat ~ I(age - 40) + I(aveage - 40) + (1|tableid), data=dat3, REML=FALSE)
summary(out3m3)
```

To calculate the between table effects, the regression coefficients are extracted from the summary output.

```{r ranint3c}
sumout3m3 <- summary(out3m3)
coef3m3 <- coef(sumout3m3)[,"Estimate"]
coef3m3
```

Calculate the between-table effect by summing the last two coefficients.

```{r ranint3d}
coef3m3[2] + coef3m3[3]
```

Next, visualize the within-table and between-table age effects. Because this data set has 500 tables. We will plot only 100 within-table regression lines. So `dat3_1` is created to extract only table ID that is divisible by 5.

```{r ranint3e}
dat3_1 <- dat3[dat3$tableid%%5 == 0,]
```

The `ggplot` function in the `ggplot2` package is used to create within-table regression lines. See that `aes` is used to make subsets of data for each table and `geom_smooth` is used to create regression lines within each subset of data. 

Next, to create the between-table regression line, `geom_abline` is used to overlap the current plot. Note that `btwintcept` is added by `btwslope*40` because the X-axis of the graph is the original-scaled age not the centered age at 40.

```{r ranint3f}
library(ggplot2)
out <- ggplot(dat3_1, aes(x=age, y=sat, group=tableid)) + geom_smooth(method=lm, se=FALSE)
btwslope <- coef3m3[2] + coef3m3[3]
btwintcept <- coef3m3[1]+(btwslope*40)
out + geom_abline(intercept=btwintcept, slope=btwslope, color="red")
```

Similar to customers' age, repeat the same process to `ave` to get the within- and between-table effects. 

```{r ranint3g}
dat3$avefemale <- ave(dat3$female, dat3$tableid)
out3m4 <- lmer(sat ~ female + I(avefemale - 0.5) + (1|tableid), data=dat3, REML=FALSE)
summary(out3m4)
```

Extract the regression coefficients. 

```{r ranint3h}
sumout3m4 <- summary(out3m4)
coef3m4 <- coef(sumout3m4)[,"Estimate"]
coef3m4
```

Visualize the within-table (blue) and between-table (red) effects of gender on customer satisfaction. 

```{r ranint3i}
out <- ggplot(dat3_1, aes(x=female, y=sat, group=tableid)) + geom_smooth(method=lm, se=FALSE)
btwslope <- coef3m4[2] + coef3m4[3]
btwintcept <- coef3m4[1]+(btwslope*0.5)
out + geom_abline(intercept=btwintcept, slope=btwslope, color="red")
```

Put all predictors in the model. 

```{r ranint3j}
out3m5 <- lmer(sat ~ female + I(age - 40) + I(avefemale - 0.5) + I(aveage - 40) + I(numperson - 4) + outdoor + (1|tableid), data=dat3, REML=FALSE)
summary(out3m5)
```