---
title: "Intro to Multilevel Model: Lecture 3 Example Code"
author: "Sunthud Pornprasertmanit"
output: html_document
date: "`r format(Sys.time(), '%d/%m/%Y')`"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Example 1: The Effect of Transformational Leadership toward Job Satisfaction

Read the data set

```{r ex1a}
ex1 <- read.table("lecture3ex1.csv", sep=",", header=TRUE)
```

Find the descriptive statistics of the variables in the data set

```{r ex1b}
library(psych)
describe(ex1)
```

Test whether each correlation in a correlation matrix is significantly different from 0.

```{r ex1c}
corr.test(ex1)
```

Run multiple regression where `jobsat` is the outcome and `tlleader` and `security` are predictors. Save the output of the multiple regression as `out1` and use `summary` to see the output.

```{r ex1d}
out1 <- lm(jobsat ~ tlleader + security, data=ex1)
summary(out1)
```

`tlleader*security` represents the effect of each predictor alone and the product of both variables. That is, in R, the interaction can be represented by `*` in the `lm` formula.

```{r ex1d2}
out1a <- lm(jobsat ~ tlleader*security, data=ex1)
summary(out1a)
```

You will see that `tlleader:security` is the interaction effect, which is significant here.

The `anova` function compares the models and without the interaction.

```{r ex1d3}
anova(out1, out1a)
```

Because the intercept is not meaningful because `tlleader` = 0 and `security` = 0 are not meaningful. Thus, centering to their means are used. 

```{r ex1e1}
out1_1 <- lm(jobsat ~ I(tlleader - mean(tlleader)) + I(security - mean(security)), data=ex1)
summary(out1_1)
out1a_1 <- lm(jobsat ~ I(tlleader - mean(tlleader))*I(security - mean(security)), data=ex1)
summary(out1a_1)
```

You can see that the `anova` results between centered and uncentered outputs are the same. 

```{r ex1e2}
anova(out1_1, out1a_1)
```

To get the standardized estimates, the `betaDelta` package cannot be used here. Each predictor must be standardized (by the `scale` function) first before calculating their products and running regression. The `I` and `scale` functions are used to get standardized coefficients. Note that the test statistics are incorrect. Please use the point standardized estimates only. 

```{r ex1f}
out1_2 <- lm(I(scale(jobsat)) ~ I(scale(tlleader)) + I(scale(security)), data=ex1)
summary(out1_2)
out1a_2 <- lm(I(scale(jobsat)) ~ I(scale(tlleader))*I(scale(security)), data=ex1)
summary(out1a_2)
```

The `interactions` package is used for probing interactions. First, use `interact_plot` to visualize the interaction effect.

```{r ex1probe1}
library(interactions)
interact_plot(out1a, pred="tlleader", modx="security")
```

The lines representing each value of moderators can be modified by changing `modx.values`. `plot.points=TRUE` is used to provide data points. 

```{r ex1probe2}
interact_plot(out1a, pred="tlleader", modx="security", modx.values = c(30, 40, 50, 60, 70), plot.points=TRUE)
```

You can change many arguments. Please check the help page. 

```{r ex1probe3}
interact_plot(out1a, pred="tlleader", modx="security", interval=TRUE, int.width=.9,
   x.label = "Transformational Leadership", y.label = "Job Satisfaction",
   main.title = NULL,  legend.main = "Job Security",
   colors = "seagreen") 
```

The simple slopes can be tested whether the effect of the main predictor is significant at each level of moderators. 

```{r ex1probe4}
sim_slopes(out1a, pred="tlleader", modx="security")
```

Get and test the conditional intercepts.

```{r ex1probe5}
sim_slopes(out1a, pred="tlleader", modx="security", cond.int=TRUE)
```

Plot the confidence intervals of simple slopes at each level of the moderator.

```{r ex1probe6}
ss <- sim_slopes(out1a, pred="tlleader", modx="security", modx.values = seq(30, 75, 5))
plot(ss)
```

Check the Johnson-Neyman plot.

```{r ex1probe7}
johnson_neyman(out1a, pred="tlleader", modx="security", alpha = .05)
```

## Example 2: Factors influencing attitude toward classical musics.

Read the data set

```{r ex2a}
ex2 <- read.table("lecture3ex2.csv", sep=",", header=TRUE)
```

Find the descriptive statistics of the variables in the data set

```{r ex2b}
describe(ex2)
```

Test whether each correlation in a correlation matrix is significantly different from 0.

```{r ex2c}
corr.test(ex2)
```

Run the multiple regression without interactions.

```{r ex2d}
out2 <- lm(partlike ~ age + ses + guarlike, data=ex2)
summary(out2)
```

Run the multiple regression with the interaction between `age` and `ses`.

```{r ex2d2}
out2a <- lm(partlike ~ age*ses + guarlike, data=ex2)
summary(out2a)
```

The `anova` function compares the models and without the interaction.

```{r ex2d3}
anova(out2, out2a)
```

Check the simple slopes of `ses` at each level of `age`. 

```{r ex2e1}
sim_slopes(out2a, pred="ses", modx="age")
```

Plot the interaction between `ses` and `age`. 

```{r ex2e2}
interact_plot(out2a, pred="ses", modx="age")
```

## Example 3: Factors influencing angry emotions toward a person being late.

Read the data set

```{r ex3a}
ex3 <- read.table("lecture3ex3.csv", sep=",", header=TRUE)
```

Find the descriptive statistics of the variables in the data set

```{r ex3b}
describe(ex3)
```

Find the frequency table of `country`.

```{r ex3b2}
table(ex3[,"country"])
```

Test whether each correlation in a correlation matrix (only between continuous variables) is significantly different from 0.

```{r ex3c}
corr.test(ex3[,c("latemins", "angry")])
```

Change `country` to be in the factor format and change the reference group to `thai`.

```{r ex3c2}
ex3[,"country"] <- factor(ex3[,"country"])
ex3[,"country"] <- relevel(ex3[,"country"], ref="thai")
```

Run the multiple regression without interactions.

```{r ex3d}
out3 <- lm(angry ~ latemins + country, data=ex3)
summary(out3)
```

Run the multiple regression with the interaction between `latemins` and `country`. You will see that the dummy variables are multipled by the continuous variable automatically.

```{r ex3d2}
out3a <- lm(angry ~ latemins*country, data=ex3)
summary(out3a)
```

The `anova` function compares the models and without the interaction.

```{r ex3d3}
anova(out3, out3a)
```

Check the simple slopes of `latemins` at each `country`. 

```{r ex3e1}
sim_slopes(out3a, pred="latemins", modx="country")
```

Plot the interaction between `latemins` and `country`. 

```{r ex3e2}
interact_plot(out3a, pred="latemins", modx="country")
```

The `interactions` package can test the simple slopes of continuous variables at each level of categorical variable. It cannot examine whether the differences between countries are significant at each amount of late minutes. You need to adjust the formula in `lm` to test them.

First, at `latemins` = 0, we check whether the countries are different. One model has the country in the `lm` formula to examine their differences and another model exclude the `country` out of the formula to say that the differences between countries are zero. Then use `anova` to test the differences.

Note that `latemins:country` is the interaction effect alone whereas `latemins*country` include the first order terms and their interaction; that is, `latemins`, `country`, and `latemins:country`

```{r ex3f1}
out3a <- lm(angry ~ latemins*country, data=ex3)
out3ax <- lm(angry ~ latemins + latemins:country, data=ex3)
anova(out3ax, out3a)
```

Next, test the differences between countries at three-minute late. Use centering to test the differences. 

```{r ex3f2}
out3a3 <- lm(angry ~ I(latemins - 3)*country, data=ex3)
out3a3x <- lm(angry ~ I(latemins - 3) + I(latemins - 3):country, data=ex3)
anova(out3a3x, out3a3)
```

Next, test the differences between countries at six-minute late. Use centering to test the differences. 

```{r ex3f3}
out3a6 <- lm(angry ~ I(latemins - 6)*country, data=ex3)
out3a6x <- lm(angry ~ I(latemins - 6) + I(latemins - 6):country, data=ex3)
anova(out3a6x, out3a6)
```

Next, test the differences between countries at nine-minute late. Use centering to test the differences. 

```{r ex3f4}
out3a9 <- lm(angry ~ I(latemins - 9)*country, data=ex3)
out3a9x <- lm(angry ~ I(latemins - 9) + I(latemins - 9):country, data=ex3)
anova(out3a9x, out3a9)
```

Next, test the differences between countries at twelve-minute late. Use centering to test the differences. 

```{r ex3f5}
out3a12 <- lm(angry ~ I(latemins - 12)*country, data=ex3)
out3a12x <- lm(angry ~ I(latemins - 12) + I(latemins - 12):country, data=ex3)
anova(out3a12x, out3a12)
```

## Example 4: The curvilinear relationship between the frequecies of fear-based advertisement and smoking frequencies.

Read the data set

```{r ex4a}
ex4 <- read.table("lecture3ex4.csv", sep=",", header=TRUE)
```

Find the descriptive statistics of the variables in the data set

```{r ex4b}
describe(ex4)
```

Test whether each correlation in a correlation matrix (only between continuous variables) is significantly different from 0.

```{r ex4c}
corr.test(ex4)
```

Run the multiple regression without the second-order terms.

```{r ex4d}
out4 <- lm(cigarette ~ fearprob, data=ex4)
summary(out4)
```

Run the multiple regression with the second-order term. 

```{r ex4d2}
out4a <- lm(cigarette ~ fearprob + I(fearprob^2), data=ex4)
summary(out4a)
```

The `anova` function compares the models and without the second-order term.

```{r ex4d3}
anova(out4, out4a)
```

Plot the curvilinear relationship. 

```{r ex4e}
with(ex4, plot(fearprob, cigarette, xlab="Probability of the Fear Advertisement", ylab="Number of Cigarettes Smoked"))
myorder <- order(ex4[,"fearprob"])
yhat <- predict(out4a)
lines(ex4[myorder,"fearprob"], yhat[myorder])
```

Probe the instantaneous rate of change at the probability of fear advertisement of 20%.

```{r ex4f1}
out4a2 <- lm(cigarette ~ I(fearprob - 0.2) + I((fearprob - 0.2)^2), data=ex4)
summary(out4a2)
```

Probe the instantaneous rate of change at the probability of fear advertisement of 70%.

```{r ex4f2}
out4a7 <- lm(cigarette ~ I(fearprob - 0.7) + I((fearprob - 0.7)^2), data=ex4)
summary(out4a7)
```

## Example 3 (cont.): Factors influencing angry emotions toward a person being late.

Examine the assumptions by applying the `plot` function to the output.

```{r ex5a}
plot(out3)
```

Using the `car` package to find useful functions for checking assumptions. First, check the outliers.

```{r ex5b}
library(car)
outlierTest(out3)
```

The q-q plot can check the normality of residuals and outliers.

```{r ex5c}
qqPlot(out3)
```

The leverage plot can check influential cases.

```{r ex5d}
leveragePlots(out3)
```