---
title: "Intro to Multilevel Model: Lecture 1 Example Code"
author: "Sunthud Pornprasertmanit"
output: html_document
date: "2024-01-09"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Simulation Comparing between Simple Random Sampling and Multistage Sampling

See slide 18 for the simulation designs.

A. Set the population

```{r sim1a}
set.seed(123321)
nrep <- 10000
dat <- matrix(c(0:10, 0:10 + 1, 0:10 + 2, 0:10 + 3, 0:10 + 4), nrow=5, byrow=TRUE)
n_all <- length(dat)
n_group <- ncol(dat)
n_each <- nrow(dat)
```

B. Set the sampling design

```{r sim1b}
usedgroup <- 2
usedeach <- 2
usedall <- usedgroup * usedeach
```

C. Set the function to calculate population standard errors

```{r sim1c}
sdp <- function(x) sqrt(mean((x-mean(x))^2))
```

D. Set the matrices to save the means and estimated standard errors from each 
replication.

```{r sim1d}
resultmean <- matrix(NA, nrep, 2)
resultse <- matrix(NA, nrep, 2)
```

E. Draw two samples: Simple random sampling and multistage sampling. Then 
calcuate means and estimated standard errors from each sampling method.

```{r sim1e}
for(i in 1:nrep) {
  # Simple Random Sampling
  index <- sample(1:n_all, usedall)
  tempdat1 <- dat[index]
  resultmean[i, 1] <- mean(tempdat1)
  resultse[i, 1] <- sd(tempdat1)/sqrt(usedall)
  
  # Multistage Random Sampling
  index_group <- sample(1:n_group, usedgroup)
  tempdat2g1 <- dat[sample(1:n_each, usedeach), index_group[1]]
  tempdat2g2 <- dat[sample(1:n_each, usedeach), index_group[2]]
  tempdat2 <- c(tempdat2g1, tempdat2g2)
  resultmean[i, 2] <- mean(tempdat2)
  resultse[i, 2] <- sd(tempdat2)/sqrt(usedall)
}
```

F. Find the averaged estimates, actual standard errors of the estimates, and the
averaged estimated standard errors.

```{r sim1f}
apply(resultmean, 2, mean)
apply(resultmean, 2, sdp)
apply(resultse, 2, mean)
```

G. Plot the sampling distributions.

1) Sampling Distribution of Means from Simple Random Sampling

```{r sim1g1}
hist(resultmean[,1], xlim=c(0, 14), main="Samp Dist of M (Simple)", xlab="M")
abline(v=7, col="red")
```

2) Sampling Distribution of Means from Multistage Sampling

```{r sim1g2}
hist(resultmean[,2], xlim=c(0, 14), main="Samp Dist of M (Multistage)", xlab="M")
abline(v=7, col="red")
```

H. Calculate biases.

1) Absolute bias in the estimates

```{r sim1h1}
apply(resultmean, 2, mean) - 7 # Bias
```

2) Relative bias in the estimates

```{r sim1h2}
(apply(resultmean, 2, mean) - 7)/7 # Relative Bias
```

3) Absolute bias in the estimated standard erors

```{r sim1h3}
apply(resultse, 2, mean) - apply(resultmean, 2, sdp) # Bias in SE
```

4) Relative bias in the estimated standard erors

```{r sim1h4}
(apply(resultse, 2, mean) - apply(resultmean, 2, sdp))/apply(resultmean, 2, sdp) # Relative Bias in SE
```

## Simulation Comparing Population Mean Estimation by the Means of Population Group Means or the Means of Estimated Group Means

See slide 24 for the simulation designs.

A. Set the population

```{r sim2a}
set.seed(123321)
nrep <- 10000
dat <- matrix(c(0:10, 0:10 + 3, 0:10 + 4, 0:10 + 5, 0:10 + 8), nrow=5, byrow=TRUE)
n_all <- length(dat)
n_group <- ncol(dat)
n_each <- nrow(dat)
```

B. Set the sampling design

```{r sim2b}
usedgroup <- 2
usedeach <- 2
```

C. Set the function to calculate population standard errors

```{r sim2c}
sdp <- function(x) sqrt(mean((x-mean(x))^2))
```

D. Set the matrices to save the means and estimated standard errors from each 
replication.

```{r sim2d}
resultmean <- matrix(NA, nrep, 2)
resultse <- matrix(NA, nrep, 2)
```

E. Draw two samples: 

1) Population Group Means: Draw 2 population group means and find the average
2) Estimated Group Means: Draw two groups. In each group, draw two samples. Next, 
find the average of each group. Then, average the obtained two averages. 

```{r sim2e}
for(i in 1:nrep) {
  # Single step
  index <- sample(1:n_group, usedgroup)
  tempdat1 <- dat[3, index]
  resultmean[i, 1] <- mean(tempdat1)
  resultse[i, 1] <- sd(tempdat1)/sqrt(usedgroup)
  
  # Double Step
  tempdat2m1 <- mean(dat[sample(1:n_each, usedeach), index[1]])
  tempdat2m2 <- mean(dat[sample(1:n_each, usedeach), index[2]])
  tempdat2 <- c(tempdat2m1, tempdat2m2)
  resultmean[i, 2] <- mean(tempdat2)
  resultse[i, 2] <- sd(tempdat2)/sqrt(usedgroup)
}
```

F. Find the averaged estimates, actual standard errors of the estimates, and the
averaged estimated standard errors.

```{r sim2f}
apply(resultmean, 2, mean)
apply(resultmean, 2, sdp)
apply(resultse, 2, mean)
```

G. Plot the sampling distributions.

1) Sampling Distribution of the Means of the Population Group Means

```{r sim2g1}
hist(resultmean[,1], xlim=c(2, 16), main="Samp Dist of M (Single)", xlab="M", breaks=11)
abline(v=9, col="red")
```

2) Sampling Distribution of the Means of the Estimated Group Means

```{r sim2g2}
hist(resultmean[,2], xlim=c(2, 16), main="Samp Dist of M (Double)", xlab="M", breaks=11)
abline(v=9, col="red")
```

H. Calculate biases.

1) Absolute bias in the estimates

```{r sim2h1}
apply(resultmean, 2, mean) - 9 # Bias
```

2) Relative bias in the estimates

```{r sim2h2}
(apply(resultmean, 2, mean) - 9)/9 # Relative Bias
```

3) Absolute bias in the estimated standard erors

```{r sim2h3}
apply(resultse, 2, mean) - apply(resultmean, 2, sdp) # Bias in SE
```

4) Relative bias in the estimated standard erors

```{r sim2h4}
(apply(resultse, 2, mean) - apply(resultmean, 2, sdp))/apply(resultmean, 2, sdp) # Relative Bias in SE
```