--- title: "Intro to Multilevel Model: Lecture 1 Example Code" author: "Sunthud Pornprasertmanit" output: html_document date: "2024-01-09" --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Simulation Comparing between Simple Random Sampling and Multistage Sampling See slide 18 for the simulation designs. A. Set the population ```{r sim1a} set.seed(123321) nrep <- 10000 dat <- matrix(c(0:10, 0:10 + 1, 0:10 + 2, 0:10 + 3, 0:10 + 4), nrow=5, byrow=TRUE) n_all <- length(dat) n_group <- ncol(dat) n_each <- nrow(dat) ``` B. Set the sampling design ```{r sim1b} usedgroup <- 2 usedeach <- 2 usedall <- usedgroup * usedeach ``` C. Set the function to calculate population standard errors ```{r sim1c} sdp <- function(x) sqrt(mean((x-mean(x))^2)) ``` D. Set the matrices to save the means and estimated standard errors from each replication. ```{r sim1d} resultmean <- matrix(NA, nrep, 2) resultse <- matrix(NA, nrep, 2) ``` E. Draw two samples: Simple random sampling and multistage sampling. Then calcuate means and estimated standard errors from each sampling method. ```{r sim1e} for(i in 1:nrep) { # Simple Random Sampling index <- sample(1:n_all, usedall) tempdat1 <- dat[index] resultmean[i, 1] <- mean(tempdat1) resultse[i, 1] <- sd(tempdat1)/sqrt(usedall) # Multistage Random Sampling index_group <- sample(1:n_group, usedgroup) tempdat2g1 <- dat[sample(1:n_each, usedeach), index_group[1]] tempdat2g2 <- dat[sample(1:n_each, usedeach), index_group[2]] tempdat2 <- c(tempdat2g1, tempdat2g2) resultmean[i, 2] <- mean(tempdat2) resultse[i, 2] <- sd(tempdat2)/sqrt(usedall) } ``` F. Find the averaged estimates, actual standard errors of the estimates, and the averaged estimated standard errors. ```{r sim1f} apply(resultmean, 2, mean) apply(resultmean, 2, sdp) apply(resultse, 2, mean) ``` G. Plot the sampling distributions. 1) Sampling Distribution of Means from Simple Random Sampling ```{r sim1g1} hist(resultmean[,1], xlim=c(0, 14), main="Samp Dist of M (Simple)", xlab="M") abline(v=7, col="red") ``` 2) Sampling Distribution of Means from Multistage Sampling ```{r sim1g2} hist(resultmean[,2], xlim=c(0, 14), main="Samp Dist of M (Multistage)", xlab="M") abline(v=7, col="red") ``` H. Calculate biases. 1) Absolute bias in the estimates ```{r sim1h1} apply(resultmean, 2, mean) - 7 # Bias ``` 2) Relative bias in the estimates ```{r sim1h2} (apply(resultmean, 2, mean) - 7)/7 # Relative Bias ``` 3) Absolute bias in the estimated standard erors ```{r sim1h3} apply(resultse, 2, mean) - apply(resultmean, 2, sdp) # Bias in SE ``` 4) Relative bias in the estimated standard erors ```{r sim1h4} (apply(resultse, 2, mean) - apply(resultmean, 2, sdp))/apply(resultmean, 2, sdp) # Relative Bias in SE ``` ## Simulation Comparing Population Mean Estimation by the Means of Population Group Means or the Means of Estimated Group Means See slide 24 for the simulation designs. A. Set the population ```{r sim2a} set.seed(123321) nrep <- 10000 dat <- matrix(c(0:10, 0:10 + 3, 0:10 + 4, 0:10 + 5, 0:10 + 8), nrow=5, byrow=TRUE) n_all <- length(dat) n_group <- ncol(dat) n_each <- nrow(dat) ``` B. Set the sampling design ```{r sim2b} usedgroup <- 2 usedeach <- 2 ``` C. Set the function to calculate population standard errors ```{r sim2c} sdp <- function(x) sqrt(mean((x-mean(x))^2)) ``` D. Set the matrices to save the means and estimated standard errors from each replication. ```{r sim2d} resultmean <- matrix(NA, nrep, 2) resultse <- matrix(NA, nrep, 2) ``` E. Draw two samples: 1) Population Group Means: Draw 2 population group means and find the average 2) Estimated Group Means: Draw two groups. In each group, draw two samples. Next, find the average of each group. Then, average the obtained two averages. ```{r sim2e} for(i in 1:nrep) { # Single step index <- sample(1:n_group, usedgroup) tempdat1 <- dat[3, index] resultmean[i, 1] <- mean(tempdat1) resultse[i, 1] <- sd(tempdat1)/sqrt(usedgroup) # Double Step tempdat2m1 <- mean(dat[sample(1:n_each, usedeach), index[1]]) tempdat2m2 <- mean(dat[sample(1:n_each, usedeach), index[2]]) tempdat2 <- c(tempdat2m1, tempdat2m2) resultmean[i, 2] <- mean(tempdat2) resultse[i, 2] <- sd(tempdat2)/sqrt(usedgroup) } ``` F. Find the averaged estimates, actual standard errors of the estimates, and the averaged estimated standard errors. ```{r sim2f} apply(resultmean, 2, mean) apply(resultmean, 2, sdp) apply(resultse, 2, mean) ``` G. Plot the sampling distributions. 1) Sampling Distribution of the Means of the Population Group Means ```{r sim2g1} hist(resultmean[,1], xlim=c(2, 16), main="Samp Dist of M (Single)", xlab="M", breaks=11) abline(v=9, col="red") ``` 2) Sampling Distribution of the Means of the Estimated Group Means ```{r sim2g2} hist(resultmean[,2], xlim=c(2, 16), main="Samp Dist of M (Double)", xlab="M", breaks=11) abline(v=9, col="red") ``` H. Calculate biases. 1) Absolute bias in the estimates ```{r sim2h1} apply(resultmean, 2, mean) - 9 # Bias ``` 2) Relative bias in the estimates ```{r sim2h2} (apply(resultmean, 2, mean) - 9)/9 # Relative Bias ``` 3) Absolute bias in the estimated standard erors ```{r sim2h3} apply(resultse, 2, mean) - apply(resultmean, 2, sdp) # Bias in SE ``` 4) Relative bias in the estimated standard erors ```{r sim2h4} (apply(resultse, 2, mean) - apply(resultmean, 2, sdp))/apply(resultmean, 2, sdp) # Relative Bias in SE ```