--- title: "Intro to Multilevel Model: Lecture 2 Example Code" author: "Sunthud Pornprasertmanit" output: html_document date: "`r format(Sys.time(), '%d/%m/%Y')`" --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Example 1: The Effects of Sources of Pressure on Stress Read the data set ```{r ex1a} ex1 <- read.table("lecture2ex1.csv", sep=",", header=TRUE) ``` Find the descriptive statistics of the variables in the data set ```{r ex1b} library(psych) describe(ex1) ``` Test whether each correlation in a correlation matrix is significantly different from 0. ```{r ex1c} corr.test(ex1) ``` Run multiple regression where `stress` is the outcome and `homepress` and `workpress` are predictors. Save the output of the multiple regression as `out1` and use `summary` to see the output. ```{r ex1d} out1 <- lm(stress ~ homepress + workpress, data=ex1) summary(out1) ``` Find the standardized regression coefficients and their significance tests ($\ne$ 0) by the `betaDelta` package ```{r ex1e} library(betaDelta) BetaDelta(out1) ``` Find the variance partitioning by the `anova` function. Be careful that the resulting sum of squares are in Type 1 (controlling variables in previous lines). ```{r ex1f} anova(out1) ``` ## Example 2: Factors influencing cuteness. Read the data set ```{r ex2a} ex2 <- read.table("lecture2ex2.csv", sep=",", header=TRUE) ``` Find the descriptive statistics of the variables in the data set ```{r ex2b} describe(ex2) ``` Test whether each correlation in a correlation matrix is significantly different from 0. ```{r ex2c} corr.test(ex2) ``` Run multiple regression, save the output, and run the summary function. ```{r ex2d} out2 <- lm(cute ~ face + shape + habit + close, data=ex2) summary(out2) ``` Find the standardized regression coefficients and their significance tests ($\ne$ 0) by the `betaDelta` package ```{r ex2e} BetaDelta(out2) ``` Find the variance partitioning by the `anova` function. 
Be careful: the resulting sums of squares are Type 1 (sequential), so each term is adjusted only for the terms on the previous lines.

```{r ex2f}
anova(out2)
```

Center the predictors as follows:

- `face` is centered at 50
- `shape` is centered at its mean
- `habit` is centered at 50
- `close` is centered at 50

`I()` is used to apply a mathematical operation to a predictor before it enters the multiple regression.

```{r ex2g}
out2a <- lm(cute ~ I(face - 50) + I(shape - mean(shape)) + I(habit - 50) + I(close - 50), data=ex2)
summary(out2a)
```

The intercept changes but the slopes do not. The standardized coefficients and the ANOVA table are unchanged as well.

```{r ex2h}
BetaDelta(out2a)
anova(out2a)
```

## Example 3: Factors Influencing Eating Spicy Food

Read and view the data set.

```{r ex3a}
ex3 <- read.table("lecture2ex3.csv", sep=",", header=TRUE)
head(ex3, 15)
```

Change `sex` to a factor variable where the first group is the reference group, which is `female` here. When a variable is in the factor format, R knows that each number represents a different category (not a low or high value, as in a standard numeric variable).

```{r ex3adummya}
ex3[,"sex"] <- factor(ex3[,"sex"], labels=c("female", "male"))
```

Change `region` to a factor variable where the first group is the reference group, which is `north` here.

```{r ex3adummyb}
ex3[,"region"] <- factor(ex3[,"region"], labels=c("north", "center", "northeast", "south"))
```

Change the reference group to `south` with the `relevel` function and view the data set again.

```{r ex3adummyc}
ex3[,"region"] <- relevel(ex3[,"region"], ref="south")
head(ex3, 15)
```

Find the descriptive statistics of the variables in the data set.

```{r ex3b}
describe(ex3)
```

Find the frequency tables of both categorical variables.

```{r ex3b2}
with(ex3, table(sex))
with(ex3, table(region))
```

Test the correlation excluding the categorical variables.

```{r ex3c}
corr.test(ex3[,c("chilimg", "age")])
```

Run a multiple regression where `age` is centered at 40.
`sex` and `region` are transformed to dummy variables automatically because both variables were set as factors earlier.

```{r ex3d}
out3 <- lm(chilimg ~ I(age - 40) + sex + region, data=ex3)
summary(out3)
```

Find the variance partitioning with the `anova` function. Be careful: the resulting sums of squares are Type 1 (sequential), so each term is adjusted only for the terms on the previous lines.

```{r ex3f}
anova(out3)
```

To test a set of predictors, compare the multiple regression results with and without that set using the `anova` function. First, check whether `region` is significant.

```{r ex3g}
out3noregion <- lm(chilimg ~ I(age - 40) + sex, data=ex3)
summary(out3noregion)
anova(out3noregion, out3)
```

The `anova` function is necessary to test `region` because `region` consists of several dummy variables. `sex` and `age` can be tested with `anova` too, but their *p* values are the same as those of the *t* tests in the `summary` output of the model with all predictors.

```{r ex3g2}
out3nosex <- lm(chilimg ~ I(age - 40) + region, data=ex3)
anova(out3nosex, out3)
```

```{r ex3g3}
out3noage <- lm(chilimg ~ sex + region, data=ex3)
anova(out3noage, out3)
```

Although it is possible to compute standardized regression coefficients for dummy variables, they cannot be used directly to compare whether `age`, `sex`, or `region` has the most predictive power, because `region` consists of three dummy variables whereas each of the other variables is a single predictor.

```{r ex3e}
BetaDelta(out3)
```

To compare all predictors, $\Delta R^2$ shows how much unique variance in the dependent variable each predictor explains. `summary(lmoutput)$r.squared` is used to extract $R^2$: applying `summary` to the output of `lm` returns a list, one element of which is named `r.squared`, and `$` extracts that element.
```{r ex3e2}
summary(out3)$r.squared - summary(out3noage)$r.squared
summary(out3)$r.squared - summary(out3nosex)$r.squared
summary(out3)$r.squared - summary(out3noregion)$r.squared
```
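To see the three comparisons side by side, the $\Delta R^2$ values can be collected into one named vector (a sketch assuming the model objects `out3`, `out3noage`, `out3nosex`, and `out3noregion` fitted in the chunks above; the chunk label is new):

```{r ex3e3}
# R^2 of the full model
r2full <- summary(out3)$r.squared
# Unique variance explained by each predictor (Delta R^2)
c(age    = r2full - summary(out3noage)$r.squared,
  sex    = r2full - summary(out3nosex)$r.squared,
  region = r2full - summary(out3noregion)$r.squared)
```

The predictor with the largest $\Delta R^2$ explains the most unique variance in `chilimg`.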