For this exercise, we use the variables igf1, age and tanner from the dataset juul in the ISwR package.
First fit a simple regression of igf1 on each predictor separately, then a multiple regression with both predictors, and inspect how the coefficients change:
## (Intercept) age
## 360.860182 -1.191814
## (Intercept) tanner
## 165.23424 64.95695
## (Intercept) age tanner
## 194.118206 -4.832264 76.534903
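The R code behind the output above was not preserved. A minimal sketch that should reproduce these three sets of coefficients is below. It assumes tanner is stored as a numeric variable, which the single tanner coefficient suggests. The names fit_age, fit_tanner and fit_both are chosen here for illustration.

```r
# juul ships with the ISwR package (install.packages("ISwR") if needed)
library(ISwR)
data(juul)

# Two simple regressions: each uses the rows where its own two variables are observed
fit_age    <- lm(igf1 ~ age,    data = juul)
fit_tanner <- lm(igf1 ~ tanner, data = juul)

# Multiple regression: lm drops every row with an NA in igf1, age or tanner
fit_both <- lm(igf1 ~ age + tanner, data = juul)

coef(fit_age)
coef(fit_tanner)
coef(fit_both)

# Number of observations actually used by each fit
sapply(list(age = fit_age, tanner = fit_tanner, both = fit_both), nobs)
```

Comparing the nobs() values is worthwhile. Part of the change in the age coefficient may come from the fits using different rows, not only from adjusting for tanner.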
Make a vector of the dependent variable and a matrix of the two independent variables, with a column of 1’s as the first column. Apply the formula in the slides, using the solve function to calculate the inverse and the t function to calculate the transpose. Remember that lm uses listwise deletion: drop every row with a missing value in any of the three variables before building the vector and the matrix!
## [,1]
## [1,] 194.118206
## [2,] -4.832264
## [3,] 76.534903
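The sketch below shows the matrix computation. It assumes the formula in the slides is the usual OLS estimator β̂ = (XᵀX)⁻¹Xᵀy. The complete-case data frame cc is a name introduced here for illustration.

```r
library(ISwR)
data(juul)

# Listwise deletion, as lm() does: keep only rows with igf1, age and tanner all observed
cc <- na.omit(juul[, c("igf1", "age", "tanner")])

y <- cc$igf1                       # dependent variable
X <- cbind(1, cc$age, cc$tanner)   # design matrix, column of 1's first

# OLS estimator: (X'X)^{-1} X'y, a 3 x 1 matrix without row names
beta <- solve(t(X) %*% X) %*% t(X) %*% y
beta

# Should agree with lm() fitted on the same variables
coef(lm(igf1 ~ age + tanner, data = juul))
```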
Check the normality of age and tanner, and check for outliers in igf1 at each level of tanner (see section 4.4.2 in Dalgaard). Show all plots in one window.
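The sketch below uses histograms and normal Q-Q plots for the normality checks and parallel boxplots for the outliers. That choice of plots is an assumption on our part; the 2 × 3 layout and the names old_par and bp are illustrative.

```r
library(ISwR)
data(juul)

# Arrange all plots in one window: 2 rows x 3 columns
old_par <- par(mfrow = c(2, 3))

# Normality of age: histogram and normal Q-Q plot
hist(juul$age, main = "Histogram of age", xlab = "age")
qqnorm(juul$age, main = "Normal Q-Q plot: age")
qqline(juul$age)

# Normality of tanner (only 5 distinct stages, so expect a step pattern)
hist(juul$tanner, main = "Histogram of tanner", xlab = "tanner")
qqnorm(juul$tanner, main = "Normal Q-Q plot: tanner")
qqline(juul$tanner)

# Parallel boxplots of igf1 per tanner stage; points beyond the whiskers are flagged
bp <- boxplot(igf1 ~ tanner, data = juul,
              main = "igf1 by tanner", xlab = "tanner", ylab = "igf1")

par(old_par)  # restore the previous plot layout

# Flagged igf1 values and the tanner stage each belongs to
data.frame(tanner = bp$names[bp$group], igf1 = bp$out)
```

Because tanner is an ordinal stage with five values, its Q-Q plot can only show steps. The normality check is therefore mostly informative for age.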
Calculate the mean-centered variables and add both to the dataset in a single step. Compare the models with and without mean centering.
## (Intercept) age tanner
## 194.118206 -4.832264 76.534903
## (Intercept) Mage Mtanner
## 358.628788 -4.832264 76.534903
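A possible way to add both centered variables at once is with transform(). The sketch assumes the centering is done on the complete-case sample that lm uses after listwise deletion. With that choice, the slopes are unchanged and the intercept of the centered model equals the mean igf1 of that sample.

```r
library(ISwR)
data(juul)

# Center on the complete-case sample that lm() will actually use
cc <- na.omit(juul[, c("igf1", "age", "tanner")])

# transform() adds both centered variables in a single call
cc <- transform(cc,
                Mage    = age    - mean(age),
                Mtanner = tanner - mean(tanner))

fit_raw <- lm(igf1 ~ age + tanner,   data = cc)
fit_cen <- lm(igf1 ~ Mage + Mtanner, data = cc)

coef(fit_raw)
coef(fit_cen)   # same slopes; the intercept becomes the mean of igf1
mean(cc$igf1)
```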
See section 10.6 in Dalgaard for how to include an interaction effect. Compare the models with and without the interaction effect: look at the coefficients, the model fit (i.e. R²), and the standard errors. The uncentered variables are used below.
## (Intercept) age tanner
## 194.118206 -4.832264 76.534903
## (Intercept) age tanner age:tanner
## -106.694684 27.317193 162.418264 -8.023521
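The sketch below collects the three quantities to compare. It fits both models on the same complete-case data so that they are nested, which also allows an F test with anova(). The object names are illustrative.

```r
library(ISwR)
data(juul)
cc <- na.omit(juul[, c("igf1", "age", "tanner")])

fit_add <- lm(igf1 ~ age + tanner, data = cc)   # main effects only
fit_int <- lm(igf1 ~ age * tanner, data = cc)   # age + tanner + age:tanner

# Coefficients with their standard errors
coef(summary(fit_add))[, c("Estimate", "Std. Error")]
coef(summary(fit_int))[, c("Estimate", "Std. Error")]

# Model fit
c(additive    = summary(fit_add)$r.squared,
  interaction = summary(fit_int)$r.squared)

# F test of the interaction term (the models are nested)
anova(fit_add, fit_int)
```

R² can never decrease when a term is added to a nested model. The standard errors of age and tanner typically grow in the interaction model, because both variables are strongly correlated with their product.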
Compare the coefficients calculated under 4.1 in terms of strength. To do so, the coefficients need to be expressed in standard deviations. Calculate the standardized variables (z-scores), add them to the dataset, and refit the model to obtain the standardized coefficients.
## (Intercept) Zage Ztanner Zage:Ztanner
## 0.3655958 0.1198682 0.6005949 -0.5088640
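Here is a sketch of the manual standardization. The small intercept above suggests that igf1 was standardized as well; this, and standardizing on the complete-case sample, are assumptions. The helper z is hypothetical. For the model without interaction, the last lines show that a standardized slope equals the raw slope multiplied by sd(x) / sd(y).

```r
library(ISwR)
data(juul)
cc <- na.omit(juul[, c("igf1", "age", "tanner")])

# Hypothetical helper: z-score = (x - mean) / standard deviation
z <- function(x) (x - mean(x)) / sd(x)

cc <- transform(cc, Zigf1 = z(igf1), Zage = z(age), Ztanner = z(tanner))

# Standardized version of the interaction model
fit_z <- lm(Zigf1 ~ Zage * Ztanner, data = cc)
coef(fit_z)

# Without interaction, a standardized slope is the raw slope times sd(x) / sd(y)
fit_raw  <- lm(igf1 ~ age + tanner,   data = cc)
fit_zadd <- lm(Zigf1 ~ Zage + Ztanner, data = cc)
cbind(standardized = coef(fit_zadd)[-1],
      rescaled     = coef(fit_raw)[-1] * c(sd(cc$age), sd(cc$tanner)) / sd(cc$igf1))
```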
Check the result with the scale function. Note that a standardized variable has mean 0 and standard deviation 1. First, standardize with the scale function. Second, compare the range of values of your manually standardized variable with the result of the scale function using summary.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.9070 -0.8135 -0.0181 0.0000 0.7152 3.2190
## V1
## Min. :-1.9069
## 1st Qu.:-0.8135
## Median :-0.0181
## Mean : 0.0000
## 3rd Qu.: 0.7152
## Max. : 3.2186
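The comparison can be sketched as follows, using age on the complete-case sample. Which variable produced the output above is not stated, so age is an assumption. Because scale returns a one-column matrix without a column name, summary labels that column V1.

```r
library(ISwR)
data(juul)
cc <- na.omit(juul[, c("igf1", "age", "tanner")])

# Manual standardization
Zage_manual <- (cc$age - mean(cc$age)) / sd(cc$age)

# Standardization with scale(): returns a one-column matrix with attributes
Zage_scale <- scale(cc$age)

summary(Zage_manual)
summary(Zage_scale)

# Same values once the matrix attributes are dropped
all.equal(Zage_manual, as.vector(Zage_scale))
```

The small difference in the printed maxima (3.2190 versus 3.2186) appears to be display rounding only. The summary of a vector is printed to 4 significant digits, so 3.2186 becomes 3.219, padded to 3.2190.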
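Note that the scale function also lets you only center (center = TRUE, scale = FALSE, i.e. subtract the mean from all observations) or only scale (center = FALSE, scale = TRUE). In the scale-only case, the divisor is the root mean square, sqrt(sum(x^2)/(n - 1)), rather than the standard deviation. For a more detailed discussion, see the help page ?scale.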
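The following sketch checks the two partial options of scale on the observed ages. It also shows how to divide by the standard deviation without centering, by passing that value explicitly as the scale argument.

```r
library(ISwR)
data(juul)
age <- juul$age[!is.na(juul$age)]
n <- length(age)

# Center only: subtract the mean, no division
age_centered <- scale(age, center = TRUE, scale = FALSE)

# Scale only: with center = FALSE, scale() divides by the root mean square
# sqrt(sum(x^2) / (n - 1)), not by the standard deviation
age_scaled <- scale(age, center = FALSE, scale = TRUE)

all.equal(as.vector(age_centered), age - mean(age))
all.equal(as.vector(age_scaled), age / sqrt(sum(age^2) / (n - 1)))

# To divide by sd() without centering, pass the value explicitly
age_sd_only <- scale(age, center = FALSE, scale = sd(age))
```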
Estimate a simple regression (i.e. with only one independent variable) on the standardized variables, and compare its slope with the correlation coefficient obtained with the cor function. The first result below is from lm; the second is obtained using cor.
## (Intercept) Zage
## 5.005913e-16 4.089242e-01
## [1] 0.4089242
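With a single predictor and both variables standardized on the same rows, the slope is cov(Zy, Zx) / var(Zx) = r, and the intercept is zero. The sketch below assumes the dependent variable is the standardized igf1 and that the complete-case sample is used.

```r
library(ISwR)
data(juul)
cc <- na.omit(juul[, c("igf1", "age", "tanner")])
cc <- transform(cc,
                Zigf1 = (igf1 - mean(igf1)) / sd(igf1),
                Zage  = (age  - mean(age))  / sd(age))

# Simple regression on standardized variables
fit <- lm(Zigf1 ~ Zage, data = cc)
coef(fit)          # intercept is zero up to floating-point error

# Pearson correlation on the same rows
cor(cc$igf1, cc$age)
```

The positive value here contrasts with the negative age slope of the first simple regression. Within one sample, a simple regression slope and the correlation always share a sign, so the two results must come from different rows. This is presumably the complete-case sample versus all rows with igf1 and age observed.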