Logit regression differs from linear regression in the dependent variable, which is not normally distributed but follows a binomial distribution. In binary logit regression, the dependent variable has two values, 0 and 1. In multinomial logit regression, the dependent variable has multiple values that carry no numerical meaning (i.e. 2 is not twice as much as 1, nor is it more than 1). In ordinal logit regression (for example the adjacent-categories model), the multiple values do follow an ordering but do not express quantity (i.e. 4 is more than 2, but 4 is not twice the value of 2). In the binary logit model, the dependent variable, and hence the error, is modeled using a binomial distribution; models with more than two categories use its generalization, the multinomial distribution.
The model is mathematically described like the linear model, but with a different link function, which connects the expected value of the dependent variable to the linear combination of the independent variables (on which the dependent variable is regressed). We now write the dependent variable in terms of its expectation (i.e. the mean predicted value).
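\[\begin{aligned} g(E(y)) &= \beta_0 + \beta_1 X_i \\ y &\sim \mathrm{Binomial}(1, p), \quad E(y) = p \end{aligned}\]where g() is the logit link function, \(g(p) = \log\frac{p}{1-p}\). Unlike the linear model, there is no separate normally distributed error term: the randomness comes from the binomial distribution of y itself. The probability p of observing y = 1 depends on the intercept and the beta coefficient. It is obtained by applying the inverse of the link function (the sigmoid function) to the linear equation, which leads to the following ratio.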
\[p(y) = \frac{e^{\beta_0 + \beta_1 X_i}}{1+e^{\beta_0 + \beta_1 X_i}}\] This ratio maps the linear predictor, which can take any value, onto an S-shaped (sigmoid) curve. This means that the outcome value can never be larger than 1 or smaller than 0, as required for a probability.
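As a minimal sketch of this transformation, the snippet below applies the sigmoid to a few values of the linear predictor. The intercept and slope (b0, b1) and the x values are made-up illustration values, not estimates from the data. plogis() and qlogis() are the sigmoid and logit functions that ship with base R.

```r
# Illustration values only (assumed, not estimated from the data)
b0 <- -2
b1 <- 0.8
x  <- c(-10, 0, 2.5, 10)

eta <- b0 + b1 * x                    # linear predictor, unbounded
p   <- exp(eta) / (1 + exp(eta))      # sigmoid, as in the ratio above

# plogis() is the built-in version of the same function
all.equal(p, plogis(eta))

# qlogis() is the logit link: it maps p back to the linear predictor
all.equal(qlogis(p), eta)

range(p)                              # always strictly between 0 and 1
```

However extreme the linear predictor becomes, the resulting values stay inside the interval from 0 to 1.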
To predict whether respondents give socially desirable answers, we use data on the big five personality traits. The social desirability scale consists of 10 questions about socially unacceptable behaviors (the datafile can be downloaded). When people respond with TRUE (coded as 1), they are more likely to depict themselves in a favorable light. Some questions are reverse-worded (indicated by - in the survey).
We relate the five psychological traits (also called the big 5) to socially desirable responding. The psychological traits are measured by 10 questions each (1 = does not apply to me at all, and 5 = very accurate). To include the five traits as independent variables, we calculate means (factor analysis would be a more sophisticated way but for now means are fine). To calculate the means, we need to recode the reversed items so that a 5 means a high score on the trait for each question (and not the reverse).
# Reverse-code 1-5 Likert items; adds a new column "<name>r" for each item
recodeVars <- function(varnames, data){
  for (v in varnames){
    varRecoded <- paste0(v, "r") # name of the recoded variable
    # 6 - x maps 1<->5 and 2<->4, leaves 3 unchanged and keeps NA as NA.
    # Recoding step by step (1 to 5, later 5 to 1) would undo itself.
    data[[varRecoded]] <- 6 - data[[v]]
  }
  return(data)
}
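To see what reverse coding should do, here is a small self-contained sketch on a made-up item (the vector item is hypothetical). On a 1 to 5 scale, subtracting the answer from 6 swaps 1 with 5 and 2 with 4, leaves 3 untouched, and keeps missing values missing.

```r
# Hypothetical answers to one reverse-worded item on a 1-5 scale
item <- c(1, 2, 3, 4, 5, NA)

# Reverse-code: (scale minimum + scale maximum) - answer
item_r <- 6 - item
item_r

# Cross-tabulate original against recoded values to check the mapping
table(original = item, recoded = item_r)
```

A cross-table like this is a quick check after any recoding step: every original value should map to exactly one recoded value.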
bigfive <- recodeVars(c("agree1","agree3","agree5","agree7","extra2","extra4","extra6","extra8",
"extra10","consc2","consc4","consc6","consc8","neuro2","neuro4","open2",
"open4","open6"),data=bigfive)
Then we need to calculate one score for each trait so we can include this score in the regression model as an independent variable. A factor score would be a better solution, but for now means suffice.
bigfive$agree <-rowMeans(cbind(bigfive$agree1r,bigfive$agree2,bigfive$agree3r,bigfive$agree4,
bigfive$agree5r,bigfive$agree6,bigfive$agree7r,bigfive$agree8,bigfive$agree9,
bigfive$agree10))
bigfive$extra <-rowMeans(cbind(bigfive$extra1,bigfive$extra2r,bigfive$extra3,bigfive$extra4r,
bigfive$extra5,bigfive$extra6r,bigfive$extra7,bigfive$extra8r,bigfive$extra9,
bigfive$extra10r))
bigfive$consc <-rowMeans(cbind(bigfive$consc1,bigfive$consc2r,bigfive$consc3,bigfive$consc4r,
bigfive$consc5,bigfive$consc6r,bigfive$consc7,bigfive$consc8r,bigfive$consc9,
bigfive$consc10))
bigfive$open <- rowMeans(cbind(bigfive$open1,bigfive$open2r,bigfive$open3,bigfive$open4r,
bigfive$open5,bigfive$open6r,bigfive$open7,bigfive$open8,bigfive$open9,
bigfive$open10))
bigfive$neuro <-rowMeans(cbind(bigfive$neuro1,bigfive$neuro2r,bigfive$neuro3,bigfive$neuro4r,
bigfive$neuro5,bigfive$neuro6,bigfive$neuro7,bigfive$neuro8,bigfive$neuro9,
bigfive$neuro10))
Note that the functions sum() and mean() collapse everything they are given into a single number, so they cannot be used to obtain one mean per respondent. We want the mean to be calculated across the item columns for each row separately, which is what rowMeans() does. Note that rowMeans() by default (na.rm=FALSE) returns NA for every respondent with at least one missing item, so these respondents are dropped from the regression (listwise deletion). If you prefer to average over the items that were answered, use na.rm=TRUE. Adding the scores of all columns and dividing by the number of items yourself gives the same result as the default:
(bigfive$agree1r+bigfive$agree2+bigfive$agree3r+bigfive$agree4+bigfive$agree5r+
bigfive$agree6+bigfive$agree7r+bigfive$agree8+bigfive$agree9+bigfive$agree10)/10
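The handling of missing values is easy to check on a tiny made-up data frame (toy and its columns are hypothetical). The sketch compares the default of rowMeans(), the na.rm=TRUE variant, and the manual sum.

```r
# Hypothetical respondents with three items; respondent 2 skipped item b
toy <- data.frame(a = c(1, 2, 5),
                  b = c(3, NA, 5),
                  c = c(5, 4, 5))

m_default <- rowMeans(toy)                 # NA if any item is missing
m_avail   <- rowMeans(toy, na.rm = TRUE)   # mean of the answered items
m_manual  <- (toy$a + toy$b + toy$c) / 3   # same as the default

cbind(m_default, m_avail, m_manual)
```

Respondent 2 gets NA under the default and under the manual sum, but a mean based on the two answered items when na.rm=TRUE is used.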
Do the people who respond in a socially desirable manner have a different personality than those who do not? We run a logit regression to explain the answer to the first social desirability question by five personality traits. First, we explore the correlations among the traits to inspect possible multi-collinearity.
cor(bigfive[, c("agree","extra","consc","open","neuro")], use="complete.obs") # check multi-collinearity
## agree extra consc open neuro
## agree 1.0000000 0.20193336 0.18279041 0.28049266 0.15412176
## extra 0.2019334 1.00000000 -0.25690538 0.31800232 0.08513464
## consc 0.1827904 -0.25690538 1.00000000 -0.03705669 -0.08084514
## open 0.2804927 0.31800232 -0.03705669 1.00000000 0.27018321
## neuro 0.1541218 0.08513464 -0.08084514 0.27018321 1.00000000
summary(glm(socdes1~1, data=bigfive, family=binomial)) # intercept only model
##
## Call:
## glm(formula = socdes1 ~ 1, family = binomial, data = bigfive)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7155 0.7223 0.7223 0.7223 0.7223
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.21062 0.07595 15.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1056 on 979 degrees of freedom
## Residual deviance: 1056 on 979 degrees of freedom
## (20 observations deleted due to missingness)
## AIC: 1058
##
## Number of Fisher Scoring iterations: 4
summary(glm(socdes1~agree + neuro + open + extra + consc, data=bigfive, family=binomial))
##
## Call:
## glm(formula = socdes1 ~ agree + neuro + open + extra + consc,
## family = binomial, data = bigfive)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3255 0.4321 0.6140 0.7430 1.5502
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.7839 1.1671 -2.385 0.01706 *
## agree 1.3628 0.3301 4.128 3.66e-05 ***
## neuro 0.6643 0.2130 3.119 0.00181 **
## open -0.8449 0.2697 -3.132 0.00173 **
## extra 0.5121 0.1922 2.664 0.00772 **
## consc -0.4424 0.1507 -2.935 0.00334 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1056.0 on 979 degrees of freedom
## Residual deviance: 1003.6 on 974 degrees of freedom
## (20 observations deleted due to missingness)
## AIC: 1015.6
##
## Number of Fisher Scoring iterations: 4
The correlations show that the traits are only weakly related, so we can add them simultaneously to a regression model. The coefficients show that all traits relate significantly to the answer. Agreeableness, extraversion and neuroticism are positively related to a socially desirable answer (coded as 1). Conscientiousness and openness relate negatively to social desirability. The model can be tested against the null model (intercept only) using a likelihood ratio test: the difference in residual deviance between the two models follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters between the models. The AIC, which adds a penalty of two per parameter to the deviance, can be used to compare the models descriptively (lower is better).
fit0<-glm(socdes1~1, data=bigfive, family=binomial) # intercept only model
fit1<-glm(socdes1~agree + neuro + open + extra + consc, data=bigfive, family=binomial)
anova(fit0,fit1, test="LRT")
## Analysis of Deviance Table
##
## Model 1: socdes1 ~ 1
## Model 2: socdes1 ~ agree + neuro + open + extra + consc
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 979 1056.0
## 2 974 1003.6 5 52.42 4.424e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As you can read from the output, the test has 5 degrees of freedom (the five independent variables). The difference in deviance is 52.4, which is significantly different from 0. In this case it is not so exciting, as all the parameters were significant, but in cases with barely significant parameters a likelihood ratio test can be useful.
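The test can also be reproduced by hand from the deviances printed in the two summaries. This is a sketch using the rounded values from the output above, so the p-value will differ slightly from the one reported by anova(). For a binary 0/1 outcome the residual deviance equals minus two times the log-likelihood, which is why the AIC equals the deviance plus twice the number of parameters.

```r
# Rounded values copied from the output above
dev_null <- 1056.0   # intercept-only model, 1 parameter
dev_full <- 1003.6   # model with five traits, 6 parameters

lr_stat <- dev_null - dev_full   # likelihood ratio statistic
df      <- 6 - 1                 # difference in number of parameters
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

lr_stat
p_value

# AIC = deviance + 2 * number of parameters (binary 0/1 outcome)
aic_full <- dev_full + 2 * 6
aic_full
```

The hand-computed AIC matches the value printed in the summary of the full model.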
The intercept of the intercept-only model (1.2106173) is the log odds that people in the complete sample respond in a socially desirable way to the first question. It is not a probability. A logit coefficient is expressed on the log odds scale and is directly related to the regression equation. \[\log\frac{p(y)}{1-p(y)} = \beta_0 + \beta_1 X_i\]
To interpret this coefficient we need to transform it to the odds by taking the exponent:
exp(coef(fit0))
## (Intercept)
## 3.355556
which is exactly the ratio of the number of people who responded in a socially desirable way to the number who did not, i.e. the odds \(\frac{p(y)}{1-p(y)}\).
table(bigfive$socdes1)
##
## 0 1
## 225 755
table(bigfive$socdes1)[2] / table(bigfive$socdes1)[1]
## 1
## 3.355556
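The odds can be taken one step further to a probability. A minimal sketch, using only the counts from the table above: the odds are p/(1-p), so p = odds/(1+odds), which is what plogis() returns when applied to the log odds.

```r
# Counts from the table above
n0 <- 225   # did not answer in a socially desirable way
n1 <- 755   # answered in a socially desirable way

odds     <- n1 / n0            # the exponentiated intercept
log_odds <- log(odds)          # the intercept of the intercept-only model
prob     <- plogis(log_odds)   # odds / (1 + odds)

c(log_odds = log_odds, odds = odds, prob = prob)

# The probability equals the sample proportion of socially desirable answers
all.equal(prob, n1 / (n0 + n1))
```

So the same intercept can be read on three scales: log odds, odds, and probability.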
Agreeableness relates most strongly to socially desirable responding (logit coefficient = 1.363, odds ratio = 3.907). A one-point increase in agreeableness multiplies the odds of answering in a socially desirable way by almost 4 (i.e. a 291% increase in the odds; note that this is a change in odds, not in probability). As agreeableness is an interval variable that ranges from 1 to 5, an increase of 1 on this variable is a substantial step. The smallest effect is that of conscientiousness (logit coefficient = -0.442, odds ratio = 0.643). An odds ratio smaller than 1 indicates a negative effect: a one-point increase in conscientiousness decreases the log odds by 0.442, which multiplies the odds of a socially desirable answer by 0.643 (a decrease of about 36%). All these effects should be interpreted holding the other variables constant. This is also the reason that the intercept is no longer meaningful: it is the log odds of responding in a socially desirable way when all the variables are zero, a value that lies outside the 1 to 5 range of the trait scores.
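The conversion from logit coefficients to odds ratios and percentage changes can be sketched as follows. The coefficients are typed in from the summary output above; with the fitted model object at hand, exp(coef(fit1)) would give the same odds ratios.

```r
# Logit coefficients copied from the summary output above
b <- c(agree = 1.3628, neuro = 0.6643, open = -0.8449,
       extra = 0.5121, consc = -0.4424)

odds_ratio <- exp(b)                    # multiplicative change in the odds
pct_change <- (odds_ratio - 1) * 100    # percentage change in the odds

round(cbind(logit = b, odds_ratio, pct_change), 3)
```

Positive coefficients give odds ratios above 1 and a positive percentage change; negative coefficients give odds ratios between 0 and 1 and a negative percentage change.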
Interaction effects can be added to the regression equation using an asterisk * to multiply two variables. R automatically inserts the main effects (i.e. the effects of trust and agree separately) in the equation which is necessary to interpret the interaction effect.
summary(glm(socdes1~agree*trust, data=bigfive, family=binomial))
##
## Call:
## glm(formula = socdes1 ~ agree * trust, family = binomial, data = bigfive)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1111 0.4774 0.6484 0.7472 1.3804
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.1819 2.2139 0.986 0.3244
## agree -0.4636 0.7528 -0.616 0.5380
## trust -0.7886 0.3635 -2.170 0.0300 *
## agree:trust 0.2944 0.1240 2.374 0.0176 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1016.70 on 960 degrees of freedom
## Residual deviance: 991.21 on 957 degrees of freedom
## (39 observations deleted due to missingness)
## AIC: 999.21
##
## Number of Fisher Scoring iterations: 4
An interaction effect shows whether the effect of one variable on the dependent variable depends on the level of another variable. In the model above, the interaction coefficient (0.294, odds ratio = 1.34) means that for every one-point increase in trust, the odds ratio of agreeableness is multiplied by 1.34 (i.e. it becomes 34% larger), and vice versa: the more people trust, the stronger the positive relation between agreeableness and socially desirable responding. The main effects can only be interpreted when the other variable in the interaction is set to exactly zero. This explains the negative effect of trust: a one-point increase in trust when agreeableness is zero does not make any sense, as agreeableness runs from 1 to 5. For the difference between holding constant (controlling for another variable) and interpreting interaction effects, see multiple regression.
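One way to read the interaction is to compute the effect of agreeableness at several levels of trust. The sketch below uses the coefficients printed above. The trust values 1, 5 and 10 are assumed illustration points, as the range of the trust scale is not shown here.

```r
# Coefficients copied from the interaction model above
b_agree <- -0.4636
b_inter <-  0.2944

# Assumed illustration values for trust
trust <- c(1, 5, 10)

# Effect of a one-point increase in agreeableness at each trust level
logit_agree <- b_agree + b_inter * trust
or_agree    <- exp(logit_agree)

round(cbind(trust, logit_agree, or_agree), 3)
```

The effect of agreeableness grows with trust, which is what a positive interaction coefficient expresses.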
summary(glm(socdes1~agree+trust, data=bigfive, family=binomial))
##
## Call:
## glm(formula = socdes1 ~ agree + trust, family = binomial, data = bigfive)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0126 0.5320 0.6511 0.7266 1.3078
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.59297 0.90337 -2.870 0.004101 **
## agree 1.17597 0.30370 3.872 0.000108 ***
## trust 0.06833 0.03346 2.042 0.041144 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1016.70 on 960 degrees of freedom
## Residual deviance: 996.92 on 958 degrees of freedom
## (39 observations deleted due to missingness)
## AIC: 1002.9
##
## Number of Fisher Scoring iterations: 4
The negative effect of trust disappears when the interaction is excluded from the model. Now, both trust and agreeableness relate positively to socially desirable responding. A one-point increase in agreeableness multiplies the odds of answering in a socially desirable way by 3.24 (a 224% increase). For trust the odds are multiplied by only 1.07 (a 7% increase). This effect might seem smaller, but the scale of trust is twice as large as the one for agreeableness. This means that a one-point increase in trust is a smaller step than a one-point increase in agreeableness. Both effects are controlled for the other variable, meaning that this effect of trust holds for all values of agreeableness.
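The two models for trust and agreeableness are nested and fitted on the same 961 respondents, so they can be compared with the likelihood ratio test described earlier. A sketch using the rounded deviances printed in the two outputs:

```r
# Residual deviances copied from the two outputs above
dev_main  <- 996.92   # socdes1 ~ agree + trust (3 parameters)
dev_inter <- 991.21   # socdes1 ~ agree * trust (4 parameters)

lr_stat <- dev_main - dev_inter   # likelihood ratio statistic
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

lr_stat
p_value
```

The result should be close to the Wald test of the interaction term in the summary, which is the kind of borderline case where a likelihood ratio test is a useful second check.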