#$$$$$$$$$$$$$$$$$$$$$   EXE 5.1  $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#

#################################################
######################### X2  ###################
#################################################
# analyze a contingency table where the dependent 
# variable is destination and the independent variable
# is the origin (of occupational status)
# use function chisq.test
?occupationalStatus
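# A minimal sketch of the test, assuming the built-in occupationalStatus
# table (rows = origin, columns = destination). The object name `chi` is
# only illustrative.
chi <- chisq.test(occupationalStatus)  # R may warn that the approximation is poor
chi$statistic   # the X2 value
chi$parameter   # degrees of freedom: (rows - 1) * (columns - 1)
chi$p.value     # p-value of the test of independence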


# get the expected frequencies from your solution above
# (see the 'Value' section of ?chisq.test)
# which assumption is violated here, and in how many cells?
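# One possible way to inspect the expected frequencies (a sketch; `chi2`
# is an illustrative name, and the cut-off of 5 is the usual rule of thumb,
# assumed here rather than taken from the course slides).
chi2 <- suppressWarnings(chisq.test(occupationalStatus))
round(chi2$expected, 1)                   # expected counts under independence
sum(chi2$expected < 5)                    # number of cells below the cut-off
which(chi2$expected < 5, arr.ind = TRUE)  # which cells these are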


# collapse columns so that the assumption is no longer
# violated. How does the X2 statistic change?
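# A sketch of collapsing columns. Which columns to merge is your own choice;
# merging destination 1 and 2 below is only an illustrative assumption, so
# check the result rather than taking it for granted.
os <- unclass(occupationalStatus)                         # plain matrix
collapsed <- cbind("1+2" = os[, 1] + os[, 2], os[, 3:8])  # 8 x 7 table
chi_col <- suppressWarnings(chisq.test(collapsed))
sum(chi_col$expected < 5)   # is any violation left?
chi_col$statistic           # compare with the X2 of the full table
chi_col$parameter           # the degrees of freedom change as well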



#$$$$$$$$$$$$$$$$$$$$$   EXE 5.2  $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#

###########################################################################
### to practise logit regression, you will work on the survey data      ###
### file, which you can find on my webpage in text format.              ###
### DO NOT OPEN THE DATASET IN EXCEL FIRST: THIS ADDS ROW NUMBERS!      ###
###                                                                     ###
### READ IN SURVEY DATAFILE:                                            ###
### read in data using "read.table" function & name the datafile        ###
###   - variable names are in the first row (header=TRUE)               ###
###   - separator is ;                                                  ###
###                                                                     ###
### This dataset contains :                                             ###
###                                                                     ###
### brand attitude  : att_brand1 to att_brand5                          ###
### ad attitude     : att_ad1 - att_ad5                                 ###
### purchase int.   : pi1 - pi3                                         ###
###                   all on 7pt: 1=strongly disagree, 7=strongly agree ###
### warning         : warning that an attention check is used           ###
### IMC             : check to see whether people read the question     ###
### IMCcorrect      : 0 = fail, 1 = correct                             ###
### gender          : sex, 1=male, 2=female                             ###
### education       : edu, 1 = low, 6 = high                            ###
### timeSeconds     : time spent on survey in seconds                   ###
###########################################################################
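
# A minimal sketch of the read-in step. The file written here is a tiny
# made-up stand-in so that these lines run on their own; `path` and
# `dat_demo` are hypothetical names. Replace `path` with the location of
# the downloaded survey file.
path <- tempfile(fileext = ".txt")
writeLines(c("warning;IMCcorrect;gender", "1;1;2", "0;0;1", "1;0;2"), path)
dat_demo <- read.table(path, header = TRUE, sep = ";")
str(dat_demo)   # check the column names and types right after reading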

#################################################
############### logit regression  ###############
#################################################
# explain the passing rate of the IMC by warning
# 1 means they did receive a warning, 0 that they didn't.
# calculate the odds ratio
# compare the glm coefficient with the odds ratio
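# A sketch on simulated stand-in data (`sim` is NOT the course survey; use
# your own data frame instead). It illustrates that exp() of the glm slope
# reproduces the odds ratio computed from the 2x2 table.
set.seed(1)
sim <- data.frame(warning = rbinom(400, 1, 0.5))
sim$IMCcorrect <- rbinom(400, 1, plogis(-0.3 + 0.9 * sim$warning))
tab <- table(warning = sim$warning, IMCcorrect = sim$IMCcorrect)
odds <- tab[, "1"] / tab[, "0"]             # odds of passing per warning group
or_table <- unname(odds["1"] / odds["0"])   # odds ratio: warned vs. not warned
fit <- glm(IMCcorrect ~ warning, family = binomial, data = sim)
c(table = or_table, glm = unname(exp(coef(fit)["warning"])))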


# calculate the effect of warning on IMCcorrect separately for men and women
# and the interaction effect, which is the ratio of the two odds ratios
# tip: first create two tables, one for males and one for females
# note that 1 is men, 2 is women
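# A sketch on simulated stand-in data (NOT the course survey), with gender
# coded 1 = men, 2 = women as above. `odds_ratio` is a hypothetical helper.
set.seed(2)
sim <- data.frame(warning = rbinom(600, 1, 0.5),
                  gender  = sample(1:2, 600, replace = TRUE))
sim$IMCcorrect <- rbinom(600, 1, plogis(-0.3 + 0.5 * sim$warning +
                                        0.6 * sim$warning * (sim$gender == 2)))
odds_ratio <- function(d) {
  tab <- table(d$warning, d$IMCcorrect)   # rows: warning, columns: IMCcorrect
  (tab["1", "1"] / tab["1", "0"]) / (tab["0", "1"] / tab["0", "0"])
}
or_men   <- odds_ratio(sim[sim$gender == 1, ])
or_women <- odds_ratio(sim[sim$gender == 2, ])
c(men = or_men, women = or_women)
or_women / or_men   # interaction effect as a ratio of two odds ratios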


# check with glm function including the interaction effect 
# between warning and sex
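# A sketch on simulated stand-in data (NOT the course survey). The dummy
# `female` is an illustrative recoding of gender (1 = men, 2 = women) so
# that men are the reference group.
set.seed(2)
sim <- data.frame(warning = rbinom(600, 1, 0.5),
                  gender  = sample(1:2, 600, replace = TRUE))
sim$IMCcorrect <- rbinom(600, 1, plogis(-0.3 + 0.5 * sim$warning +
                                        0.6 * sim$warning * (sim$gender == 2)))
sim$female <- as.numeric(sim$gender == 2)
fit_int <- glm(IMCcorrect ~ warning * female, family = binomial, data = sim)
exp(coef(fit_int))
# exp(warning)        : odds ratio of warning for men (female = 0)
# exp(warning:female) : women's odds ratio divided by men's odds ratio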



# add interval variable 'age' or 'time' to the simple regression
# equation of warning on IMCcorrect
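# A sketch on simulated stand-in data (NOT the course survey), using
# timeSeconds as the interval variable.
set.seed(3)
sim <- data.frame(warning     = rbinom(400, 1, 0.5),
                  timeSeconds = round(runif(400, 60, 900)))
sim$IMCcorrect <- rbinom(400, 1, plogis(-1 + 0.8 * sim$warning +
                                        0.002 * sim$timeSeconds))
fit_time <- glm(IMCcorrect ~ warning + timeSeconds, family = binomial, data = sim)
summary(fit_time)$coefficients
exp(coef(fit_time))   # for timeSeconds: change in odds per extra second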



# test the interaction effect of the variable you chose above (time/age) with warning on IMC
# i.e. see whether the level of age/time influences the effect of warning
# on successfully passing the IMC
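# A sketch on simulated stand-in data (NOT the course survey). The product
# term warning:timeSeconds tests whether the effect of warning depends on
# the time spent on the survey.
set.seed(3)
sim <- data.frame(warning     = rbinom(400, 1, 0.5),
                  timeSeconds = round(runif(400, 60, 900)))
sim$IMCcorrect <- rbinom(400, 1, plogis(-1 + 0.8 * sim$warning +
                                        0.002 * sim$timeSeconds))
fit_mod <- glm(IMCcorrect ~ warning * timeSeconds, family = binomial, data = sim)
summary(fit_mod)$coefficients["warning:timeSeconds", ]   # estimate, SE, z, p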



#$$$$$$$$$$$$$$$$$$$$$   EXE 5.3  $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#

#################################################
###################  model fit  #################
#################################################
# fit two models (one that has more independent variables
# than the other, i.e. nested models)



# test whether the larger model fits better than 
# the model with fewer parameters
# use pchisq
# check with anova
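# A sketch of the likelihood-ratio test on simulated stand-in data (NOT the
# course survey). `small` and `large` are illustrative names for the two
# nested models.
set.seed(4)
sim <- data.frame(warning     = rbinom(400, 1, 0.5),
                  timeSeconds = round(runif(400, 60, 900)))
sim$IMCcorrect <- rbinom(400, 1, plogis(-1 + 0.8 * sim$warning +
                                        0.002 * sim$timeSeconds))
small <- glm(IMCcorrect ~ warning, family = binomial, data = sim)
large <- glm(IMCcorrect ~ warning + timeSeconds, family = binomial, data = sim)
lr      <- deviance(small) - deviance(large)         # drop in deviance
df_diff <- df.residual(small) - df.residual(large)   # extra parameters
p_lr    <- pchisq(lr, df = df_diff, lower.tail = FALSE)
c(X2 = lr, df = df_diff, p = p_lr)
anova(small, large, test = "Chisq")                  # should give the same test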


# estimate McFadden's pseudo R-squared
# use the formula in the slides
# make sure you compare the same cases across models as
# both models need to have the same null deviance!
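# A sketch on simulated stand-in data (NOT the course survey). It assumes
# the common definition 1 - residual deviance / null deviance; check this
# against the formula in the slides. `mcfadden` is a hypothetical helper.
set.seed(5)
sim <- data.frame(warning     = rbinom(400, 1, 0.5),
                  timeSeconds = round(runif(400, 60, 900)))
sim$IMCcorrect <- rbinom(400, 1, plogis(-1 + 0.8 * sim$warning +
                                        0.002 * sim$timeSeconds))
sim$timeSeconds[sample(400, 30)] <- NA    # some missing values, for illustration
# keep only cases that are complete on ALL variables used in either model
cc <- sim[complete.cases(sim[, c("IMCcorrect", "warning", "timeSeconds")]), ]
small <- glm(IMCcorrect ~ warning, family = binomial, data = cc)
large <- glm(IMCcorrect ~ warning + timeSeconds, family = binomial, data = cc)
mcfadden <- function(m) 1 - m$deviance / m$null.deviance
c(small = mcfadden(small), large = mcfadden(large))
c(small$null.deviance, large$null.deviance)   # identical: same cases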



# check whether null.deviance is the deviance of the intercept-only model
# compare your results with the models estimated above
# make sure that the intercept model has the same number of cases
# as the model you estimated before
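# A sketch on simulated stand-in data (NOT the course survey): fit the
# intercept-only model on the same cases and compare its deviance with
# the null.deviance stored in the larger model.
set.seed(6)
sim <- data.frame(warning = rbinom(400, 1, 0.5))
sim$IMCcorrect <- rbinom(400, 1, plogis(-0.3 + 0.9 * sim$warning))
fit      <- glm(IMCcorrect ~ warning, family = binomial, data = sim)
null_fit <- glm(IMCcorrect ~ 1, family = binomial, data = sim)
c(null.deviance = fit$null.deviance, intercept_only = deviance(null_fit))
nobs(fit) == nobs(null_fit)   # both models must use the same number of cases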



#____#____#____#____#____# OPTIONAL #____#____#____#____#____#

# for those who want to write functions:
# calculate the log likelihood from the parameter values
# estimating these parameter values is harder to program than
# with OLS: you need maximum likelihood (e.g. using optim)

# 1. get dataset based on listwise deletion

# 2. get the parameter values

# 3. estimate the probabilities for each observation
# TIP you can use the formula in the slides

# 4. for each observation, take the log of the probability of
# the observed outcome: for observations with a y-value of
# zero take log(1 - p), and for the others log(p)
# use the ifelse function

# 5. together these log probabilities should sum to the log likelihood
# TIP residual deviance = -2 * log likelihood
# AIC = -2 * log likelihood + 2 * k
# k = number of parameters (including the intercept!)
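
# The five steps can be sketched as follows, on simulated stand-in data
# (NOT the course survey). The inverse-logit formula in step 3 is assumed
# to match the one in the slides; `negll` is a hypothetical helper.
set.seed(7)
sim <- data.frame(warning = rbinom(400, 1, 0.5))
sim$IMCcorrect <- rbinom(400, 1, plogis(-0.3 + 0.9 * sim$warning))
sim$warning[sample(400, 20)] <- NA                      # some missing values
d   <- na.omit(sim[, c("IMCcorrect", "warning")])       # 1. listwise deletion
fit <- glm(IMCcorrect ~ warning, family = binomial, data = d)
b   <- coef(fit)                                        # 2. parameter values
p   <- 1 / (1 + exp(-(b[1] + b[2] * d$warning)))        # 3. P(y = 1) per case
ll_i <- ifelse(d$IMCcorrect == 1, log(p), log(1 - p))   # 4. log prob. of outcome
ll   <- sum(ll_i)                                       # 5. log likelihood
c(by_hand = ll, logLik = as.numeric(logLik(fit)))
c(deviance = -2 * ll, AIC = -2 * ll + 2 * length(b))    # compare with summary(fit)
# the same parameters from maximum likelihood with optim
negll <- function(par) {
  eta <- par[1] + par[2] * d$warning
  -sum(d$IMCcorrect * eta - log(1 + exp(eta)))          # negative log likelihood
}
opt <- optim(c(0, 0), negll, method = "BFGS")
rbind(glm = b, optim = opt$par)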