#$$$$$$$$$$$$$$$$$$$$$ EXE 5.1 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#

#################################################
######################### X2 ###################
#################################################

# analyze a contingency table where the dependent
# variable is destination and the independent variable
# is the origin (of occupational status)
# use the function chisq.test
?occupationalStatus

# get the expected frequencies from your solution above
# look under 'Value' on ?chisq.test

# which assumption is violated here? how many times?

# collapse columns so that the assumption is no longer
# violated. How does the X2 change?

#$$$$$$$$$$$$$$$$$$$$$ EXE 5.2 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#

###########################################################################
### to practice logit regression, you will work on the                  ###
### survey data file, which you can find on my webpage in text format   ###
### DO NOT OPEN THE DATASET IN EXCEL FIRST!!! THIS ADDS ROW NUMBERS     ###
###                                                                     ###
### READ IN THE SURVEY DATA FILE:                                       ###
### read in data using the "read.table" function & name the data frame  ###
### - variable names are in the first row (header=TRUE)                 ###
### - the separator is ; (sep=";")                                      ###
###                                                                     ###
### This dataset contains:                                              ###
###                                                                     ###
### brand attitude : att_brand1 - att_brand5                            ###
### ad attitude    : att_ad1 - att_ad5                                  ###
### purchase int.  : pi1 - pi3                                          ###
###   (all on 7-pt scales: 1=strongly disagree, 7=strongly agree)       ###
### warning        : warning that an attention check is used            ###
### IMC            : check to see whether people read the question      ###
### IMCcorrect     : 0 = fail, 1 = correct                              ###
### gender         : sex, 1=male, 2=female                              ###
### education      : edu, 1 = low, 6 = high                             ###
### timeSeconds    : time spent on survey in seconds                    ###
###########################################################################

#################################################
############### logit regression ###############
#################################################

# explain the passing rate of the IMC by warning
# (1 means they received a warning, 0 means they did not)

# calculate the odds ratio
# compare the glm coefficient with the odds ratio

# calculate the effect of warning on IMCcorrect separately for men and women,
# and the interaction effect, which is the ratio of the two odds ratios
# tip: first create two tables, one for males and one for females
# note that 1 is men, 2 is women

# check with the glm function, including the interaction effect
# between warning and sex

# add the interval variable 'age' or 'time' to the simple regression
# equation of warning on IMCcorrect

# test the interaction effect of the variable you chose above (time/age)
# with warning on IMCcorrect,
# i.e. see whether the level of age/time influences the effect of warning
# on successfully passing the IMC

#$$$$$$$$$$$$$$$$$$$$$ EXE 5.3 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#

#################################################
################### model fit #################
#################################################

# fit two models (one that has more independent variables
# than the other, i.e. nested models)

# test whether the larger model fits better than
# the model with fewer parameters
# use pchisq
# check with anova

# estimate McFadden's pseudo R2
# use the formula in the slides
# make sure you compare the same cases across models, as
# both models need to have the same null deviance!
# check whether null.deviance is the deviance of the intercept-only model
# compare your results with the models estimated above
# make sure that the intercept model has the same number of cases
# as the model you estimated before

#____#____#____#____#____# OPTIONAL #____#____#____#____#____#

# for those who want to write functions:
# calculate the log likelihood using the parameter values
# these parameter values are more difficult to program than
# with OLS: you need to use maximum likelihood (using optim)

# 1. get the dataset based on listwise deletion
# 2. get the parameter values
# 3. estimate the probabilities for each observation
#    TIP: you can use the formula in the slides
# 4. for each observation, take the log of the
#    probability you estimated above: for observations
#    with a y-value of zero take the log of 1 - p,
#    and for the others the log of p
#    use the ifelse function
# 5. summed together, these log probabilities give the log likelihood
#    TIP: residual deviance = -2 * log likelihood
#         AIC = -2 * log likelihood + 2 * k
#         k = number of parameters (including the intercept!)
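
# ---- sketch of the optional steps above (not the course solution) ----
# A minimal sketch under stated assumptions, meant only to show how the
# five steps fit together:
# - the survey file is NOT used here; 'd', 'x' and 'y' are simulated,
#   hypothetical stand-ins for a predictor and a 0/1 outcome
# - 'neg_loglik' is a helper name made up for this sketch
# - p = 1 / (1 + exp(-(b0 + b1 * x))) is the usual inverse-logit formula;
#   the slides may write it in a different but equivalent form

set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1 * d$x))
d$x[c(3, 17)] <- NA                        # two missing values to illustrate step 1

# 1. listwise deletion: keep complete cases only
d <- na.omit(d)

# 3, 4, 5. minus the log likelihood for parameter values b = c(b0, b1)
neg_loglik <- function(b, x, y) {
  p  <- 1 / (1 + exp(-(b[1] + b[2] * x)))    # probability for each observation
  ll <- ifelse(y == 0, log(1 - p), log(p))   # log probability of the observed outcome
  -sum(ll)                                   # optim minimises, so return minus the sum
}

# 2. get the parameter values by maximum likelihood
fit_optim <- optim(par = c(0, 0), fn = neg_loglik,
                   x = d$x, y = d$y, method = "BFGS")
loglik <- -fit_optim$value                 # log likelihood at the optimum
k      <- length(fit_optim$par)            # number of parameters, intercept included

# compare with glm: estimates, residual deviance and AIC should agree closely
fit_glm <- glm(y ~ x, family = binomial, data = d)
print(rbind(optim = fit_optim$par, glm = coef(fit_glm)))
print(c(deviance_optim = -2 * loglik, deviance_glm = deviance(fit_glm)))
print(c(AIC_optim = -2 * loglik + 2 * k, AIC_glm = AIC(fit_glm)))

# to try this on the survey data, replace 'x' and 'y' with your own columns
# (for example warning and IMCcorrect) after the listwise deletion step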