144 Problems
144.1 Multiple Linear Regression Models
144.1.1 Task 1
Based on the analysis presented in the Computation tab, examine the effect of the “Curry” variable on the “Rate” variable (= response). Compare the results with the ANOVA computation of Section 133.1.11.
The output shows the one-way ANOVA table of the SLRM. The F-statistic is 27.37, which is identical to the value displayed in Section 133.1.11. From the corresponding p-value it can be concluded that there is a significant difference in ratings between ‘mild’ and ‘hot’ curry. The estimated coefficient for the coded curry variable is negative, which implies lower ratings for the level coded as 1 relative to the reference level coded as 0. The slope parameter of the SLRM is significantly different from zero because its p-value is sufficiently low; with a single binary predictor, this t-test is equivalent to the F-test of the ANOVA table.
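The agreement between the ANOVA F-statistic and the regression output can be checked numerically. The following is a minimal sketch using made-up ratings (they are not the curry data, so the value 27.37 is not reproduced); the function names `anova_f` and `slope_t_squared` are illustrative. It computes the one-way ANOVA F-statistic for two groups and the squared t-statistic of the slope in a regression on a 0/1 dummy, and shows that the two coincide.

```python
from statistics import mean

# Hypothetical ratings for illustration only (NOT the data used in the text).
hot = [8.0, 7.0, 9.0, 6.0, 8.0]
mild = [4.0, 5.0, 3.0, 4.0, 6.0]

def anova_f(groups):
    """One-way ANOVA F-statistic: between-group MS divided by within-group MS."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def slope_t_squared(y_ref, y_coded):
    """Squared t-statistic of b1 in y = b0 + b1*x, with x = 0 for y_ref and x = 1 for y_coded."""
    x = [0.0] * len(y_ref) + [1.0] * len(y_coded)
    y = list(y_ref) + list(y_coded)
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = (rss / (n - 2) / sxx) ** 0.5
    return (b1 / se_b1) ** 2

f_stat = anova_f([hot, mild])
t_sq = slope_t_squared(hot, mild)  # 'mild' is the level coded as 1 here
print(round(f_stat, 4), round(t_sq, 4))  # the two numbers agree
```

In this sketch the level coded as 1 has the lower mean, so the slope is negative, which mirrors the sign interpretation given in the solution.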
144.1.2 Task 2
Examine the combined effect of the variables “Curry” and “Status” on the “Rate” variable. Compare the results with those in Section 133.1.12.
The first figure illustrates the MLRM graphically. Since the cases are grouped by treatment, the four mean levels of the treatments are visible (i.e. ‘SMK’ and ‘hot’; ‘SMK’ and ‘mild’; ‘NS’ and ‘hot’; ‘NS’ and ‘mild’). The highest ratings are achieved in the group of non-smokers eating ‘hot’ curry. If we combine the ‘hot’ groups (i.e. the first and third group) we obtain an average rating that is substantially higher than when the ‘mild’ groups are combined (second and fourth group). Hence, it is plausible to assume that ‘hot’ curry is rated higher than the ‘mild’ variant. To quantify such an effect, however, it is necessary to examine the parameter estimates of the MLRM.
The coefficient table shows the MLRM parameter estimates, which can be used to determine the effects of both treatments:
- The average rating of ‘smokers’ eating ‘mild’ curry can be computed by setting Currymild = 1, StatusSMK = 1, and Currymild*StatusSMK = 1. The result is 8.1 - 4.45 - 3.95 + 4.1 = 3.8.
- The average rating of ‘smokers’ eating ‘hot’ curry can be computed by setting Currymild = 0, StatusSMK = 1, and Currymild*StatusSMK = 0. The result is 8.1 - 3.95 = 4.15.
- The remaining combinations (non-smokers eating ‘mild’ or ‘hot’ curry) can be found in the same way.
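The arithmetic in the list above can be sketched in a few lines of Python. The coefficient values are the ones quoted in the text; the variable names and the helper `cell_mean` are illustrative. The last step averages the two ‘hot’ cells and the two ‘mild’ cells with equal weights, which assumes equally sized groups (an assumption not stated in the text).

```python
# Coefficients quoted in the text; the reference cell is non-smoker ('NS') with 'hot' curry.
intercept = 8.1
b_curry_mild = -4.45
b_status_smk = -3.95
b_interaction = 4.1  # Currymild*StatusSMK

def cell_mean(curry_mild, status_smk):
    """Predicted mean rating for one treatment cell (dummies are 0 or 1)."""
    return (intercept
            + b_curry_mild * curry_mild
            + b_status_smk * status_smk
            + b_interaction * curry_mild * status_smk)

cell_means = {
    ("NS", "hot"): cell_mean(0, 0),
    ("NS", "mild"): cell_mean(1, 0),
    ("SMK", "hot"): cell_mean(0, 1),
    ("SMK", "mild"): cell_mean(1, 1),
}
for (status, curry), value in cell_means.items():
    print(status, curry, round(value, 2))

# Average 'hot' minus average 'mild' rating, assuming equal cell sizes.
hot_avg = (cell_means[("NS", "hot")] + cell_means[("SMK", "hot")]) / 2
mild_avg = (cell_means[("NS", "mild")] + cell_means[("SMK", "mild")]) / 2
hot_minus_mild = hot_avg - mild_avg
print(round(hot_minus_mild, 2))
```

The cell for non-smokers eating ‘hot’ curry equals the intercept, which is why it serves as the reference level for both dummy variables.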
The MLRM is equivalent to the two-way ANOVA computation. However, interpreting the regression results requires some additional calculation, because each cell mean has to be assembled from the intercept, the two dummy coefficients, and the interaction term.
144.1.3 Task 3
Fit a binary logistic regression model for the fraud example discussed in Chapter 136. Report (i) odds ratios, (ii) ROC-AUC, and (iii) one threshold that is suitable when false negatives are much more costly than false positives.
A complete solution should include:
- interpretation of each coefficient through odds ratios (not raw log-odds only),
- reported AUC with a brief discrimination-quality interpretation,
- explicit threshold choice justified by the error-cost asymmetry.
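The three requirements above might be sketched as follows. Everything in this example is hypothetical: the coefficient names and values, the predicted probabilities, the labels, and the 10:1 cost ratio are invented for illustration and do not come from Chapter 136. The sketch converts log-odds coefficients to odds ratios, computes the AUC as the share of correctly ranked (fraud, non-fraud) pairs, and picks the threshold that minimizes a simple cost function.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale (not from Chapter 136).
coefficients = {"amount_scaled": 0.8, "foreign_flag": 1.2}
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

# Hypothetical predicted probabilities and true labels (1 = fraud).
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

def auc(probs, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def total_cost(threshold, cost_fn=10.0, cost_fp=1.0):
    """Cost of classifying as fraud whenever the probability is >= threshold."""
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return cost_fn * fn + cost_fp * fp

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
auc_value = auc(probs, labels)
best_threshold = min(candidates, key=total_cost)  # false negatives cost 10x
symmetric_threshold = min(candidates, key=lambda t: total_cost(t, cost_fn=1.0))

print(odds_ratios)
print(round(auc_value, 3), best_threshold, symmetric_threshold)
```

With these made-up numbers, the asymmetric costs push the chosen threshold well below the one obtained under equal costs, which is the kind of justification the task asks for. In practice the threshold would be chosen on validation data rather than on the data used to fit the model.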
144.1.4 Task 4
Using a count outcome, compare Poisson, quasipoisson, and negative binomial models as discussed in Chapter 137. Diagnose overdispersion and explain which model is most defensible.
A strong answer should:
- compute a dispersion diagnostic (e.g., residual deviance divided by the residual degrees of freedom),
- compare standard errors and significance across families,
- justify final model choice in terms of fit and valid inference.
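A dispersion diagnostic of the kind listed above can be sketched for the simplest possible case, an intercept-only Poisson model, where the fitted mean is just the sample mean. The counts below are hypothetical and are not taken from Chapter 137. The sketch computes both the deviance-based and the Pearson-based dispersion estimates, and then rescales the Poisson standard error of the log-link intercept by the square root of the Pearson estimate, which is how a quasi-Poisson fit is usually described as adjusting its standard errors.

```python
import math
from statistics import mean

# Hypothetical counts for illustration (not data from Chapter 137).
counts = [0, 1, 0, 2, 9, 0, 1, 12, 0, 3]

n = len(counts)
mu = mean(counts)   # fitted mean of an intercept-only Poisson model
df = n - 1          # residual degrees of freedom (one estimated parameter)

# Residual deviance; the y*log(y/mu) term is taken as 0 when y == 0.
deviance = 2 * sum((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu) for y in counts)
pearson = sum((y - mu) ** 2 / mu for y in counts)

dispersion_deviance = deviance / df
dispersion_pearson = pearson / df

# Standard error of the log-link intercept under Poisson, then rescaled.
se_poisson = (1 / (n * mu)) ** 0.5
se_quasi = se_poisson * dispersion_pearson ** 0.5

print(round(dispersion_deviance, 2), round(dispersion_pearson, 2))
print(round(se_poisson, 3), round(se_quasi, 3))
```

Both ratios are far above 1 for these counts, so the Poisson standard error would be too small and the corresponding significance statements too optimistic. A negative binomial fit, which this sketch does not attempt, would instead model the extra variance explicitly.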
144.1.5 Task 5
Choose either a Cox proportional hazards model (Chapter 139) or a conditional inference tree (Chapter 140) and explain one practical situation where it is preferable to linear regression.
Credit should be given when the answer:
- clearly states the data structure that invalidates plain linear regression (e.g., censored survival times, or strong nonlinearity and interactions that are better captured by recursive partitioning),
- explains the key assumption of the selected model,
- gives a concise interpretation of model output in context.