Refer to the datasets indicated on the homework assignment page on Canvas.
1. A bacteriologist is studying the antibiotic properties of four solutions (labeled T1 through
T4 in the dataset antibiotics on Canvas). The bacteriologist measures the reduction
in bacterial cultures when each solution is applied and would like to know which solutions are more effective than others as antibiotics.
a. Examine the data by making a side-by-side boxplot for the reduction in bacterial
cultures for each treatment group (refer to the Homework 2 Guide for a refresher
on making box plots).
b. What are the null and alternative models for an ANOVA in this context?
c. Evaluate the assumptions for ANOVA (refer to the Lecture 11a Prereading and
the Homework 2 Guide for reminders about the relevant R code).
d. What is the value of the test statistic for an ANOVA of the antibiotic data? What
does this test statistic represent, in the context of the scenario? Name two conditions that, if changed, would result in a smaller test statistic.
e. Explain what conclusion the bacteriologist should draw, based on the data.
f. Find Tukey-adjusted 95 percent confidence intervals for the difference in each
pair of solutions. Which solutions are significantly different from the others?
g. Would the unadjusted confidence intervals be wider or narrower? Explain.
2. An epidemiologist studying factors associated with glucose levels in the United States
would like to know whether glucose levels differ by race/ethnicity. Because glucose levels
are highly skewed, the researcher decides to investigate the median, rather than the mean,
for each group. The researcher plans to conduct a Kruskal-Wallis test with a Bonferonni
adjustment for post-hoc comparisons to keep the family-wise Type I error rate to no
more than .05. Refer to the glucose dataset on Canvas.
a. What is the sample median for each race/ethnicity group?
b. What are the null and alternative models for this test?
c. Find the p-value for the Kruskal-Wallis test. Explain what conclusion you will
draw from the statistical test.
d. Explain what significance level will you use for each post-hoc comparison to keep
the family-wise Type I error rate to no more than .05?
e. The p-values for each Wilcoxon pairwise comparison are listed below. Which
groups are significantly different from the others?
f. How would your answer to (d) differ if you were not adjusting your conclusions
to account for multiple comparisons? Why is it important to consider the consequences of conducting multiple tests in a research study?
Comparison Wilcoxon p-value
Black vs Hispanic .3260
Black vs Other .9157
Black vs White .0195
Hispanic vs Other .3737
Hispanic vs White .0009
Other vs White .0149
3. Apublic health worker wants to use air quality (as measured by the number of particulates
in the air in parts per million) to explain variation in the childhood asthma rate for various cities. Refer to the dataset asthma on Canvas.
a. Estimate a linear model for this analysis. What is the estimated linear equation
for the model? Explain the interpretation of the slope.
b. Create scatterplots (see p. 3) for (i) asthma rate vs. air quality and (ii) the residuals
of the linear model vs. air quality. Evaluate the assumptions of the linear model.
c. The public health worker wants to know whether there is strong evidence of a
relationship between air quality and childhood asthma. What are the null and
alternative models for the statistical test that can address this research question?
d. Explain what conclusion the public health worker should draw, based on your
e. What is the prediction of childhood asthma for a city that has a particulate air
quality of 10 ppm? Show how this is calculated.
f. Find a 95 percent confidence interval for the mean childhood asthma rate in cities that have particulate air quality of 10 ppm.
g. The public health worker is visiting a city with a particulate air quality of 10 ppm.
What is a 95 percent interval for the prediction of that city’s childhood asthma
rate? Explain why this interval is different from the interval in part(f).
h. If the public health worker were visiting a city with a particulate air quality of
15 ppm, would the prediction interval be narrower or wider? Explain.