Part A. [40 Marks]
Students are required to answer ALL questions in this section.
1. Discuss the differences and similarities between statistical estimation and statistical hypothesis testing. [5 marks]
2. Open Dataset 1 and, for each category of data, indicate the measurement level. [5 marks]
3. The QMX Company has two assembly lines in its Rotterdam plant. Line A produces an average of 335 units per day with a standard deviation of 11 units. Line B produces an average of 145 units per day with a standard deviation of 8 units. Using the coefficient of variation, determine which line is relatively more consistent. Explain your answer with reference to the information given and the calculations involved. Make sure the calculations are shown on the answer sheet. [10 marks]
4. Using the sample dataset below, which reflects the electricity bills for ten households in Apeldoorn, the Netherlands, in January, calculate the three measures of central tendency for this data sample. Based on the results and using a boxplot, determine whether the sample data are skewed. Also identify any outliers. [10 marks]
5. The fares received by taxi drivers working in the City Taxi line are normally distributed with a mean of €12.50 and a standard deviation of €3.25. Based on this information, what is the probability that a specific fare will exceed €15.00? [10 marks]
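The calculation in Question 5 can be sketched in a few lines. This is a minimal illustration, assuming the mean is €12.50 (reading "12;50" as a typo) and using Python's standard-library `NormalDist`; the variable names are chosen here for illustration only.

```python
from statistics import NormalDist

# Values taken from Question 5 (mean assumed to be 12.50 euros).
fares = NormalDist(mu=12.50, sigma=3.25)

# P(X > 15.00) is the complement of the cumulative probability at 15.00.
p_exceeds_15 = 1 - fares.cdf(15.00)

print(f"P(fare > 15.00) = {p_exceeds_15:.4f}")
```

The same result follows by hand from the z-score (15.00 - 12.50) / 3.25 and a standard normal table.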
Part B. [60 Marks]
NOTE: Question 6 is compulsory. Answer Question 6 and one other question from this section.
6. Dataset 3 (a, b & c) outlines the happiness index for most countries in the world over a three-year period. It also shows the factors that are related to the happiness index. Using Dataset 3 (a, b & c) (World Happiness Index 2019, 2018, 2017), perform the following tasks.
a. Using ANOVA, determine whether the global level of happiness changed over the three years in the dataset. Is the change statistically significant?
b. Using the dataset for 2019, determine whether there is any relationship between the happiness index score and GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity and perceptions of corruption. Present these relationships in a correlation matrix for better visual impact. Explain the different relationships (if any) that these factors have with the happiness index.
c. Using the data from 2019, determine in rank order which of the factors best predict the level of happiness. Using the appropriate statistical tool, provide a detailed explanation of the results of your analysis (why some factors are stronger predictors of happiness).
d. Show and explain all the tables generated by the software in relation to your analysis and answer. [Total 40 marks]
7. One of the major oil-producing companies recently conducted a study to estimate the mean liters of petrol purchased by customers per visit to a petrol station. To do this, a random sample of customers was selected, and the liters of petrol purchased were recorded as follows:
8.7 22.4 9.5 13.3 18.9
Based on these sample data, construct and interpret a 95 percent confidence interval estimate for the population mean. [20 marks]
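The interval in Question 7 can be sketched as follows. This is an illustration only, assuming the five values listed are the complete sample, that a t-interval is the intended method (small sample, population standard deviation unknown), and that the critical value 2.776 is read from a standard t table for 4 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Sample values as listed in Question 7 (assumed to be the full sample).
liters = [8.7, 22.4, 9.5, 13.3, 18.9]

# Two-tailed 95% critical value for df = n - 1 = 4, taken from a t table.
T_CRIT = 2.776

n = len(liters)
sample_mean = mean(liters)
sample_sd = stdev(liters)  # sample standard deviation (n - 1 denominator)

# Margin of error = t * s / sqrt(n)
margin = T_CRIT * sample_sd / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"mean = {sample_mean:.2f}, s = {sample_sd:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```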
8. A drone builder states in their terms and conditions that the battery for their foldable drone will last for more than 4 hours of continuous flight time if the battery was fully charged before the flight. To test this claim, a sample of n = 10 drones was tested. The results showed a sample mean of 4.2 hours and a sample standard deviation of 0.4 hours. Conduct the hypothesis test using a 0.05 level of significance and determine whether or not the manufacturer's claim is supported. [20 marks]
9. Suppose we have 105 patients under study. Of these, 50 were treated with the drug and the remaining 55 served as the control group. The health condition of all patients was checked after a week. We want to assess whether their condition has improved: did the drug have a positive effect on the patients? Use a chi-square test of independence, construct a contingency table of expected frequencies and interpret your results. [20 marks]
The data is saved in Dataset 2 – treatment.csv
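For Question 8, the test statistic can be sketched as below. This is a minimal illustration, assuming a one-sample, right-tailed t-test with H0: mean ≤ 4 hours and H1: mean > 4 hours, and a critical value of 1.833 read from a standard t table for 9 degrees of freedom at the 0.05 level.

```python
import math

# Figures given in Question 8.
n = 10
sample_mean = 4.2   # hours
sample_sd = 0.4     # hours
claimed_mean = 4.0  # hours, boundary value under H0

# One-tailed critical value for alpha = 0.05, df = n - 1 = 9 (from a t table).
T_CRIT = 1.833

# t = (x_bar - mu_0) / (s / sqrt(n))
t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))

# Reject H0 only if the statistic falls in the right-hand rejection region.
reject_h0 = t_stat > T_CRIT

print(f"t = {t_stat:.3f}, critical value = {T_CRIT}, reject H0: {reject_h0}")
```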