Question 1 (5 marks) Note: Each student will get different answers as the data sets differ.

Use the assignment data set assigned to you: Variables to analyse: ‘sex’

a. Calculate the point estimate and 95% confidence interval for the proportion of females in the population of NSW 17-year-olds using the random sample of NSW 17-year-olds assigned to you. (2 marks)

b. Carefully explain in words what the confidence interval in part a. is telling us. (2 marks)

c. Are the results in part a. consistent with the statement: “50% of 17-year-olds in NSW are female”? Explain why or why not. (1 mark)
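
As a rough illustration of the arithmetic behind part a., the following Python sketch computes a point estimate and a normal-approximation (Wald) confidence interval for a proportion. It is not R Commander output, and the function name `proportion_ci` and the counts used are hypothetical; they are not taken from any assignment data set. The Wald interval is only one of several available intervals for a proportion, so software may report slightly different limits.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Return (p_hat, lower, upper) using the normal-approximation (Wald) interval."""
    p_hat = successes / n                           # point estimate of the proportion
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
    se = sqrt(p_hat * (1 - p_hat) / n)              # standard error of p_hat
    return p_hat, p_hat - z * se, p_hat + z * se


# Hypothetical counts for illustration only: 52 females in a sample of 100.
p_hat, lower, upper = proportion_ci(successes=52, n=100)
print(f"estimate = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Checking whether a hypothesised value such as 0.5 falls inside the interval is one way to approach part c.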

Question 2 (7 marks) Note: Each student will get different answers as the data sets differ.

Research Question: Is average self-reported hours of moderate to vigorous physical activity (MVPA) per week equal between males and females in the population of NSW 17-year-olds?

Use the assignment data set assigned to you: Variables to analyse: ‘MVPA’ and ‘sex’

a. Use appropriate charts and/or statistics to describe the shape of the distribution of self-reported hours of MVPA per week for 17-year-old males and females in the sample. (2 marks)

b. Use an appropriate non-parametric test and R Commander to test the hypothesis that the average self-reported hours of MVPA per week is equal between males and females in the population of NSW 17-year-olds. Use R Commander for all calculations but write your answers according to the 5-step method. (5 marks)
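
For orientation only, the sketch below shows how one common non-parametric comparison of two independent groups, the Wilcoxon rank-sum (Mann-Whitney U) statistic, can be computed by hand in Python. It is an assumption that this is the kind of test intended here; the question itself does not name one. The function names and the data are hypothetical. The p-value uses a plain normal approximation with no tie or continuity correction, so statistical software, which may use exact p-values or corrections, can give somewhat different numbers.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value) via a normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2      # U from the rank sum of group x
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)     # no tie correction applied
    z = (u1 - mean_u) / sd_u
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical weekly MVPA hours, for illustration only.
males = [1.0, 2.5, 3.0, 6.0, 10.5]
females = [0.5, 1.5, 2.0, 4.0, 5.0]
u, p = rank_sum_test(males, females)
print(f"U = {u}, approximate two-sided p = {p:.3f}")
```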

Question 3 (4 marks)

A researcher is questioning whether or not the introduction of new laws intended to limit the emission of polycyclic aromatic hydrocarbons (PAH) in the gases emitted from aluminium smelting plants has been effective. She has compiled emission measures for a random sample of six aluminium smelters. For each smelter she has recorded emissions at one year before and two years after the introduction of the new legislation. The PAH concentrations are continuous variables. Results are shown in the following table.

| Smelter | PAH concentration prior to introduction of new laws | PAH concentration after introduction of new laws |
|---|---|---|
| A | 103 | 21 |
| B | 27 | 19 |
| C | 407 | 320 |
| D | 221 | 47 |
| E | 7,230 | 550 |
| F | 339 | 28 |

a. The researcher wishes to test her hypothesis that the concentration of PAH in gaseous emissions from aluminium smelters has decreased since the introduction of the new laws. Is this a one-sided or a two-sided hypothesis test? Explain why. (1 mark)

b. Name an appropriate statistical test to address this hypothesis (that the concentration of PAH in gaseous emissions from aluminium smelters had decreased since the introduction of the new laws). Justify your choice of test. DO NOT perform any analysis. (3 marks)

Question 4 (9 marks) Note: Each student will get different answers as the data sets differ.

Research question: Does mode of transport differ by gender in the population of NSW 17-year-olds?

Use the assignment data set assigned to you: Variables to analyse: ‘licence’ and ‘sex’

a. Show the relationship between driver’s licence status and gender in the sample of NSW 17-year-olds using a two-way contingency table. Include either row or column percentages. Type and label the table yourself: an R Commander screenshot will not be accepted. (2 marks)

b. Looking at the results in part a) only, is there any evidence of association between gender and licence status in this *sample* of NSW 17-year-olds? Explain why or why not. (2 marks)

c. Are the requirements for a Chi-square test met? Explain why. (1 mark)

d. Irrespective of your answer in part c), address the research question using a Chi-square test on the provided data. Please use R Commander but format your answer according to the 5-step method. (4 marks)
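
To make the mechanics of parts c. and d. concrete, here is a minimal Python sketch of a Pearson Chi-square test on a 2x2 table. It is not R Commander output. The function name `chi_square_2x2` and the counts are hypothetical, and no continuity correction is applied, which is an assumption about the variant of the test wanted. The expected counts it returns are the quantities usually inspected when judging whether the test's requirements are met.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """table: [[a, b], [c, d]] of observed counts.

    Returns (statistic, p_value, expected_counts) for Pearson's Chi-square
    test with 1 degree of freedom and no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Upper-tail probability of a Chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value, expected


# Hypothetical counts for illustration only (rows: sex, columns: licence status).
observed = [[30, 20], [20, 30]]
stat, p_value, expected = chi_square_2x2(observed)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}, expected = {expected}")
```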

Question 5 (5 marks)

a. Give one reason why different research studies require different sample sizes. Why not use the same sample size for every research study? (1 mark)

b. Dr Smith asks you to estimate the minimum sample size required to detect a difference of 0.5 hour in mean self-reported sedentary hours per week between 17-year-old NSW boys and girls with and power=0.90. (He suggests this 0.5 hour difference could, for example, be a mean of 9.5 hours compared to a mean of 10 hours.) He is confident from his previous reading that the population standard deviation is and he wishes to use equal group sizes for maximum efficiency. Estimate the minimum sample size required for Dr Smith’s study. Present your answer to Dr Smith as a sentence which summarises the required sample size to achieve what power subject to what conditions. (3 marks)

c. Suppose that, despite the answer in part b., Dr Smith decided to run his study with a sample size of n=20 per group (n=40 in total). What impact would this have on the project’s ability to answer the research question? (1 mark)
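
The kind of calculation involved in parts b. and c. can be sketched in Python using the usual normal-approximation formula for comparing two means with equal group sizes, n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / δ². The function names below are hypothetical. The significance level of 0.05 and standard deviation of 2.0 hours used in the example are placeholder assumptions chosen purely for illustration; they are not the values specified in part b.

```python
from math import ceil, sqrt
from statistics import NormalDist

_z = NormalDist()  # standard normal distribution


def n_per_group(delta, sigma, alpha, power):
    """Minimum n per group to detect a mean difference delta (two-sided test)."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    z_beta = _z.inv_cdf(power)
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


def approx_power(delta, sigma, alpha, n):
    """Approximate power of a two-sided test with n subjects per group."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    return _z.cdf(delta / (sigma * sqrt(2 / n)) - z_alpha)


# Placeholder inputs (assumed for illustration): alpha = 0.05, sigma = 2.0 hours.
n = n_per_group(delta=0.5, sigma=2.0, alpha=0.05, power=0.90)
achieved = approx_power(delta=0.5, sigma=2.0, alpha=0.05, n=20)
print(f"n per group = {n}; approximate power with n = 20 per group = {achieved:.2f}")
```

Comparing the required n with a much smaller planned n, as in part c., shows how sharply power falls when a study is undersized.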