Tutorial – Week 3 (Multivariate Normal Distribution, MVN)
Question 1:
a) Are two variables that are jointly MVN distributed with zero covariance necessarily independent?
b) Are two variables that are correlated and individually normally distributed necessarily jointly MVN?
c) What are the four main properties of the MVN distribution?
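Part b) can be explored by simulation. The base-R sketch below uses a standard sign-flip construction, chosen here for illustration and not taken from the tutorial: X is standard normal and Y = S·X, where S is an independent random sign equal to +1 with probability p = 0.8. Each margin is N(0, 1) and the correlation is 2p - 1 = 0.6, yet X + Y is exactly zero whenever S = -1.

```r
# Counterexample sketch for Question 1 b): X and Y are each N(0, 1) and
# correlated, yet (X, Y) is not bivariate normal.
set.seed(1)                        # reproducible simulation
n <- 1e5
p <- 0.8                           # P(S = +1); any p other than 0.5 gives nonzero correlation
x <- rnorm(n)                      # X ~ N(0, 1)
s <- ifelse(runif(n) < p, 1, -1)   # random sign, independent of X
y <- s * x                         # Y ~ N(0, 1) because N(0, 1) is symmetric about 0

r <- cor(x, y)                     # theoretical correlation: E[S] = 2p - 1 = 0.6
prop_zero <- mean(x + y == 0)      # X + Y is exactly 0 whenever S = -1

# If (X, Y) were bivariate normal, X + Y would be normal with positive
# variance, so P(X + Y = 0) would be 0. Here it is about 1 - p = 0.2.
cat("correlation:", round(r, 3), "  P(X + Y = 0):", round(prop_zero, 3), "\n")
```

Because a normal variable with positive variance cannot put positive probability on a single point, the mass at X + Y = 0 is enough to rule out joint normality even though both margins are normal.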
Question 2:
Using the data file ‘emdecade.dat’ and the 3 variables rain, maxt and mint:
a) Describe the data set. What potential issues could arise from analysing the whole data set?
b) Produce the correlation matrix between these 3 variables. What does it suggest about the potential for the data to meet the MVN assumption?
c) Produce and interpret normal QQplots for each of the variables.
d) Calculate the Mahalanobis distances for each decade. Comment.
e) Calculate the chi-squared quantile probabilities and create a chi-squared QQplot for the Mahalanobis distances. Interpret.
f) Produce the chi-squared QQplot using the MVN package. How is this plot slightly different to the one produced in part e)? Why is this the case?
g) Use the MVN package to identify any multivariate outliers. Use the adjusted quantile method. Explain the results in the outlier plot.
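For part c), a minimal base-R sketch (not the MVN package) that draws one normal QQ plot per variable. Because emdecade.dat is not available here, the built-in trees data (three numeric columns) stands in for rain, maxt and mint. Assuming the file is whitespace-delimited with a header row, something like em <- read.table("emdecade.dat", header = TRUE) followed by em[, c("rain", "maxt", "mint")] would supply the real columns, but that file layout is an assumption.

```r
# Sketch for Question 2 c) in base R: one normal QQ plot per variable.
# Stand-in data: the built-in 'trees' data frame (3 numeric variables), used
# only because emdecade.dat is not shipped with R; substitute the rain, maxt
# and mint columns of that file.
X <- trees

op <- par(mfrow = c(1, ncol(X)))   # one panel per variable
for (v in names(X)) {
  qqnorm(X[[v]], main = paste("Normal QQ plot:", v))
  qqline(X[[v]], col = "red")      # reference line through the 1st and 3rd quartiles
}
par(op)                            # restore the previous plot layout
```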
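Parts d), e) and g) can be sketched in base R as follows, again with the built-in trees data standing in for the three emdecade.dat variables. Note that mahalanobis() returns squared distances. The quantile probabilities (i - 0.5)/n are one common convention (for n > 10 they coincide with R's ppoints(n)). The outlier step uses the classical 97.5% chi-squared cutoff, a swapped-in, non-robust technique that is not the MVN package's adjusted quantile method.

```r
# Sketch for Question 2 d), e) and g) in base R. Stand-in data: the built-in
# 'trees' data frame (3 numeric variables); substitute rain, maxt and mint.
X <- as.matrix(trees)
n <- nrow(X)
p <- ncol(X)

# d) Squared Mahalanobis distance of each row from the mean vector,
#    using the sample covariance matrix (divisor n - 1).
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# e) Quantile probabilities (i - 0.5)/n and matching chi-squared quantiles;
#    under MVN the ordered d2 should lie close to the line y = x.
prob <- (seq_len(n) - 0.5) / n
q <- qchisq(prob, df = p)
plot(q, sort(d2),
     xlab = paste0("Chi-squared quantile (df = ", p, ")"),
     ylab = "Ordered squared Mahalanobis distance",
     main = "Chi-squared QQ plot")
abline(0, 1, col = "red")

# g) Classical outlier screen: flag rows whose d2 exceeds the 97.5%
#    chi-squared quantile. This is NOT the MVN package's adjusted quantile
#    method (which works with robust distances); it is a non-robust baseline.
cutoff <- qchisq(0.975, df = p)
outliers <- which(d2 > cutoff)
print(round(sort(d2, decreasing = TRUE)[1:5], 2))   # five largest distances
print(outliers)                                     # row indices above the cutoff
```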
Question 3:
The Iris dataset [1] is a built-in dataset in R. Load and view this data set using the following R code:
> data(iris)
> iris
The data consists of measurements of 4 flower parts: sepal length, sepal width, petal length and petal width. Each of the 150 flowers measured was from one of three species. We will work with the data from the species virginica in rows 101 to 150. To isolate the 4 variables for this species use:
> virginica <- iris[101:150, 1:4]
> virginica
Using the MVN package in R:
a) Produce univariate QQplots and histograms for each variable and perform Shapiro-Wilk normality tests. Interpret.
b) Produce perspective and contour plots for the sepal variables and then for the petal variables. Comment.
c) Test for MVN using Mardia’s, Henze-Zirkler’s and Royston’s tests. Also produce a Mahalanobis chi-squared QQplot. Interpret all results and, in conjunction with the univariate analyses, give your overall judgement on whether the data meet the assumptions of MVN.
d) Recreate the chi-squared QQplot from part c) by calculating the Mahalanobis distances and the chi-squared quantile probabilities yourself.
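A base-R sketch for Question 3 a), written without the MVN package so that it does not depend on that package's interface, which may differ between versions. It runs shapiro.test() on each of the four virginica variables and draws a histogram and a normal QQ plot for each.

```r
# Sketch for Question 3 a) in base R (not the MVN package).
data(iris)
virginica <- iris[101:150, 1:4]     # species virginica, 4 numeric variables

# Shapiro-Wilk test per column: W statistic and p-value
sw <- t(sapply(virginica, function(v) {
  test <- shapiro.test(v)
  c(W = unname(test$statistic), p.value = test$p.value)
}))
print(round(sw, 4))                 # small p-values suggest departure from normality

op <- par(mfrow = c(2, 4))          # row 1: histograms, row 2: QQ plots
for (v in names(virginica)) hist(virginica[[v]], main = v, xlab = v)
for (v in names(virginica)) {
  qqnorm(virginica[[v]], main = v)
  qqline(virginica[[v]], col = "red")
}
par(op)
```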
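For Question 3 b), one way to draw perspective and contour plots is from a two-dimensional kernel density estimate computed with kde2d() from MASS, a recommended package distributed with R. Treat this as an assumption-laden stand-in: the MVN package's own plots may use a different estimator or bandwidth, so the surfaces can differ in detail. The helper name density_plots is hypothetical.

```r
# Sketch for Question 3 b): kernel density surfaces for the sepal and petal
# pairs, drawn with base persp() and contour(). Not the MVN package's code.
library(MASS)                       # provides kde2d()
data(iris)
virginica <- iris[101:150, 1:4]

# Hypothetical helper: estimate a 2-D density and draw both plot types
density_plots <- function(x, y, xlab, ylab) {
  dens <- kde2d(x, y, n = 50)       # density on a 50 x 50 grid, default bandwidth
  persp(dens$x, dens$y, dens$z, theta = 30, phi = 30,
        xlab = xlab, ylab = ylab, zlab = "density")
  contour(dens$x, dens$y, dens$z, xlab = xlab, ylab = ylab)
  invisible(dens)
}

op <- par(mfrow = c(2, 2))          # top row: sepal pair, bottom row: petal pair
sepal <- density_plots(virginica$Sepal.Length, virginica$Sepal.Width,
                       "Sepal.Length", "Sepal.Width")
petal <- density_plots(virginica$Petal.Length, virginica$Petal.Width,
                       "Petal.Length", "Petal.Width")
par(op)
```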
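Mardia's statistics for Question 3 c) can be computed from scratch, which helps when reading the package output. This sketch uses the large-sample forms: skewness b1 = (1/n^2) Σi Σj mij^3, with n·b1/6 compared to a chi-squared distribution on p(p + 1)(p + 2)/6 degrees of freedom, and kurtosis b2 = (1/n) Σi mii^2, standardised as (b2 - p(p + 2)) / sqrt(8p(p + 2)/n). Package results may differ slightly (small-sample corrections, covariance divisor); Henze-Zirkler's and Royston's tests are not reproduced here.

```r
# From-scratch sketch of Mardia's skewness and kurtosis tests (large-sample
# forms) for the virginica data; not the MVN package's implementation.
data(iris)
X <- as.matrix(iris[101:150, 1:4])
n <- nrow(X)
p <- ncol(X)

Xc <- scale(X, center = TRUE, scale = FALSE)   # centre each column
S  <- crossprod(Xc) / n                        # ML covariance (divisor n)
M  <- Xc %*% solve(S) %*% t(Xc)                # m_ij = (x_i - xbar)' S^-1 (x_j - xbar)

b1 <- sum(M^3) / n^2                           # multivariate skewness
b2 <- sum(diag(M)^2) / n                       # multivariate kurtosis

skew_stat <- n * b1 / 6                        # approximately chi-squared
skew_df   <- p * (p + 1) * (p + 2) / 6
skew_p    <- pchisq(skew_stat, df = skew_df, lower.tail = FALSE)

kurt_z <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)   # approximately N(0, 1)
kurt_p <- 2 * pnorm(abs(kurt_z), lower.tail = FALSE)       # two-sided p-value

print(round(c(b1 = b1, skew_p = skew_p, b2 = b2, kurt_z = kurt_z, kurt_p = kurt_p), 4))
```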
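For Question 3 d), a sketch that recreates the chi-squared QQplot by computing each squared distance d_i^2 = (x_i - xbar)' S^-1 (x_i - xbar) with explicit matrix algebra, where S is the sample covariance matrix. It uses the (i - 0.5)/n quantile-probability convention, which is an assumption (other conventions shift the points slightly). Agreement with the built-in mahalanobis() is checked at the end.

```r
# Sketch for Question 3 d): squared Mahalanobis distances and chi-squared
# quantiles computed by hand for the virginica data.
data(iris)
X <- as.matrix(iris[101:150, 1:4])
n <- nrow(X)
p <- ncol(X)

Xc   <- sweep(X, 2, colMeans(X))            # subtract column means
Sinv <- solve(cov(X))                       # inverse sample covariance (divisor n - 1)
d2   <- rowSums((Xc %*% Sinv) * Xc)         # d_i^2 = (x_i - xbar)' S^-1 (x_i - xbar)

prob <- (seq_len(n) - 0.5) / n              # quantile probabilities
q    <- qchisq(prob, df = p)                # chi-squared quantiles, df = 4

plot(q, sort(d2),
     xlab = paste0("Chi-squared quantile (df = ", p, ")"),
     ylab = "Ordered squared Mahalanobis distance", main = "virginica")
abline(0, 1, col = "red")

# Check against base R's mahalanobis()
print(isTRUE(all.equal(unname(d2), unname(mahalanobis(X, colMeans(X), cov(X))))))
```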
Reference:
[1] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.