Understanding Mann-Whitney U Test
STATISTICAL TECHNIQUE IN REVIEW
The Mann-Whitney U test is a nonparametric statistical technique used to detect differences between two independent samples. This statistical technique is the most powerful of the nonparametric tests, with 95% of the power of the t-test when the assumptions of the t-test are met. If the assumptions for the t-test cannot be satisfied, for example because ordinal-level data are collected or the distribution of scores is skewed rather than normal, then the Mann-Whitney U is often computed. Exercise 12 provides an algorithm that will assist you in determining whether the Mann-Whitney U was the appropriate statistical technique in a published study (Grove, Burns, & Gray, 2013; Plichta & Kelvin, 2013). The Mann-Whitney U tests the null hypothesis: There is no difference between two samples for a selected variable. For example, there is no difference between men and women regarding their self-care activities following surgery. In this example, self-care activities data need to be measured at least at the ordinal level.
To calculate the value of U, scores of both samples are combined, and each score is assigned a rank. The lowest score is ranked 1, the next score is ranked 2, and so forth until all scores are ranked, regardless of which sample the score came from. The idea is that if the two samples came from the same population, the average ranks of their scores would be approximately equal. Plichta and Kelvin (2013, pp. 111–117) provide the details for calculating the Mann-Whitney U test.
RESEARCH ARTICLE
Source
Eckerblad, J., Tödt, K., Jakobsson, P., Unosson, M., Skargren, E., Kentsson, M., & Theander, K. (2014). Symptom burden in stable COPD patients with moderate or severe airflow limitation. Heart & Lung, 43(4), 351–357.
Introduction
Eckerblad and colleagues (2014, p. 351) conducted a comparative descriptive study to describe the symptoms of “patients with stable chronic obstructive pulmonary disease (COPD) and determine whether symptom experience differed between patients with moderate or severe airflow limitations.” Table 1 from this study is included in Exercise 6 and provides a description of the sample characteristics. The Memorial Symptom Assessment Scale (MSAS) was used to measure the physical and psychological symptoms of 42 outpatients with moderate airflow limitations and 49 patients with severe airflow limitations. The results indicate that the mean number of symptoms was 7.9 (±4.3) for the sample of 91 patients with COPD. The Mann-Whitney U analysis technique was conducted to determine physical and psychological symptom differences between the moderate and severe airflow limitation groups. The researchers concluded that patients with moderate and severe airflow limitations experienced multiple severe symptoms that caused high levels of distress. Quality assessment of COPD patients' physical and psychological symptoms might improve the management of their symptoms.
EXERCISE 21 • Understanding Mann-Whitney U Test. Copyright © 2017, Elsevier Inc. All rights reserved.
TABLE 3 COMPARISON OF PHYSICAL AND PSYCHOLOGICAL MSAS SYMPTOM BURDEN SCORE BETWEEN PATIENTS WITH MODERATE AND SEVERE AIRFLOW LIMITATION
MSAS Symptom Burden Score         Moderate (n = 42) Mean (SD)  Severe (n = 49) Mean (SD)  p Value
Physical Symptoms
Shortness of breath               2.12 (1.09)                  2.58 (0.90)                0.02
Cough                             1.56 (1.16)                  1.24 (1.06)                0.22
Dry mouth                         1.38 (1.42)                  1.81 (1.27)                0.17
Lack of energy                    1.01 (1.22)                  1.53 (1.37)                0.10
Feeling drowsy                    0.82 (1.04)                  1.07 (1.12)                0.32
Pain                              1.35 (1.46)                  0.99 (1.31)                0.20
Numbness/tingling in hands/feet   0.91 (1.16)                  0.53 (1.03)                0.07
Feeling bloated                   0.56 (1.22)                  0.49 (1.17)                0.78
Dizziness                         0.49 (0.88)                  0.74 (1.20)                0.43
Sweats                            0.47 (0.96)                  0.65 (1.06)                0.38
Psychological Symptoms
Difficulty sleeping               1.26 (1.44)                  1.44 (1.43)                0.54
Worrying                          0.73 (0.99)                  0.71 (1.17)                0.68
Feeling irritable                 0.55 (1.00)                  0.52 (0.94)                0.88
The MSAS symptom burden score is the average of the frequency, severity, and distress associated with each symptom. Analyses performed with the Mann-Whitney U test. Only symptoms reported by ≥25% of the patients are included in this table.
From Eckerblad, J., Tödt, K., Jakobsson, P., Unosson, M., Skargren, E., Kentsson, M., & Theander, K. (2014). Symptom burden in stable COPD patients with moderate or severe airflow limitation. Heart & Lung, 43(4), p. 354.
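The p values reported in Table 3 can be screened against a significance level to see which symptoms differ between the two groups. The short Python sketch below is offered only as an illustration: the p values are copied from Table 3, while the 0.05 cutoff and the names `ALPHA`, `table3_p_values`, and `significant_symptoms` are assumptions made for this example.

```python
# p values copied from Table 3 (Eckerblad et al., 2014, p. 354).
# ALPHA and all names below are assumptions made for this illustration.
ALPHA = 0.05

table3_p_values = {
    "Shortness of breath": 0.02,
    "Cough": 0.22,
    "Dry mouth": 0.17,
    "Lack of energy": 0.10,
    "Feeling drowsy": 0.32,
    "Pain": 0.20,
    "Numbness/tingling in hands/feet": 0.07,
    "Feeling bloated": 0.78,
    "Dizziness": 0.43,
    "Sweats": 0.38,
    "Difficulty sleeping": 0.54,
    "Worrying": 0.68,
    "Feeling irritable": 0.88,
}


def significant_symptoms(p_values, alpha=ALPHA):
    """Return the symptoms whose p value is at or below alpha."""
    return [name for name, p in p_values.items() if p <= alpha]


print(significant_symptoms(table3_p_values))
```

With these values, only shortness of breath falls at or below the cutoff, which agrees with the single significant difference the researchers reported.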
Relevant Study Results
“MSAS evaluates a multidimensional symptom profile, including the prevalence of 32 symptoms and the symptom experience of 26 physical and six psychological symptoms during the previous week. Symptom prevalence was recorded as yes = 1 or no = 0. Whenever a symptom was present, the symptom experience was assessed by the dimensions of frequency (1 = rarely to 4 = almost constantly) for 24 symptoms, severity (1 = slight to 4 = very severe), and distress (0.8 = not at all to 4.0 = very much) associated with all 32 symptoms during the preceding week. Higher scores indicate greater frequency, severity, and distress.
A score for each symptom, defined as an MSAS symptom burden score, was calculated by averaging the scores for frequency, severity and distress dimensions” (Eckerblad et al., 2014, p. 352). “The six symptoms rated with the highest MSAS symptom burden score, mean (±SD), in the whole sample were shortness of breath, 2.4 (±1.0); dry mouth, 1.6 (±1.4); cough, 1.4 (±1.1); sleep problems, 1.4 (±1.4); lack of energy, 1.3 (±1.3); and pain, 1.2 (±1.4).
The mean (±SD) MSAS symptom burden scores for moderate and severe airflow limitations are presented in Table 3. Patients with moderate airflow limitation had significantly lower MSAS symptom burden scores for shortness of breath than patients with severe airflow limitation (Table 3)” (Eckerblad et al., 2014, p. 353).
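The averaging that produces an MSAS symptom burden score can be illustrated with a brief Python sketch. This is not the researchers' scoring code. The function name `symptom_burden_score`, the example ratings, and the choice to average only severity and distress when a symptom has no frequency rating are all assumptions made for illustration; the sketch applies only to a symptom the patient reported as present.

```python
from statistics import mean


def symptom_burden_score(severity, distress, frequency=None):
    """Average the available MSAS dimensions for one symptom that is present.

    Assumption: when a symptom has no frequency rating (frequency was
    assessed for only 24 of the 32 symptoms), the score is the average
    of severity and distress alone.
    """
    scores = [severity, distress]
    if frequency is not None:
        scores.append(frequency)
    return mean(scores)


# Hypothetical ratings for one patient's shortness of breath:
# frequency 3 (scale 1-4), severity 2 (scale 1-4), distress 2.4 (scale 0.8-4.0).
print(round(symptom_burden_score(severity=2, distress=2.4, frequency=3), 2))
```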
The MSAS is a Likert-type scale that measured symptom frequency and severity on scales of 1 to 4 (with 1 being low and 4 being high) and distress on a scale of 0.8 = not at all to 4.0 = very much (see previous discussion of the scale). The frequency, severity, and distress scores were averaged to obtain the MSAS symptom burden scores. Likert scale data are often considered interval-level data, especially if the item scores are summed or averaged to obtain composite scores. Composite scores are analyzed with parametric analysis techniques if the scores are normally distributed, not skewed. However, Eckerblad et al. (2014, p. 353) indicated that the “MSAS symptom scores are presented as the mean (SD) for comparison with previous research, although the data are skewed.” Because the data collected with the MSAS are skewed and therefore not normally distributed, the nonparametric Mann-Whitney U test was calculated to determine differences between the moderate and severe airflow limitation groups for reported physical and psychological symptoms (see Table 3), using a significance level of p ≤ 0.05.
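The ranking procedure described in the Statistical Technique in Review section can be sketched in a few lines of Python. The sketch below is a minimal illustration and not the procedure given by Plichta and Kelvin (2013) or the analysis run by Eckerblad et al. (2014). The function names and the two small samples are hypothetical, tied scores are assumed to share the average of their ranks, and the p value uses a large-sample normal approximation without tie or continuity correction, which is too rough for samples as small as the ones shown.

```python
import math


def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, U for sample_b) from ranks of the combined scores."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


def normal_approx_p(u, n_a, n_b):
    """Two-sided p value from a normal approximation (no tie or continuity correction)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical symptom burden scores, not data from the study.
moderate = [1.0, 2.0, 2.0, 3.0]
severe = [2.0, 3.0, 3.5, 4.0]

u_moderate, u_severe = mann_whitney_u(moderate, severe)
print(u_moderate, u_severe)  # 2.5 13.5
print(normal_approx_p(u_moderate, len(moderate), len(severe)))
```

The two U values always sum to the product of the two sample sizes (here 4 × 4 = 16). The smaller U belongs to the group whose scores tend to rank lower in the combined sample.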