

Arch Hellen Med, 31(2), March-April 2014, 221-243


Univariate analysis of epidemiological data

P. Galanis
Center for Health Services Management and Evaluation, Department of Nursing, National and Kapodistrian University of Athens, Athens, Greece

Descriptive statistics are used for the concise and detailed presentation of data in epidemiological studies, while inferential statistics are applied to investigate relationships between determinants and outcomes. Descriptive statistics concern the presentation of epidemiological data (univariate analysis), whereas inferential statistics include bivariate and multivariate analysis. Univariate analysis permits the separate presentation of each variable in a study; bivariate analysis permits investigation of the relationship between a determinant and an outcome; and multivariate analysis permits investigation of that relationship while taking into account the effect of potential confounding and effect-modifying factors. Univariate analysis permits the presentation of absolute and relative frequencies of nominal variates and of appropriate measures of location and dispersion of quantitative variates. Ordinal variates require particular consideration, since both absolute and relative frequencies and measures of location and dispersion can be presented for them. Measures of location are the values around which the observations tend to cluster, while measures of dispersion capture the degree to which the observations are spread out. The most important measures of location are the mean, the median and the mode. The most important measures of dispersion are the range, the interquartile range, the variance, the standard deviation and the coefficient of variation. For continuous variates, univariate analysis must include a check of the normality of their distribution. When continuous variates follow a normal distribution, parametric methods can be applied in bivariate and multivariate analysis; when they do not, non-parametric methods are required.
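The frequencies and measures of location and dispersion named above can be computed with Python's standard library. The following is a minimal sketch, not the article's method: the function names `frequencies` and `describe_quantitative` and the data are hypothetical. It assumes the sample (n - 1) variance and standard deviation, and quartiles by linear interpolation (`method="inclusive"`); statistical packages differ in their quartile definitions, so the interquartile range may differ slightly between programs.

```python
from collections import Counter
import statistics


def frequencies(values):
    """Absolute and relative (%) frequencies of a nominal variate."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100.0 * n / total) for category, n in counts.items()}


def describe_quantitative(values):
    """Measures of location and dispersion of a quantitative variate."""
    data = sorted(values)
    # Quartiles by linear interpolation; other software may use another definition.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return {
        # Measures of location
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.multimode(data),  # a list, since a sample can have several modes
        # Measures of dispersion
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "variance": statistics.variance(data),  # sample variance
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,  # coefficient of variation, as a percentage
    }


# Hypothetical data, for illustration only.
sex = ["male", "female", "female", "male", "female"]
ages = [23, 25, 25, 28, 30, 31, 35, 40, 42, 61]
print(frequencies(sex))  # {'male': (2, 40.0), 'female': (3, 60.0)}
print(describe_quantitative(ages))
```

For the hypothetical ages, the mean (34.0) lies above the median (30.5) because of the single high value of 61, which already hints at the normality question discussed next.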
A check of normality can be made as follows: (a) comparison between the mean and the median, (b) estimation of the coefficients of skewness and kurtosis, (c) application of appropriate statistical tests (e.g., Kolmogorov-Smirnov, Shapiro-Wilk), and (d) use of graphs (histograms, normal Q-Q plots and box plots). Quantitative variates that follow a normal distribution should be presented as mean and standard deviation, while those that do not should be presented as median and interquartile range or range. Ordinal variates, regardless of whether or not they follow a normal distribution, should be presented as median and interquartile range or range.
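Checks (a) and (b) and the presentation rule can be sketched with Python's standard library. This is an illustration under stated assumptions, not the article's procedure: the function names, the data and the cut-off of 1 for skewness and excess kurtosis are hypothetical. Skewness and kurtosis are computed as the moment coefficients g1 and g2 (excess kurtosis, 0 for a normal distribution); many statistical packages report small-sample adjusted versions, so values may differ slightly. Formal tests (c), such as Shapiro-Wilk (available, for example, as `scipy.stats.shapiro` in SciPy), and graphs (d) are not reproduced here and would complement these numerical hints.

```python
import statistics


def normality_indicators(values):
    """Numerical hints about normality: mean - median, skewness, excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return {
        "mean_minus_median": mean - statistics.median(values),
        "skewness": m3 / m2 ** 1.5,  # g1: 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # g2: 0 for a normal distribution
    }


def present(values, normal, ordinal=False):
    """Mean (SD) for normally distributed quantitative variates; otherwise median (IQR)."""
    if normal and not ordinal:
        return f"mean {statistics.fmean(values):.1f} (SD {statistics.stdev(values):.1f})"
    # Ordinal or non-normal variates: median and interquartile range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"median {median:.1f} (IQR {q3 - q1:.1f})"


# Hypothetical ages (years) of ten participants, for illustration only.
ages = [23, 25, 25, 28, 30, 31, 35, 40, 42, 61]
ind = normality_indicators(ages)
# Illustrative rule of thumb (an assumption, not taken from the article):
# treat |skewness| < 1 and |excess kurtosis| < 1 as compatible with normality.
normal = abs(ind["skewness"]) < 1 and abs(ind["excess_kurtosis"]) < 1
print(ind)
print(present(ages, normal))  # median 30.5 (IQR 13.0)
```

Here the mean exceeds the median by 3.5 years and the skewness is about 1.36, so under this rule of thumb the variate is presented as median and interquartile range. Passing `ordinal=True` always yields the median form, in line with the recommendation for ordinal variates.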

Key words: Data analysis, Measures of dispersion, Measures of location, Normal distribution, Univariate analysis.

© Archives of Hellenic Medicine