Arch Hellen Med, 26(3), May-June 2009, 407-422


Multivariate analysis of epidemiological data

Center for Health Services Management and Evaluation, Department of Nursing, University of Athens, Athens, Greece

In epidemiology, mathematical models are used for various purposes, the two primary ones being prediction and control of confounding. Prediction models are used to estimate the risk of disease for specific individuals from information on risk predictors. In contrast, much epidemiologic research aims at learning about the causal role of specific characteristics (or determinants) in disease. In causal research, multivariate mathematical models are used to evaluate the causal role of one or more characteristics while simultaneously controlling for possible confounding effects of other characteristics. When several variates are included in a multivariate model, the estimated effect of each variate is adjusted for, and thus unconfounded by, the other variates in the model. This is an easy and efficient way of controlling confounding by several variates at once, something that may be difficult to achieve through a stratified analysis. With a stratified analysis, however, both the investigator and the readers (when the stratified data are presented in a paper) can see how the data are distributed across the key study variates. For this reason, a multivariate analysis should be used as a supplement to a stratified analysis, rather than as the primary analytical tool. In epidemiology, the most frequently used models are the general linear model and the logistic regression model. The outcome (or dependent variate) is continuous in the general linear model, whereas in logistic regression it is a binary (indicator) variate.
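The idea of controlling confounding through a multivariate model can be illustrated with a minimal sketch. The example below is not taken from the article: it uses simulated data, and the variable names (`x` for the exposure, `z` for the confounder, `y` for a continuous outcome), the effect sizes and the helper function `ols` are all assumptions chosen for illustration. It fits a linear model by ordinary least squares with and without the confounder and compares the two estimates of the exposure effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical simulated data: z confounds the exposure-outcome relation.
z = rng.normal(size=n)                      # confounder
x = 0.8 * z + rng.normal(size=n)            # exposure, influenced by z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # outcome; true effect of x is 1.0


def ols(columns, outcome):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


crude = ols([x], y)[1]        # model without z: estimate is confounded
adjusted = ols([x, z], y)[1]  # model with z: estimate is adjusted for z

print(f"crude: {crude:.2f}, adjusted: {adjusted:.2f}")
```

Under these assumptions the crude estimate overstates the exposure effect, while the adjusted estimate should lie close to the simulated value of 1.0. A tabulation of the same data within strata of `z` would still be needed to see how the observations are distributed, which is the reason for treating the model as a supplement to the stratified analysis.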

Key words: Logistic regression, Multivariate analysis, Regression, Stratified analysis.

© Archives of Hellenic Medicine