Stats Made Simple: Bias and Confounding in Research

Bias

In research, bias is defined as a systematic deviation of the estimated effect from the truth. Bias threatens internal validity, because the observed association between exposure and outcome does not fully reflect the underlying effect 1.

Types of Bias

Several types of bias exist, but the most frequent ones are selection bias and information (measurement) bias.

  • Selection bias occurs when individuals, groups, or data are chosen for analysis in a non-random manner. This can lead to inaccurate results because the sample may not represent the target population. For example, in studies requiring active participation, barriers such as socioeconomic factors or limited access to resources may prevent certain segments of the population from enrolling, leading to systematic differences between participants and non-participants.
  • Information (measurement) bias arises from systematic error in the measurement or classification of exposure, outcome, or covariates. For instance, when subjects are asked to recall past exposures or experiences, recall bias may occur if individuals report previous events inaccurately, particularly when the accuracy of recall differs between the groups being compared 2.

Confounding

Confounding occurs when a third variable influences both the exposure and the outcome, producing a distorted or misleading association between them. When confounding is present, the estimated effect of the exposure may be exaggerated, attenuated, or entirely spurious 5.

Figure 1. A confounder influences both the exposure and the outcome, potentially distorting the observed association. Without appropriate control, part or all of the observed relationship between exposure and outcome may reflect the confounder rather than a true effect.
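To make the idea concrete, here is a minimal simulation sketch. All names and probabilities are hypothetical and chosen only for illustration. A binary confounder raises the probability of both exposure and outcome, while the exposure itself has no effect on the outcome at all.

```python
import random

def simulate_confounded_data(n=50000, seed=0):
    """Simulate (confounder, exposed, outcome) triples.

    Assumption for illustration: the exposure has NO causal effect on the
    outcome; only the confounder changes the outcome probability.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        confounder = rng.random() < 0.5
        exposed = rng.random() < (0.7 if confounder else 0.2)
        outcome = rng.random() < (0.4 if confounder else 0.1)
        rows.append((confounder, exposed, outcome))
    return rows

def risk_ratio(rows):
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    exposed = [o for _, e, o in rows if e]
    unexposed = [o for _, e, o in rows if not e]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))

rows = simulate_confounded_data()

# Crude estimate: ignores the confounder.
crude_rr = risk_ratio(rows)

# Stratified estimates: one risk ratio within each level of the confounder.
stratum_rrs = {
    level: risk_ratio([r for r in rows if r[0] == level])
    for level in (False, True)
}

print(f"crude RR: {crude_rr:.2f}")
for level, rr in stratum_rrs.items():
    print(f"RR within confounder={level}: {rr:.2f}")
```

With these made-up probabilities the crude risk ratio is well above 1 (about 1.8 in expectation), whereas the risk ratios within each confounder stratum sit close to 1. The crude association here is entirely spurious.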

Methods to Estimate Confounding for Measured Variables

When confounders are known and measured, analytical techniques can be used to adjust for their effects. Among the most widely used methods is multivariable regression, which includes confounders as covariates within the model. This approach allows simultaneous adjustment for multiple variables and provides effect estimates that account for measured confounding 6.
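The following sketch illustrates adjustment by regression on simulated data. It is only an illustration under stated assumptions: the variable names, the linear data-generating model, and the true exposure effect of 0.5 are all invented, and the small `ols` helper is a hand-written least-squares solver used to keep the example free of external libraries. In practice a statistical package would be used instead.

```python
import random

def ols(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    `columns` is a list of predictor columns; an intercept is added
    automatically. Returns [intercept, coef_1, coef_2, ...].
    """
    n = len(y)
    design = [[1.0] * n] + [list(map(float, c)) for c in columns]
    k = len(design)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(design[i][r] * design[j][r] for r in range(n)) for j in range(k)]
        + [sum(design[i][r] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data: `age` confounds the exposure-outcome relationship.
rng = random.Random(1)
n = 5000
age = [rng.gauss(0, 1) for _ in range(n)]  # standardised confounder
exposure = [0.8 * z + rng.gauss(0, 1) for z in age]
outcome = [0.5 * x + 1.0 * z + rng.gauss(0, 1) for x, z in zip(exposure, age)]

crude = ols([exposure], outcome)[1]          # confounder omitted
adjusted = ols([exposure, age], outcome)[1]  # confounder included as covariate

print(f"crude exposure coefficient:    {crude:.2f}")
print(f"adjusted exposure coefficient: {adjusted:.2f}")
```

In this setup the crude coefficient is inflated (close to 1.0 in expectation), while the adjusted coefficient recovers a value near the simulated effect of 0.5. The adjustment works only because the confounder was measured and the model is correctly specified.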

Methods to Estimate Confounding for Unmeasured Variables

When potential confounders are not measured or are unknown, sensitivity analyses are required to evaluate how much they might influence the observed association. A recent and increasingly used method is the E-value, which quantifies the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the exposure and the outcome, conditional on the measured covariates, to fully explain away the observed effect. This measure offers a simple and intuitive assessment of robustness against unmeasured confounding 7.
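As a rough sketch, the E-value for a risk ratio point estimate can be computed with the commonly used closed-form expression RR + sqrt(RR × (RR − 1)), taking the reciprocal first when the estimate is below 1. The function name `e_value` is ours, and the example risk ratios are hypothetical.

```python
import math

def e_value(rr):
    """E-value for a risk ratio point estimate.

    Uses E = RR + sqrt(RR * (RR - 1)) for RR >= 1; for protective
    estimates (RR < 1) the reciprocal is taken first.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical observed risk ratios.
for observed_rr in (1.0, 1.5, 2.0, 0.5):
    print(f"RR = {observed_rr}: E-value = {e_value(observed_rr):.2f}")
```

For an observed risk ratio of 2.0, the E-value is about 3.41: an unmeasured confounder would need to be associated with both exposure and outcome by a risk ratio of at least 3.41 each to fully explain away the estimate. A risk ratio of 1.0 gives an E-value of 1, meaning no confounding is needed to explain a null result.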

Methods to Avoid or Minimise Confounding

Several design strategies can be applied before or during data collection to prevent confounding:

  • Randomisation: In experimental studies, random allocation tends to distribute both known and unknown confounders evenly across groups, minimising confounding at the design stage.
  • Restriction: Limiting enrolment to a specific subgroup (for example, only non-smokers) eliminates variation in the selected confounders, though at the cost of reduced generalisability.
  • Matching: Pairing or grouping participants with similar baseline characteristics balances the matched confounders between exposure categories 8.
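The effect of randomisation on confounder balance can be illustrated with a small simulation. This is a sketch under invented assumptions: `age` is a hypothetical baseline confounder, and the exposure probabilities in the observational scenario are arbitrary.

```python
import random
from statistics import mean

rng = random.Random(42)
n = 10000
age = [rng.gauss(60, 10) for _ in range(n)]  # hypothetical baseline confounder

# Observational setting: older participants are more likely to be exposed.
observed = [rng.random() < (0.7 if a > 60 else 0.3) for a in age]

# Randomised setting: allocation ignores every participant characteristic.
randomised = [rng.random() < 0.5 for _ in age]

def mean_difference(values, groups):
    """Mean of `values` where `groups` is True minus the mean where it is False."""
    in_group = [v for v, g in zip(values, groups) if g]
    out_group = [v for v, g in zip(values, groups) if not g]
    return mean(in_group) - mean(out_group)

imbalance_observed = mean_difference(age, observed)
imbalance_randomised = mean_difference(age, randomised)

print(f"age difference, observational exposure: {imbalance_observed:.1f} years")
print(f"age difference, randomised allocation:  {imbalance_randomised:.1f} years")
```

In the observational scenario the exposed group is several years older on average, so age could confound any comparison. Under random allocation the difference is close to zero, and the same holds in expectation for confounders that were never measured.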

EULAR Points to Consider

Given the methodological challenges described above, the EULAR Points to Consider provide a structured framework to guide the appropriate analysis and reporting of observational studies in rheumatology research 9,10.

Giovanni Fulvio
on behalf of the EMEUNET Newsletter Sub-Committee
