Correlation is not necessarily cause

A common mistake when trying to think scientifically, but failing to live up to its standards, is to take a correlation for a cause. Just because something is reported or observed at roughly the same time as another phenomenon does not make the two causally connected; they are merely correlated. In the broadest sense, correlation is any statistical association, whether causal or not, between two random variables. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. But they do not imply that the relationship is causal.

This dictum should not be taken to mean that correlations cannot indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with identity relations (e.g. tautologies), where no causal process exists. Consequently, a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).

A few examples: A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health, or does good health lead to good mood, or both? Or does some other factor underlie both? In other words, a correlation can be taken as evidence for a possible causal relationship, but cannot indicate what the causal relationship, if any, might be…

From the fact that Santa Clauses can often be found in Western countries at Christmas time, it cannot be concluded that Santa Clauses are the cause of Christmas. Likewise, direct causality can be completely absent despite a correlation: there can be a correlation between the decline in storks in the countryside and a decline in the number of newborns, but neither event causes the other. Storks do not bring children, and children do not bring storks. Causally, the two are connected only via a third variable (a so-called spurious correlation), for example urbanization, which both destroys nesting sites and promotes small families.
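The stork example can be reproduced with a small simulation. The following is a minimal sketch in Python using purely synthetic numbers (the variable names, coefficients and noise levels are assumptions for illustration, not real data): a hidden third variable, urbanization, lowers both the stork count and the birth count, while neither influences the other. The two still come out strongly correlated, and the correlation largely disappears once urbanization is held fixed via a partial correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear influence of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical regions: urbanization in [0, 1] is the hidden third variable.
urbanization = [rng.random() for _ in range(5000)]

# Both quantities depend on urbanization plus independent noise.
# Storks never appear in the formula for births, and vice versa.
storks = [100 - 80 * u + rng.gauss(0, 5) for u in urbanization]
births = [50 - 30 * u + rng.gauss(0, 3) for u in urbanization]

r_raw = pearson(storks, births)
r_partial = partial_correlation(storks, births, urbanization)

print(f"storks vs. births:                  {r_raw:+.2f}")
print(f"same, controlling for urbanization: {r_partial:+.2f}")
```

With these assumed parameters the raw correlation is strongly positive, while the partial correlation stays close to zero. The correction only works here because the third variable is known and measured; in real data it often is not.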

In times of a pandemic, with scientific discourse trying to sift causal relations out of statistical data full of correlations, this can be confusing, and tempting, for the untrained mind. A correlation might be misused as a causal relation and then reported to an uncritical audience as “proof”. A scientifically trained mind has to stand up against such improper interpretations, because correlation as an instrument lies at the base of scientific work.

I had to deal a lot with different measurements from living brains while doing EEG research at the Center of Brain Research in Vienna, trying to find a causal relation between brain data (measurements of brain waves) and personal experience (detection or retention of perceptions and connected concepts in the mind). Correlations were our only weapon then, and the math behind them was intimidating. It is important to note that findings in the cognitive neurosciences are based on correlations and are not proof of a causal relation. If we had proof, we could say: we have understood the brain; because of x, y is happening, and that is the cause of the mind. It is not that simple, unfortunately.

We usually search large data sets (acquired in controlled experimental situations) for clear, better-than-chance correlations. Then we try to interpret soundly how strong the association is and in which direction it points, which is often expressed as a correlation coefficient between -1 and +1. In addition, there are many ways to compute correlations, and different methods to estimate their predictive value, accuracy and error margins, all with their pros and cons.
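To make the coefficient between -1 and +1 concrete, here is a minimal sketch in Python of the Pearson correlation coefficient, which is one common choice among the many ways to compute correlations. The data points are made up for illustration. The third case shows a limit of the measure: y depends completely on x, yet the coefficient is zero because the relationship is not linear.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative x values, symmetric around zero.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

r_rising = pearson(x, [2 * v + 1 for v in x])  # perfectly linear, rising
r_falling = pearson(x, [-v for v in x])        # perfectly linear, falling
r_parabola = pearson(x, [v * v for v in x])    # fully dependent, not linear

print(f"rising line:  {r_rising:+.2f}")
print(f"falling line: {r_falling:+.2f}")
print(f"parabola:     {r_parabola:+.2f}")
```

A coefficient near zero therefore rules out only a linear association, not dependence as such, which is one more reason why the number alone cannot settle questions of cause.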

(Examples of the correlation of x and y for various distributions of (x, y) pairs. A version translated into R of Mathematica 6 code by Denis Boigelot. Source:

The fallacy of inferring causality from correlation is also known as cum hoc ergo propter hoc (Latin for ‘with this, therefore because of this’). To really establish causal relations and determine their direction, subject-matter expertise is fundamentally necessary. For example, the question “why does noise in the house have a negative effect on the intelligence of children?” can only be answered by people with the appropriate specialist knowledge, such as psychologists and environmental scientists.

As with any logical fallacy, identifying that the reasoning behind an argument is flawed does not necessarily imply that the resulting conclusion is false. Statistical methods have been proposed that use correlation as the basis for hypothesis tests for causality, including the Granger causality test and convergent cross mapping.

This may concern the reader and society at large during the Covid-19 pandemic. We are all confronted with scientific conclusions that interpret empirical findings by means of correlations. That is good scientific conduct in a paper describing its findings, and where it is possible and proven with sound methods, the authors deduce a causal relation between factors. Often this is only possible with additional research and longer studies that separate out the factors; many of the papers published during the ongoing pandemic are based on data interpreted with statistical methods that rely on correlations.

This may be where the impression comes from that scientific discourse feels vague to the layperson. Scientists know when they have proof; more often than not, at a given point in time, they have only correlations to report. Good scientists know the limits of their interpretations but also have experience in reading them, so their handling of the data and their knowledge of the history of similar situations give them a vantage point for making educated guesses, a level of perspective and approximation an ordinary citizen would never reach… they simply have not spent enough professional time on the matter.

Let’s take a concrete example of the abuse of claimed correlations as “proof” of a (vaguely defined) causal danger. A list is circulating that is said to come from the WHO and its VigiBase database of ADRs (Adverse Drug Reactions). It shows the Covid-19 vaccines with the highest number of adverse reaction reports of all vaccine campaigns so far (2.2 to 2.5 million reports filed). I first tried to verify whether the list is legitimate; that alone turned out to be quite demanding and not easy to settle conclusively.

That may sound unsettling, I get that. But does it hold up under scrutiny and additional research?

First, I couldn’t verify whether the list really came from the WHO or from Photoshop. VigiBase is only accessible to national institutions, with a login. I have not found this list on any official, independent website, at least not so far. So the list itself must first be questioned, as its provenance has not been proven. From collateral research, however, I already think the numbers may not be far off. One point is that the VigiBase/VigiAccess/VigiGroup/VigiFlow data seems to be managed by a WHO partner in Uppsala, the Uppsala Monitoring Centre (UMC), and the graphics and logo would actually look different if the list were official. Here is an official overview of the Covid-19 evaluations in VigiBase, which at least confirms the order of magnitude:

Digging a little deeper, however, I came across the probable origin of the conclusion and the “news”, which comes from South Africa. In several places the list was forwarded to fact-checking teams, who did the work. Their findings also lay out my conclusion in more detail:



Even if the numbers turn out to be true, VigiBase has, since 2015, collected and aggregated unverified individual reports; it clearly states that these numbers should not be treated as proof of a causal relation. See their own words in their “disclaimer”:

VigiBase database spokeswoman Alexandra Hoegberg told AFP Fact Check that the figures do not mean that vaccines are harmful.

“These figures DO NOT suggest that the Covid-19 vaccines are unsafe, nor does the comparison of different pharmaceuticals provide any information about their relative safety compared to one another,” Hoegberg said in an email.

“A side effect report does not equal a real side effect.”

Hoegberg added the publicly available database shows “some very basic information about the adverse drug reactions that have been reported, but it lacks a lot of important information and context”.



Fact checks (here is another one, from FACTA Italy) state that the WHO’s “VigiAccess data has been misused as proof that vaccines are dangerous.” “A side effect report does not equal a real side effect.” “The claim that Covid-19 vaccines are unsafe is false.” Lists of ADR (Adverse Drug Reaction) numbers have been posted on Facebook and other social media since September. Their origin in VigiBase and their contents could not be verified, but even if the numbers are accurate, a correlation based on unverified reports does not, and must not, qualify as a “cause”.

In these discussions you have to take into account that you are stepping into a zone of “combat communication”, in which (depending on the interlocutor) every link and every means may be used to make one’s own opinion “true” or “truer”. Unfortunately, you have to be pretty well prepared to see through the teams of spin doctors with agendas around the world, who may use fake news and misinterpretations or “overinterpretations” as evidence for their storytelling… As a reasonable, concerned citizen, one has to keep the correlation fallacy in mind, but also remember that strong evidence for both a correlation AND a cause can come from numbers backed by additional research.

For a general overview of sound arguments for vaccinations, aggregating the current consensus, explaining facts & figures and lowering concerns, see e.g. the National Health Service (UK), your local trusted health services, or this article in Der Spiegel (in German) from 17/11/2021:

Individual advice from individual doctors or self-proclaimed “experts” may diverge starkly from the official scientific consensus, making any neglect, change in treatment or non-treatment a hazardous, untested choice that could endanger your life and that of your loved ones… You have to live with the consequences; the doctors, with their liability.

