132 A Note on Causality

There are several misconceptions about the use of Hypothesis Testing and its relationship to causality. Many textbooks correctly warn that association alone is insufficient for causal claims. Statements such as “correlation does not imply causation” are useful reminders, but they are often interpreted too rigidly. The key point is this: causal conclusions depend primarily on study design and identification assumptions, not on the specific test statistic alone.

The concept of causality has been studied by philosophers, mathematicians, lawyers, and members of many other professions. Its meaning differs depending on the situation and the field of study.

So let us try to formulate a practical, intuitive definition of causality. To do this we will use a time series notation and consider only two variables: the cause \(X_t\) and the effect \(Y_t\).

In order to conclude that \(X_{t1}\) causes \(Y_{t2}\) there are three conditions that must be met:

1. \(X_{t1}\) and \(Y_{t2}\) need to have some sort of covariation. This means that when \(X_{t1}\) changes, \(Y_{t2}\) needs to change too. The relationship may be linear (e.g. \(\rho(X_{t1}, Y_{t2}) \neq 0\)), but it need not be: non-linear relationships are possible, though more difficult to investigate.
2. The cause needs to precede the effect. In other words, \(t1 \leq t2\). This seems obvious, but there are situations where it is less clear. Suppose that a company plans to put its products on sale and communicates this through advertising. The “effect” \(Y_{t2}\) might change as soon as the announcement is made even though the “cause” \(X_{t1}\) lies in the future (i.e. \(t1 > t2\)). This type of anticipation effect (where customers postpone their consumption because of an anticipated drop in prices) should lead us to re-formulate the “real cause” as the sales announcement, not the actual price drop.
3. All other things being equal, the change in \(Y_{t2}\) should disappear as soon as the cause \(X_{t1}\) stops changing. This means that the effect of \(X\) on \(Y\) must not be attributable to another, as yet unobserved, variable \(Z\).
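The first two conditions can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions: the data-generating process (the effect reacts to the cause exactly one period later, with coefficient 0.8) and the helper names `pearson` and `lagged_correlation` are hypothetical and chosen for this example. It computes the linear correlation between \(X_{t1}\) and \(Y_{t2}\) for different lags \(t2 - t1\).

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag]; lag may be negative."""
    if lag >= 0:
        return pearson(x[:len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[:len(y) + lag])


rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
# Assumed data-generating process: y reacts to x one period later.
y = [rng.gauss(0, 1)] + [0.8 * x[t - 1] + rng.gauss(0, 0.5) for t in range(1, n)]

for lag in (-1, 0, 1):
    print(f"lag {lag:+d}: rho = {lagged_correlation(x, y, lag):.2f}")
```

Under these assumptions the correlation should be clearly positive only at lag +1 (the cause precedes the effect) and close to zero at lags 0 and -1. Such a pattern is compatible with the first two conditions, but it says nothing about the third.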

The first condition can be tested with classical Hypothesis Tests (as shown in the previous sections). The second condition is either controlled (e.g. in an experiment) or assumed (when we use secondary data sources) -- it is often easy to verify whether this condition is met or not. The third condition, however, is the one which is most problematic and often leads to confusion.

In order to exclude the possibility that a third (unobserved) variable is confounding the measured covariation between the cause and the effect, we need to randomize the cause within the framework of an experiment. For instance, if we divide patients at random into two groups (each group is a simple random sample of the patients), then we can administer the experimental drug to one group (the treatment group) and provide a placebo to the other group (the control group)¹. Since the individuals are randomly assigned to either group, the only variable that differs systematically between the groups is the treatment. All other, possibly confounding, variables will (given sufficiently large sample sizes) be approximately evenly distributed across the groups, implying that they cannot systematically distort the covariation that is measured.
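The role of randomization can be made concrete with a simulation. The following is a minimal sketch under an assumed model that does not come from the text: an unobserved confounder \(Z\) drives the outcome \(Y\), the treatment \(X\) has no effect at all, and the function name `simulate` and all parameter values are hypothetical. The only difference between the two runs is how the treatment is assigned.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(randomized, n=20000, seed=1):
    """Return the treated-minus-control difference in mean outcome.

    Assumed model: Z is an unobserved confounder, the treatment X has
    NO effect on Y, and Y depends on Z only.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        if randomized:
            # Experiment: a coin flip decides who is treated.
            x = rng.random() < 0.5
        else:
            # Secondary data: units with a high Z tend to select into treatment.
            x = z + rng.gauss(0, 1) > 0
        y = 2.0 * z + rng.gauss(0, 1)  # X does not enter the outcome
        (treated if x else control).append(y)
    return mean(treated) - mean(control)


observational_diff = simulate(randomized=False)
experimental_diff = simulate(randomized=True)
print(f"difference without randomization: {observational_diff:.2f}")
print(f"difference with randomization:    {experimental_diff:.2f}")
```

In this sketch the non-randomized comparison should show a large difference between the groups even though the treatment does nothing, because \(Z\) is unevenly distributed across the groups. With random assignment \(Z\) is balanced and the difference should be close to zero. The same comparison of means is applied in both cases; only the origin of the data differs.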

Without going into details about experimental design (this is far beyond the scope of this book), we simply need to understand that the interpretation of Hypothesis Tests (i.e. whether the test allows us to conclude causality or not) does not depend on the statistical method or test that is used. It depends on the origin of the data under analysis. When analyzing data from a randomized experiment, one can (almost) always draw conclusions in terms of causality. When analyzing data from secondary sources (e.g. statistical databases), one can usually only infer “associations” or “covariations” between variables.

¹ In order to avoid psychological effects, the patients do not know whether they belong to the treatment group or the control group.