In the previous post, I talked about the Monty Hall Problem. Let’s go through a similar problem called Berkson’s Paradox, also known as admission rate bias.

In a medical study, a researcher looked at two groups of diseases: respiratory and bone. In the general population, about 7.5% of people have a bone disease, and this does not depend on whether they have a respiratory disease. In other words, having a respiratory disease does not affect the likelihood of having a bone disease. However, among people in hospital with a respiratory disease, the frequency of bone disease jumps to 25%.

One possible reason is the same as in the Monty Hall Problem: collider bias. Being in hospital is a common effect of both diseases, so selecting on it ties them together. Given that a person is in hospital and has disease 1, he is more likely to have disease 2, even though diseases 1 and 2 are not related. In the same way, given that the host opened door 3 and you chose door 1, it is more likely that the car is behind door 2, even though which door you choose and which door hides the car are not related. There is one more possibility, though: if diseases 1 and 2 have a common cause, we will have confounder bias too.
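To see how selecting on hospitalisation alone can inflate the figure, here is a minimal simulation sketch. Only the 7.5% bone disease rate comes from the study above. The respiratory disease rate (`p_resp`) and the two admission probabilities (`p_admit_resp`, `p_admit_bone`) are hypothetical values I made up for illustration, as is the assumption that each disease sends a person to hospital independently. It is not meant to reproduce the 25% figure.

```python
import random

def simulate(n_people=200_000, seed=0,
             p_resp=0.10,         # hypothetical: share with respiratory disease
             p_bone=0.075,        # from the study: share with bone disease
             p_admit_resp=0.10,   # hypothetical: chance respiratory disease leads to admission
             p_admit_bone=0.40):  # hypothetical: chance bone disease leads to admission
    """Return (bone rate among respiratory patients in the general population,
    bone rate among respiratory patients who are in hospital)."""
    rng = random.Random(seed)
    resp_total = resp_with_bone = 0
    hosp_resp_total = hosp_resp_with_bone = 0
    for _ in range(n_people):
        # The two diseases are drawn independently of each other.
        resp = rng.random() < p_resp
        bone = rng.random() < p_bone
        # A person is admitted if either disease sends them to hospital.
        admitted = (resp and rng.random() < p_admit_resp) or \
                   (bone and rng.random() < p_admit_bone)
        if resp:
            resp_total += 1
            resp_with_bone += bone
            if admitted:
                hosp_resp_total += 1
                hosp_resp_with_bone += bone
    return resp_with_bone / resp_total, hosp_resp_with_bone / hosp_resp_total

general_rate, hospital_rate = simulate()
print(f"Bone disease among respiratory patients, general population: {general_rate:.1%}")
print(f"Bone disease among respiratory patients, in hospital:        {hospital_rate:.1%}")
```

Under these made-up numbers, the rate in the general population stays close to 7.5%, while the rate inside the hospital comes out several times higher, even though the two diseases were generated independently.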

This problem exists in everyday life. For example, we often hear people say that the attractive ones tend to be jerks. The sad truth is that you will most likely not date people who are both unattractive and mean, so the data/experience you collect will not contain them. Among the people you do date, the attractive ones are therefore more likely to be mean. In the same way, if the researcher chooses a hospital to do his research, he may not realise the bias and may think that respiratory disease and bone disease are highly correlated.

Some people will take this argument to the extreme and claim that there is “no correlation without causation”. To rebut this claim, let’s consider an experiment.

Flip two coins simultaneously one hundred times and write down the results only when at least one of them comes up heads.

You will find that every time Coin 1 lands tails, Coin 2 lands heads, because you never record the case where both of them land tails. The recorded outcomes of Coin 1 and Coin 2 are correlated, but they do not have a common cause: every flip is completely random. It is just the data generation process, i.e. the fact that we choose not to record the tails-tails case, that makes the outcomes correlated.
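The experiment is easy to try on a computer. The sketch below assumes fair coins and codes heads as 1 and tails as 0 to compute a Pearson correlation. The function names `flip_and_record` and `correlation` are my own, and the seed is only there to make the run repeatable.

```python
import random

def flip_and_record(n_flips=100, seed=0, record_all=False):
    """Flip two fair coins n_flips times.
    By default, keep a row only when at least one coin shows heads."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_flips):
        coin1 = rng.choice("HT")
        coin2 = rng.choice("HT")
        if record_all or coin1 == "H" or coin2 == "H":
            rows.append((coin1, coin2))
    return rows

def correlation(rows):
    """Pearson correlation of the two coins, with heads = 1 and tails = 0."""
    xs = [1 if c1 == "H" else 0 for c1, _ in rows]
    ys = [1 if c2 == "H" else 0 for _, c2 in rows]
    n = len(rows)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / (var_x * var_y) ** 0.5

recorded = flip_and_record(n_flips=100)
everything = flip_and_record(n_flips=100, record_all=True)

# Whenever Coin 1 is tails in the recorded rows, Coin 2 must be heads.
print(all(c2 == "H" for c1, c2 in recorded if c1 == "T"))
print("correlation, recorded rows only:", round(correlation(recorded), 2))
print("correlation, all flips:         ", round(correlation(everything), 2))
```

With only a hundred flips the numbers are noisy, but over many flips the correlation in the recorded rows settles near -0.5 (the three remaining outcomes are equally likely), while the correlation over all flips stays near zero.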