What is confounding bias?
Imagine you are a farmer trying to determine whether applying fertilizer A to a field will increase crop yield. In other words, you are trying to establish a causal relationship between fertilizer A and yield. Tricky Nature, however, mixes the effect of the fertilizer with a variety of other causes, such as drainage, weather, soil fertility, microflora, and many others. Some of those factors may also affect the choice of fertilizer itself: the farmer may choose fertilizer A based on drainage, for example. Factors that affect both the choice of fertilizer and the yield are called confounders.
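To see how a confounder distorts a naive comparison, here is a minimal simulation sketch. All numbers are made up for illustration: we assume good drainage adds 3 units of yield, fertilizer A adds 1 unit, and the farmer picks fertilizer A on 80% of well-drained plots but only 20% of poorly drained ones. The helpers `simulate_field` and `naive_effect` are hypothetical names, not from any library.

```python
import random


def simulate_field(n=100_000, seed=0):
    """Generate hypothetical plots as (good_drainage, uses_a, yield) tuples.

    Assumed model: drainage affects BOTH the farmer's choice of
    fertilizer A and the yield, so it is a confounder.
    """
    rng = random.Random(seed)
    plots = []
    for _ in range(n):
        good_drainage = rng.random() < 0.5
        # Assumption: the farmer favors fertilizer A on well-drained plots.
        uses_a = rng.random() < (0.8 if good_drainage else 0.2)
        # Assumption: true effect of A is +1.0; good drainage adds +3.0.
        crop_yield = 10.0 + 1.0 * uses_a + 3.0 * good_drainage + rng.gauss(0, 1)
        plots.append((good_drainage, uses_a, crop_yield))
    return plots


def naive_effect(plots):
    """Mean yield with fertilizer A minus mean yield without it."""
    treated = [y for _, a, y in plots if a]
    control = [y for _, a, y in plots if not a]
    return sum(treated) / len(treated) - sum(control) / len(control)


plots = simulate_field()
print(f"naive estimate: {naive_effect(plots):.2f} (assumed true effect: 1.00)")
```

Under these assumptions the naive difference comes out near 2.8 rather than 1.0: fertilizer A gets credit for extra yield that really comes from the well-drained plots it tends to be applied to (in expectation, 1.0 + 3.0 * (0.8 - 0.2) = 2.8).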
Some researchers like to control for many confounders: “we controlled for a, b, c, d, e, f, …”. It seems that the more factors they control for, the better. This habit comes from worrying that they have missed a confounder. But no matter how hard you try, how can you be sure there are no other confounders? And sometimes you may even end up controlling for the very effect you want to measure.
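Here is a minimal sketch of that last point, again with made-up numbers. We assume, hypothetically, that fertilizer A raises yield only by raising soil nitrogen, and we assign the fertilizer by a fair coin so there is no confounding at all. Comparing yields directly recovers the effect, while “controlling for” nitrogen (comparing plots within the same nitrogen level) makes the effect vanish, because nitrogen is the very path through which the fertilizer works. `simulate_mediator`, `diff_in_means`, and `stratified_effect` are hypothetical helper names.

```python
import random


def simulate_mediator(n=100_000, seed=1):
    """Hypothetical plots as (high_nitrogen, uses_a, yield) tuples.

    Assumed model: fertilizer A -> soil nitrogen -> yield, with no
    direct effect of A on yield and no confounders.
    """
    rng = random.Random(seed)
    plots = []
    for _ in range(n):
        uses_a = rng.random() < 0.5  # coin flip, so no confounding
        high_nitrogen = rng.random() < (0.9 if uses_a else 0.2)
        crop_yield = 10.0 + 2.0 * high_nitrogen + rng.gauss(0, 1)
        plots.append((high_nitrogen, uses_a, crop_yield))
    return plots


def diff_in_means(plots):
    """Mean yield with fertilizer A minus mean yield without it."""
    treated = [y for _, a, y in plots if a]
    control = [y for _, a, y in plots if not a]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratified_effect(plots):
    """'Control for' the first field: compare A vs. not-A within each of
    its levels, then weight each gap by the share of plots at that level."""
    estimate = 0.0
    for level in (False, True):
        stratum = [p for p in plots if p[0] == level]
        estimate += diff_in_means(stratum) * len(stratum) / len(plots)
    return estimate


plots = simulate_mediator()
print(f"without controlling for nitrogen: {diff_in_means(plots):.2f}")
print(f"controlling for nitrogen:         {stratified_effect(plots):.2f}")
```

Under these assumptions the first number should be close to 2.0 * (0.9 - 0.2) = 1.4, the full effect of the fertilizer, while the second should be close to 0, which would wrongly suggest that fertilizer A does nothing.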
What do we want to know?
We want to “control” for the other factors that affect the choice of fertilizer A. If drainage, weather, soil fertility, microflora, and the other factors are held the same, will applying fertilizer A increase yield?
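“Holding the other factors the same” can be made concrete by comparing plots only within the same level of a confounder and then averaging those comparisons, weighted by how common each level is (often called adjustment or standardization). The sketch below simulates a hypothetical farm in which drainage is assumed to be the only confounder and fertilizer A is assumed to add 1 unit of yield; all numbers and helper names (`simulate_field`, `adjusted_effect`) are illustrative. The trick only works because we assume drainage is both measured and the sole confounder.

```python
import random


def simulate_field(n=100_000, seed=0):
    """Hypothetical plots as (good_drainage, uses_a, yield) tuples.

    Assumed model: drainage is the ONLY confounder; it affects both the
    choice of fertilizer A and the yield. True effect of A is +1.0.
    """
    rng = random.Random(seed)
    plots = []
    for _ in range(n):
        good_drainage = rng.random() < 0.5
        uses_a = rng.random() < (0.8 if good_drainage else 0.2)
        crop_yield = 10.0 + 1.0 * uses_a + 3.0 * good_drainage + rng.gauss(0, 1)
        plots.append((good_drainage, uses_a, crop_yield))
    return plots


def diff_in_means(plots):
    """Mean yield with fertilizer A minus mean yield without it."""
    treated = [y for _, a, y in plots if a]
    control = [y for _, a, y in plots if not a]
    return sum(treated) / len(treated) - sum(control) / len(control)


def adjusted_effect(plots):
    """Compare A vs. not-A among plots with the SAME drainage, then
    weight each stratum's gap by the share of plots in that stratum."""
    estimate = 0.0
    for drained in (False, True):
        stratum = [p for p in plots if p[0] == drained]
        estimate += diff_in_means(stratum) * len(stratum) / len(plots)
    return estimate


plots = simulate_field()
print(f"naive:                 {diff_in_means(plots):.2f}")
print(f"adjusted for drainage: {adjusted_effect(plots):.2f}")
```

Under these assumptions the naive gap is inflated (about 2.8 in expectation), while the drainage-adjusted estimate lands near the assumed true effect of 1.0.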
In the diagram above, applying fertilizer A does not depend on any external factor that might also affect yield. But something still has to trigger the decision to apply fertilizer A, and that trigger must not depend on any external factor either.
Randomized Controlled Trial (RCT)
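The natural candidate for that trigger is randomization: you apply fertilizer A based on a randomly drawn card. Because the card is independent of drainage, weather, and every other factor, randomization breaks the link between all confounders and the choice of fertilizer.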
All arrows pointing to fertilizer A have been erased, and there is no arrow between the random card and yield, so the card can affect yield only through fertilizer A. We have deconfounded the relationship between fertilizer A and yield.
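The random card can be simulated too. In this minimal sketch, with made-up numbers (good drainage assumed to add 3 units of yield, fertilizer A assumed to add 1 unit), the card is a fair coin flip that ignores drainage entirely, so a plain difference in mean yields is enough to recover the effect. `simulate_rct` and `diff_in_means` are hypothetical helper names.

```python
import random


def simulate_rct(n=100_000, seed=2):
    """Hypothetical plots as (good_drainage, uses_a, yield) tuples.

    Fertilizer A is applied by a 'random card' (a fair coin flip), so
    drainage no longer influences which plots get the fertilizer.
    """
    rng = random.Random(seed)
    plots = []
    for _ in range(n):
        good_drainage = rng.random() < 0.5
        uses_a = rng.random() < 0.5  # the random card: independent of drainage
        crop_yield = 10.0 + 1.0 * uses_a + 3.0 * good_drainage + rng.gauss(0, 1)
        plots.append((good_drainage, uses_a, crop_yield))
    return plots


def diff_in_means(plots):
    """Mean yield with fertilizer A minus mean yield without it."""
    treated = [y for _, a, y in plots if a]
    control = [y for _, a, y in plots if not a]
    return sum(treated) / len(treated) - sum(control) / len(control)


plots = simulate_rct()
print(f"RCT estimate: {diff_in_means(plots):.2f} (assumed true effect: 1.00)")
```

Drainage still adds noise to each plot's yield, but because it is spread evenly across both groups it no longer biases the comparison.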
One more advantage of an RCT is that it also removes the confounders we don’t know about or cannot measure. But sometimes intervention is impossible: for example, we cannot randomly assign patients to be obese or not.
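A minimal sketch of this advantage, under made-up assumptions: both drainage and microflora affect the farmer’s choice and the yield, but only drainage is recorded. Adjusting for drainage alone leaves bias from the unrecorded microflora, while the random card needs neither to be measured. The model, its numbers, and the helper names (`simulate`, `adjusted_for_drainage`) are hypothetical.

```python
import random


def simulate(randomized, n=100_000, seed=3):
    """Hypothetical plots as (good_drainage, uses_a, yield) tuples.

    Microflora affects both the choice of A and the yield but is NOT
    recorded. True effect of fertilizer A is assumed to be +1.0.
    """
    rng = random.Random(seed)
    plots = []
    for _ in range(n):
        good_drainage = rng.random() < 0.5
        rich_microflora = rng.random() < 0.5  # unmeasured confounder
        if randomized:
            p_a = 0.5  # the random card
        else:
            # Assumption: the farmer favors A on plots that look healthy.
            p_a = 0.1 + 0.4 * good_drainage + 0.4 * rich_microflora
        uses_a = rng.random() < p_a
        crop_yield = (10.0 + 1.0 * uses_a + 3.0 * good_drainage
                      + 2.0 * rich_microflora + rng.gauss(0, 1))
        # Microflora is deliberately left out of the recorded data.
        plots.append((good_drainage, uses_a, crop_yield))
    return plots


def diff_in_means(plots):
    """Mean yield with fertilizer A minus mean yield without it."""
    treated = [y for _, a, y in plots if a]
    control = [y for _, a, y in plots if not a]
    return sum(treated) / len(treated) - sum(control) / len(control)


def adjusted_for_drainage(plots):
    """Within-drainage comparisons, weighted by stratum size."""
    estimate = 0.0
    for drained in (False, True):
        stratum = [p for p in plots if p[0] == drained]
        estimate += diff_in_means(stratum) * len(stratum) / len(plots)
    return estimate


observational = simulate(randomized=False)
experiment = simulate(randomized=True)
print(f"observational, adjusted for drainage: {adjusted_for_drainage(observational):.2f}")
print(f"RCT, simple difference:               {diff_in_means(experiment):.2f}")
```

Under these assumptions the drainage-adjusted observational estimate stays biased (close to 1.95 in expectation), while the randomized comparison lands near the assumed true effect of 1.0 without anyone ever measuring microflora.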