This is a rather short article. Recently, I have been looking at the relationship between the region of a user’s residential address and their GMV (how much they purchase). I found something interesting. Across all users, the higher the average income of the region, the higher the average GMV, which is very intuitive. However, when I condition on the users who are eligible to use our loan products, the relationship is reversed. Why is that?
After reading The Book of Why, my intuition told me it should be due to some kind of bias. Once I drew a diagram, everything became clear.
We select users based on their application score, which is affected by both region and GMV. If we only select good users, they can either have a good GMV or live in a good region. Many combinations of GMV and region achieve the same score, even though in this case region and GMV are themselves related. This is called collider bias, or the “explain-away” effect.
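To check that selection alone can flip the sign, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not our real data or scoring model: region income and GMV are standardized, they have a population correlation of 0.3, the score is simply their sum, and the top 20% by score are treated as eligible.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=50_000, rho=0.3, top_share=0.2, seed=0):
    """Return (correlation over all users, correlation over eligible users)."""
    rng = random.Random(seed)
    region, gmv = [], []
    for _ in range(n):
        r = rng.gauss(0, 1)
        # GMV is positively related to region income (assumed rho).
        g = rho * r + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        region.append(r)
        gmv.append(g)

    # Hypothetical score: the collider, driven by both region and GMV.
    score = [r + g for r, g in zip(region, gmv)]
    cutoff = sorted(score)[int(n * (1 - top_share))]
    eligible = [i for i in range(n) if score[i] >= cutoff]

    corr_all = pearson(region, gmv)
    corr_eligible = pearson([region[i] for i in eligible],
                            [gmv[i] for i in eligible])
    return corr_all, corr_eligible


corr_all, corr_eligible = simulate()
print(f"all users:      {corr_all:+.2f}")
print(f"eligible users: {corr_eligible:+.2f}")
```

Under these assumptions the correlation is positive over all users and negative among the eligible ones, which is the same pattern I saw. How strong the reversal is depends on the assumed correlation and on how strict the cutoff is.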
A typical example is the following:
Beauty and talent are not correlated in the general population. However, if we look only at celebrities, talent and beauty are negatively correlated: a talented person does not need to be more beautiful than average to become a celebrity.
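The same example can be sketched in a few lines. The setup is purely hypothetical: beauty and talent are independent standard normal values, and someone becomes a celebrity when their sum exceeds an arbitrary threshold of 2. Instead of a correlation, this version compares the average beauty of the more talented half with that of the less talented half.

```python
import random
from statistics import mean

rng = random.Random(1)

# (beauty, talent) pairs, drawn independently: no relation in the population.
people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]

# Hypothetical selection rule: fame requires beauty + talent above a threshold.
celebrities = [(b, t) for b, t in people if b + t > 2.0]


def beauty_gap(group):
    """Mean beauty of the more talented half minus that of the less talented half."""
    talents = sorted(t for _, t in group)
    median_talent = talents[len(talents) // 2]
    high = [b for b, t in group if t >= median_talent]
    low = [b for b, t in group if t < median_talent]
    return mean(high) - mean(low)


gap_population = beauty_gap(people)
gap_celebrities = beauty_gap(celebrities)
print(f"population gap:  {gap_population:+.2f}")
print(f"celebrity gap:   {gap_celebrities:+.2f}")
```

In the whole population the gap is close to zero, while among the simulated celebrities the more talented half is clearly less beautiful on average. Talent "explains away" the need for beauty once we know the person passed the selection.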