Date: 30 May 2021

I am currently reading *The Book of Why*. I just finished Chapter 1, The Ladder of Causation, and would like to give you a quick summary of what I have learnt.

With the recent advances and successes of machine learning and artificial intelligence, it seems that many problems can be solved by learning from a large amount of data, from credit risk assessment and the game of Go to protein folding. The author claims that this progress is still far from human intelligence, since a human can answer questions much more difficult than those about association/correlation.

In this chapter, the author introduces the so-called Ladder of Causation:

| Category | Fancy Name | Activity | Questions |
| --- | --- | --- | --- |
| Seeing | Association | Seeing, Observing | What if I see…? How are the variables related? |
| Doing | Intervention | Doing, Intervening | What if I do…? How? What would Y be if I do X? |
| Imagining | Counterfactual | Imagining, Retrospection, Understanding | What if I had done…? Why? Was it X that caused Y? |

The concepts may be abstract, so I think they are best illustrated by an example, which the author calls the mini-Turing test (I will not go into why it is called that or how it relates to the Turing test).

Suppose that a prisoner is about to be executed by a firing squad (see the picture if you don’t know what a firing squad is). A certain chain of events must occur for this to happen.

1. The court orders the execution

2. The order goes to a captain, who signals the soldiers on the firing squad (Soldier A and Soldier B) to fire. Assume that the soldiers obey orders.

3. If either one of them shoots, the prisoner dies.

The chain of events can be represented by a diagram, and we can start answering questions at different levels of the ladder.

| Ladder | Question | Diagram | Answer |
| --- | --- | --- | --- |
| Association | If the prisoner is dead, does that mean the court order was given? Or: if A shot, did B shoot? | 1 | Yes |
| Intervention | What if Soldier A decides on his own initiative to fire? Will the prisoner be dead or alive? Did B also shoot? The trick here is that the question violates the assumption that soldiers obey orders. A computer that reasons only from associations cannot answer it, since the question breaks the rules that the associations came from. That is the inflexibility of association. Here, A shooting no longer implies that B shot. | 2 | The prisoner is dead; B did not necessarily shoot |
| Counterfactual | Now imagine that the prisoner is dead, so A and B must both have shot. What if A had decided not to shoot? Would the prisoner be alive? (Counterfactual means counter to the fact: imagining a hypothetical situation.) Association cannot answer the question, since this situation never happened and there is no data on it. | 3 | No, the prisoner would still be dead, because B would still have shot |

So, to pass the mini-Turing test, our computer must be able to break rules (the assumption that soldiers obey orders) and imagine fictitious worlds (what if A had decided not to shoot, which did not actually happen).
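To make the three rungs concrete, here is a minimal Python sketch of the firing squad as a tiny deterministic model. It is my own illustration, not code from the book: the function `firing_squad`, the `do` argument, and the variable names are all hypothetical, and for the intervention I assume that no court order was given.

```python
# Illustrative sketch only: names and structure are my own assumptions,
# not taken from the book.

def firing_squad(court_order, do=None):
    """Evaluate the chain of events; `do` forces chosen variables to a value."""
    do = do or {}
    v = {}
    v["court"] = do.get("court", court_order)
    v["captain"] = do.get("captain", v["court"])  # captain signals iff ordered
    v["a"] = do.get("a", v["captain"])            # soldiers obey the captain
    v["b"] = do.get("b", v["captain"])
    v["dead"] = do.get("dead", v["a"] or v["b"])  # either shot is fatal
    return v

# Rung 1, association: look only at worlds where the rules hold.
worlds = [firing_squad(c) for c in (False, True)]
seen_dead = [w for w in worlds if w["dead"]]
order_given = all(w["court"] for w in seen_dead)  # dead implies court order
b_also_shot = all(w["b"] for w in worlds if w["a"])  # A shot implies B shot

# Rung 2, intervention: A fires on his own, the arrow captain -> A is cut.
rogue = firing_squad(False, do={"a": True})

# Rung 3, counterfactual: first infer the court order from the observed
# death, then replay the same world with A forced not to shoot.
inferred_order = seen_dead[0]["court"]
what_if = firing_squad(inferred_order, do={"a": False})

print("Association:", order_given, b_also_shot)
print("Intervention: dead =", rogue["dead"], ", B shot =", rogue["b"])
print("Counterfactual: dead =", what_if["dead"])
```

Under these rules the intervention leaves the prisoner dead while B does not shoot, and in the counterfactual world the prisoner is still dead because B still obeys the captain. The `do` argument is what "breaking the rules" looks like in code: it overrides one equation and leaves the others untouched.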

Apart from the arrows (cause-effect relationships), probabilities are important as well. Vaccination is a hot topic today, so let’s use it as an example to illustrate the importance of probabilities. Imagine we have the following numbers:

- Out of 1 million children, 990,000 get vaccinated (99%)
- 9,900 have an adverse reaction (1% of the 99%)
- 99 die from it (1% of the 1% of the 99%)
- Of the 10,000 who are not vaccinated, 200 get smallpox (2%)
- 40 of them die from the disease (20% of the 200)

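Someone might say that the vaccine kills more people than the disease (99 die from the vaccine and only 40 die from the disease). Without probabilities, it is hard to judge whether the vaccine is beneficial or not. But with probabilities, we can imagine what would happen if no one got vaccinated: 2% of the 1 million children (20,000) would get smallpox and 20% of those would die, so 4,000 children would die from smallpox, compared with 139 deaths in total with the vaccine.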
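The arithmetic can be checked with a few lines of Python. This is only a sketch that replays the numbers listed above; the variable names are my own, and the no-vaccine scenario assumes that the 2% infection rate and 20% fatality rate would apply to all 1 million children.

```python
# Replaying the numbers from the list above with integer arithmetic.
population = 1_000_000

vaccinated = population * 99 // 100        # 990,000 children
reactions = vaccinated // 100              # 1% have an adverse reaction
vaccine_deaths = reactions // 100          # 1% of those die

unvaccinated = population - vaccinated     # 10,000 children
cases = unvaccinated * 2 // 100            # 2% get smallpox
disease_deaths = cases * 20 // 100         # 20% of cases die

deaths_with_vaccine = vaccine_deaths + disease_deaths

# Imagined world: nobody is vaccinated (assumes the same 2% and 20% rates).
cases_without_vaccine = population * 2 // 100
deaths_without_vaccine = cases_without_vaccine * 20 // 100

print("Deaths with the vaccination programme:", deaths_with_vaccine)
print("Deaths if no one were vaccinated:", deaths_without_vaccine)
```

Comparing 99 with 40 looks only at raw counts in the world we observed. Comparing the two totals contrasts the observed world with an imagined one.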

But is causation reducible to probabilities? For example, can we say that X causes Y if X raises the probability of Y? This idea is often written, incorrectly, as P(Y | X) > P(Y). The expression is wrong as a definition of causation because it only describes an observation: if we see X, the probability of Y is higher. It is still correlation/association. Sometimes people try to define causation by conditioning on background factors: P(Y | X, K = k) > P(Y | K = k), where K is a set of background factors. But which variables need to be included in the background set K and conditioned on? Maybe we should not try to define causation in terms of probabilities at all, but rather treat it as a primitive concept, much like points and lines in geometry.
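A small made-up example shows why P(Y | X) > P(Y) is not enough. In the sketch below, a background factor Z influences both X and Y, while X has no effect on Y at all. The probabilities are hypothetical numbers I chose for illustration, and the helper names `bern` and `prob` are my own.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: Z is a common cause of X and Y; Y ignores X entirely.
p_z1 = F(1, 2)                              # P(Z=1)
p_x1_given_z = {1: F(9, 10), 0: F(1, 10)}   # P(X=1 | Z=z)
p_y1_given_z = {1: F(8, 10), 0: F(2, 10)}   # P(Y=1 | Z=z)

def bern(p_one, value):
    """Probability that a 0/1 variable takes `value`, given P(value=1)."""
    return p_one if value == 1 else 1 - p_one

# Exact joint distribution over (z, x, y).
joint = {
    (z, x, y): bern(p_z1, z) * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for z, x, y in product((0, 1), repeat=3)
}

def prob(event, given=lambda z, x, y: True):
    """Conditional probability P(event | given) computed from the joint."""
    num = sum(p for k, p in joint.items() if event(*k) and given(*k))
    den = sum(p for k, p in joint.items() if given(*k))
    return num / den

y1 = lambda z, x, y: y == 1
p_y = prob(y1)
p_y_given_x = prob(y1, lambda z, x, y: x == 1)
p_y_given_z1 = prob(y1, lambda z, x, y: z == 1)
p_y_given_x_z1 = prob(y1, lambda z, x, y: x == 1 and z == 1)

print("P(Y) =", p_y)
print("P(Y | X) =", p_y_given_x)
print("P(Y | Z=1) =", p_y_given_z1)
print("P(Y | X, Z=1) =", p_y_given_x_z1)
```

Seeing X raises the probability of Y even though X does nothing to Y, and the effect disappears once we condition on Z. Here we know to condition on Z only because we built the model ourselves, which is exactly the difficulty with choosing the background set K.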

In summary, the author said, “while probabilities encode our beliefs about a static world, causality tells us whether and how probabilities change when the world changes, be it by intervention or by act of imagination.”