After looking at how statistics pay little attention on causation compared with correlation, the author talks about Bayesian Network, in which he is one of the main contributors, and its relation with causal diagram.

# Bayes Theorem and Inverse Probability

Bayes theorem provides a mathematical description of two events, the hypothesis (D) occurring before the evidence (T), and the evidence (T) occurring before the hypothesis (D). It can be written as: P(D|T)P(T) = P(T|D)P(D). Provided that we know P(T) and P(D), we can estimate the conditional probability in one direction, which usually is easier, by another direction.

For example, if a woman receives a positive result from a breast cancer test, how likely that she has the cancer. Now, D, means having the disease, and T, means test result is positive. What we want to know is P(D|T). To find it, we use Bayes Theorem: P(D|T) = P(T|D)/P(T) * P(D). Since we have the sensitivity of the test which is given by P(T|D) [given someone has the cancer, how likely the test is positive], we can compute P(D|T) if we know P(T), P(D).

Total probability theorem can be used to get P(T), which is just the weighted average of two possible situations a patient can get a positive test results: P(T|D), P(T|~D). The second one is the false positive rate which is also known. Let me give you a concrete numerical example. Given that:

• Sensitivity of the test: 73%
• False positive rate: 12%
• 1/700 woman has breast cancer

Then P(D|T) = {0.73/[(1/700)*0.73 + (699/700) * 0.12]} * P(D)

P(D) is our prior belief on how likely a woman will get breast cancer. P(T|D)/P(T) is the likelihood ratio. P(D|T) is called the updated probability of D, given we have a new information T.

Here, we did not give P(D) because it is different from each person. If you are immune (just imagine), no matter what is the test result, P(D|T) = 0. If your family medical history is of higher risk, your P(D|T) will be higher.

# How Bayes Theorem Relates to Bayesian Network

A Bayesian Network is organized like a network with nodes and connections. But the network is hierarchical with parent nodes and child nodes. In the case of cancer test, Disease is the parent node, and Test is the child node. And they starts with a prior probability of having cancer (1/700) and (699/700). And based on the numeric example given, we have the following conditional probability table:

If the information is passed from parent to child, for example, knowing that someone has the disease. The child will update its belief using conditional probability [P'(T) = P(T|D)]. If the information is passed from the child to parent, for example, knowing that someone’s test is positive, the parent will update its belief by multiplying the likelihood ratio with the prior probability. This is called belief propagation.

In the following part, I will talk about patterns of arrows in Bayesian Network and how Bayesian Network can be constructed as casual diagram.