In Chapter 7, Visualize Your Process Behavior, the author introduces a tool called the XmR chart. The XmR chart is used to detect assignable causes, that is, unpredictable events that fall outside routine variation. For that purpose, the XmR chart includes two limits: the Upper Natural Process Limit (UNPL) and the Lower Natural Process Limit (LNPL). Any observation that falls outside these two limits is considered a surprise. However, at that point it is not clear why the XmR chart works or where the coefficients in its formulas come from.
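To make the limits concrete, here is a minimal Python sketch that computes them from a list of individual values. The scaling factors 2.66 and 3.268 are the values usually published for XmR charts, not quoted from this chapter, and the function name `xmr_limits` and the sample data are my own.

```python
from statistics import mean

def xmr_limits(values):
    """Compute XmR chart limits from a sequence of individual values.

    Assumes the commonly published scaling factors: 2.66 for the X chart
    and 3.268 for the moving-range (mR) chart.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # Moving ranges: absolute differences between consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "average": x_bar,
        "UNPL": x_bar + 2.66 * mr_bar,  # Upper Natural Process Limit
        "LNPL": x_bar - 2.66 * mr_bar,  # Lower Natural Process Limit
        "URL": 3.268 * mr_bar,          # Upper Range Limit for the moving ranges
    }

limits = xmr_limits([10, 12, 11, 13, 12, 11, 10, 12])
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```

For this made-up data the average is 11.375 and the average moving range is 10/7, so the natural process limits come out at 7.575 and 15.175.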
In Chapter 11, the author tries to explain the XmR chart. Unfortunately, I feel the author did not give enough detail in his explanation. At the beginning of the chapter, he mentions that readers can skip it, so I know not to expect too much from this chapter. That is fine: no single book can answer all our questions. The author does point readers to other resources, such as the origin of the XmR chart, Economic Control of Quality of Manufactured Product. I will try to read the relevant chapters of that book later and find out the theory behind the XmR chart. In this blog post, I will first summarize what I took away from this chapter.
The logic starts by assuming the process is predictable. If it is, we can predict future observations using summary statistics (the limits). If the observations are consistent with the predictions, the process may be predictable; if they are not, the process is unpredictable. The author also points out that we can be more confident when we conclude a process is unpredictable than when we conclude it is predictable, because the limits are set at the average ± 3 Sigma (explained in the following paragraphs). They are wide enough that a point outside them is unlikely to be a false alarm.
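The predict-then-check logic can be sketched in a few lines of Python: compute limits from a baseline period that we assume is predictable, then flag new observations that fall outside them. The function names, the baseline, and the new values are hypothetical, and the 2.66 factor is the commonly published XmR constant rather than something stated in this chapter.

```python
from statistics import mean

def natural_process_limits(baseline):
    """Return (LNPL, UNPL) from a baseline assumed to be predictable."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    x_bar = mean(baseline)
    mr_bar = mean(moving_ranges)
    return x_bar - 2.66 * mr_bar, x_bar + 2.66 * mr_bar

def check_predictions(baseline, new_values):
    """Return (index, value) pairs of new values outside the predicted limits."""
    lnpl, unpl = natural_process_limits(baseline)
    return [(i, x) for i, x in enumerate(new_values) if x < lnpl or x > unpl]

baseline = [10, 12, 11, 13, 12, 11, 10, 12]
surprises = check_predictions(baseline, [11, 12, 16, 9, 7])
print(surprises)
```

An empty result is consistent with a predictable process but does not prove it; a non-empty result is the "surprise" that calls for looking for an assignable cause.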
Let me first clarify that Sigma is not the standard deviation, although the term is commonly used that way. Let us also define some notation:
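In the form the XmR formulas are usually published (my own reconstruction, so the book's symbols may differ), for individual values $X_1, \dots, X_n$:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad mR_i = \lvert X_i - X_{i-1} \rvert \quad (i = 2, \dots, n), \qquad \overline{mR} = \frac{1}{n-1}\sum_{i=2}^{n} mR_i$$

$$\text{Sigma}(X) = \frac{\overline{mR}}{d_2} = \frac{\overline{mR}}{1.128}$$

$$\text{UNPL} = \bar{X} + 3\,\text{Sigma}(X) \approx \bar{X} + 2.66\,\overline{mR}, \qquad \text{LNPL} = \bar{X} - 3\,\text{Sigma}(X) \approx \bar{X} - 2.66\,\overline{mR}$$

$$\text{URL} = 3.268\,\overline{mR}$$

The constant 1.128 is the bias-correction factor $d_2$ for ranges of two values, and $3 / 1.128 \approx 2.66$ is where the 2.66 in the limit formulas comes from.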
In my view, the chapter does not explain where Sigma comes from, why its formula measures dispersion in the data, or why the average ± 3 Sigma is a good and theoretically sound limit for separating predictable from unpredictable events. I may have to find the answer in the original book about the XmR chart.
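Part of the answer about where the 1.128 comes from can at least be checked numerically. For two independent normal values with standard deviation sigma, the expected absolute difference is $2\sigma/\sqrt{\pi}$, so dividing the average moving range by $2/\sqrt{\pi} \approx 1.128$ recovers sigma. The simulation below is a sketch that assumes independent, normally distributed data, which is exactly the assumption the book says the limits do not need, so it only illustrates the constant, not why the limits work in general. The names and parameters are my own.

```python
import math
import random

def sigma_from_moving_ranges(values):
    """Estimate Sigma(X) as the average moving range divided by d2 = 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(moving_ranges) / len(moving_ranges) / 1.128

# For two independent normal values, E|X1 - X2| = 2 * sigma / sqrt(pi),
# so d2 = 2 / sqrt(pi), which is about 1.128.
d2_theory = 2 / math.sqrt(math.pi)

random.seed(0)  # fixed seed so the run is reproducible
true_sigma = 5.0
data = [random.gauss(100.0, true_sigma) for _ in range(100_000)]
estimate = sigma_from_moving_ranges(data)
print(f"d2 = {d2_theory:.4f}, true sigma = {true_sigma}, estimate = {estimate:.3f}")
```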
The author does, however, mention some nice properties of the 3 Sigma limits. First, we do not have to assume a normal distribution for them to work; they work for many different probability models and can still distinguish unpredictable events. However, the author only gives a visual demonstration rather than a mathematical proof. The author then addresses heavily skewed probability models: does skewness affect the use of the lower and upper limits? The author's answer is: theoretically yes, but practically no. In reality, data are often skewed because of some natural boundary, for example, a stock price cannot go below zero, and that natural boundary already serves as a good limit.
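To put numbers next to the visual argument, the sketch below computes the exact probability that a value falls within the mean ± 3 standard deviations for three textbook distributions I picked (not the book's examples). It uses each model's true mean and standard deviation rather than limits estimated from data, so it is an illustration, not a proof.

```python
import math

def normal_coverage():
    # P(|Z| <= 3) for a standard normal variable.
    return math.erf(3 / math.sqrt(2))

def uniform_coverage():
    # Uniform on [0, 1]: mean 0.5, sd 1/sqrt(12) (about 0.289), so
    # mean +/- 3 sd spans about [-0.37, 1.37], covering the whole support.
    lo = 0.5 - 3 / math.sqrt(12)
    hi = 0.5 + 3 / math.sqrt(12)
    return min(hi, 1.0) - max(lo, 0.0)

def exponential_coverage():
    # Exponential with rate 1: mean 1, sd 1, so mean +/- 3 sd = [-2, 4].
    # The lower limit lies below the natural boundary at 0, so only
    # the upper limit matters: P(X <= 4) = 1 - exp(-4).
    return 1 - math.exp(-4)

coverage = {
    "normal": normal_coverage(),
    "uniform": uniform_coverage(),
    "exponential (skewed)": exponential_coverage(),
}
for name, p in coverage.items():
    print(f"{name:22s} {p:.4f}")
```

All three cover roughly 98% or more of their values, and in the skewed exponential case the computed lower limit of -2 is meaningless, which matches the author's point that the natural boundary at zero already acts as the lower bound.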
A short section explains why the standard deviation is not a good measure of dispersion here. The answer is that the standard deviation computed from all the data is a global measure: every observation is pooled together, so if the process shifts during the period, the shift itself inflates the standard deviation and widens the limits. That is why the XmR chart first calculates the moving ranges, the differences between consecutive values, and then uses the average moving range to compute the limits. In other words, the formulas use the short-term variation to place limits on the long-term variability.
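The difference between the two approaches shows up clearly on data with a level shift. In this sketch the data are made up: the process runs around 10 and then jumps to around 30. Limits built from the global (sample) standard deviation swallow the shift, while limits built from the average moving range, using the commonly published 2.66 factor, flag it.

```python
from statistics import mean, stdev

# Hypothetical data: the process level jumps from about 10 to about 30 halfway through.
data = [9, 11] * 5 + [29, 31] * 5
x_bar = mean(data)

# Limits from the global sample standard deviation.
sd = stdev(data)
sd_limits = (x_bar - 3 * sd, x_bar + 3 * sd)

# Limits from the average moving range (short-term variation).
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mr_bar = mean(moving_ranges)
mr_limits = (x_bar - 2.66 * mr_bar, x_bar + 2.66 * mr_bar)

def count_outside(values, limits):
    """Count values strictly below the lower limit or above the upper limit."""
    lo, hi = limits
    return sum(1 for x in values if x < lo or x > hi)

print("global SD limits:", sd_limits, "outside:", count_outside(data, sd_limits))
print("moving-range limits:", mr_limits, "outside:", count_outside(data, mr_limits))
```

Only one moving range (the jump from 11 to 29) reflects the shift, so the average moving range stays small, whereas every value contributes the shift to the global standard deviation.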
Finally, the chapter talks about chunky data. I would rephrase the idea as: the variation you can see depends on the measurement unit. Imagine measuring people's height in whole meters: nearly everyone would be recorded as the same one or two values, so the recorded variation would be tiny even though real heights vary. So, picking the right unit of measurement is essential. To detect whether the unit of measurement is too coarse, there is a rule of thumb: if fewer than 4 possible moving-range values (0, 1 unit, 2 units, and so on) lie between 0 and the upper range limit (URL), consider switching to a finer unit. For example, if the unit is the meter and the URL is 2, the only possible values are 0, 1, and 2, just three values, so the meter is not a good unit for the XmR chart.
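The rule of thumb can be sketched as a small check. The function `chunky_check` is hypothetical, the heights are made up, and the URL uses the commonly published 3.268 factor. Following the example above, it counts possible moving-range values from 0 up to and including the URL.

```python
import math
from statistics import mean

def chunky_check(values, increment):
    """Count possible moving-range values from 0 up to the URL.

    `increment` is the measurement unit (e.g. 1 for whole meters).
    Returns (count, is_chunky) using the "fewer than 4" rule of thumb.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    url = 3.268 * mean(moving_ranges)
    # Possible range values are 0, increment, 2 * increment, ... up to the URL.
    # A tiny tolerance guards against floating-point round-off.
    count = math.floor(url / increment + 1e-9) + 1
    return count, count < 4

heights_cm = [172, 180, 165, 177, 169, 183, 171, 175]
heights_m = [round(h / 100) for h in heights_cm]  # every value rounds to 2

print(chunky_check(heights_cm, 1))  # centimeters
print(chunky_check(heights_m, 1))   # whole meters
```

In whole meters every height becomes 2, all moving ranges are 0, and the check reports chunky data; in centimeters there are dozens of possible range values below the URL.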
Next, I will look at the original book and try to convince myself of the rationale behind the XmR chart. Meanwhile, I will try using the XmR chart in some of my work to test it.