date: 6 Dec 2020

I will share what I know, along with some thoughts, on developing credit risk models.

## Expected Loss

Expected loss is calculated to help determine how much capital a bank or financial institution (FI) should hold. It is not the only factor in determining capital requirements. Others, such as Value at Risk (VaR) and unexpected loss, also matter, but I will not cover them in this article. You may refer to this document for more information.

Expected loss is a function of three components: probability of default (PD), exposure at default (EAD), and loss given default (LGD). Each is defined over a time period; for example, PD usually means the average percentage of obligors that default in the course of one year. EAD estimates the outstanding amount at the time the borrower defaults. LGD is the share of that exposure the FI expects to lose if the borrower defaults, usually expressed as a percentage of EAD.
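Expected loss is commonly computed as the product of the three components, EL = PD × LGD × EAD. Below is a minimal sketch of that calculation; the function name and the loan figures are made up for illustration and do not come from any real portfolio.

```python
def expected_loss(prob_default: float, lgd: float, ead: float) -> float:
    """Expected loss over the PD horizon (typically one year): PD x LGD x EAD."""
    if not 0.0 <= prob_default <= 1.0:
        raise ValueError("prob_default must be between 0 and 1")
    if not 0.0 <= lgd <= 1.0:
        # Assumption for this sketch: LGD is expressed as a fraction of EAD.
        raise ValueError("lgd must be between 0 and 1")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return prob_default * lgd * ead


# Hypothetical loan: 2% one-year PD, 45% LGD, 1,000,000 outstanding at default.
el = expected_loss(prob_default=0.02, lgd=0.45, ead=1_000_000)
print(f"Expected loss: {el:,.2f}")
```

With these illustrative inputs the expected loss is about 9,000, i.e. 0.9% of the exposure.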

### Three Basel Approaches for Expected Loss

| | Standardized Approach (SA) | Foundation Internal Ratings-Based Approach (F-IRB) | Advanced Internal Ratings-Based Approach (A-IRB) |
| --- | --- | --- | --- |
| PD | External Rating Agencies | Own Model | Own Model |
| LGD | External Rating Agencies | Regulator’s prescribed LGD | Own Model |
| EAD | External Rating Agencies | Regulator’s prescribed EAD | Own Model |

The table above summarizes the difference between the three approaches. The key pattern is how much relies on external sources and how much on internal models.

To successfully develop internal models, a certain amount of data is required. The amount of data available depends on how long the product has been on the market (and so how many clients it has) and on the type of client; for example, there will be fewer corporate clients than individual clients. Therefore, it is common to use the standardized approach for new products or for client types with fewer data points.

## The Chicken and Egg Problem of Statistical Credit Risk Models

To estimate the probability of default using internal data, we first need observed default events, such as payments overdue by 90 days. No default event can occur if the product hasn’t launched yet. I often hear in my company that we need to build a scorecard or risk model for product X or product Y. But how can we build an internal risk model specifically for X if the product hasn’t launched?

To use the F-IRB or A-IRB approach, we need to accumulate enough data, and that data must include default events. For a completely new product, the only options are the standardized approach or reusing models from other products.
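To make the dependency on observed defaults concrete, here is a small sketch that flags defaults with a 90-days-past-due rule and computes an observed default rate. The `Account` structure, its field names, and the sample portfolio are hypothetical; a real definition of default would be agreed with the business and the regulator.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

DEFAULT_DPD_THRESHOLD = 90  # assumption: default means 90 or more days past due


@dataclass
class Account:
    account_id: str
    max_days_past_due: int  # worst delinquency seen in the observation window


def is_default(account: Account, threshold: int = DEFAULT_DPD_THRESHOLD) -> bool:
    """Flag an account as defaulted under the days-past-due rule."""
    return account.max_days_past_due >= threshold


def observed_default_rate(
    accounts: Sequence[Account], threshold: int = DEFAULT_DPD_THRESHOLD
) -> Optional[float]:
    """Share of accounts in default, or None when there is nothing to observe."""
    if not accounts:
        return None  # product not launched yet: no accounts, no default events
    defaults = sum(is_default(a, threshold) for a in accounts)
    return defaults / len(accounts)


# Hypothetical portfolio for a product that has already launched.
portfolio = [Account("A1", 0), Account("A2", 35), Account("A3", 120), Account("A4", 90)]
print(observed_default_rate(portfolio))  # 0.5
print(observed_default_rate([]))         # None
```

The empty-portfolio case is the chicken and egg problem in miniature: without launched accounts there is nothing to estimate from.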

## Typical Procedure for Developing an Internal Model for Probability of Default

- Discuss with domain experts which parameters to include in the model, such as age, annual income, occupation, etc.
- Assess data availability: can the parameters be calculated or drawn from the internal database?
- Define the default event. There are many possible definitions of default, so one should be fixed before we start developing the model.
- Agree on the evaluation metrics that will determine whether the model is good or not.
- Build the model. This is actually the simplest part.
- Output summary statistics for evaluation.

I will not discuss the statistical methods used in developing the model, since the focus of this article is not on statistics. Rather, I want to point out that the first four steps require a lot of discussion.

My personal experience is that domain experts usually come up with a lot of parameters to include, but they often overlook the data availability problem. For example, domain experts may ask for items from the cash flow statement, yet SMEs in Hong Kong are not required to prepare cash flow statements. Apart from that, due to poor data governance, some data points may not be captured in the database. The project team then needs to evaluate whether it is worth inputting those data points manually.
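One way to ground that discussion is to measure, for each candidate parameter, how many records actually contain a value. The sketch below is illustrative only: the field names, the sample records, and the 70% cut-off are all assumptions, not a recommended standard.

```python
from typing import Dict, List, Optional, Sequence

MIN_COVERAGE = 0.7  # hypothetical cut-off; the right level is a project decision


def field_coverage(
    records: Sequence[Dict[str, Optional[float]]], fields: Sequence[str]
) -> Dict[str, float]:
    """Share of records in which each field is present and not None."""
    total = len(records)
    coverage = {}
    for field in fields:
        filled = sum(1 for record in records if record.get(field) is not None)
        coverage[field] = filled / total if total else 0.0
    return coverage


# Made-up client records; cash flow items are mostly missing, as with many SMEs.
records: List[Dict[str, Optional[float]]] = [
    {"age": 45, "annual_income": 520_000, "operating_cash_flow": None},
    {"age": 38, "annual_income": None, "operating_cash_flow": None},
    {"age": 51, "annual_income": 310_000},
    {"age": 29, "annual_income": 275_000, "operating_cash_flow": 80_000},
]
candidates = ["age", "annual_income", "operating_cash_flow"]

coverage = field_coverage(records, candidates)
usable = [f for f in candidates if coverage[f] >= MIN_COVERAGE]
print(coverage)
print("Usable without manual input:", usable)
```

Fields that fall below the cut-off are the ones where the team has to weigh manual data entry against dropping the parameter.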

The success of a credit risk model also rests on some key assumptions. First, a statistical model requires a lot of data to develop. Having only 100 clients is not enough to produce a statistically significant result, and a model that appears to perform well may do so simply due to chance. Second, the more parameters the domain experts include in the model, the more data entries are needed. A typical rule of thumb is a 1:10 ratio of features to entries, which means that with 10 features we need at least 100 entries. In reality, we sometimes start with more than 100 features. Third, as I mentioned before, to estimate PD we first need observed default events, since we are using a statistical model rather than a deterministic one.
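The 1:10 rule of thumb is easy to turn into a quick sanity check before modelling starts. The helper names below are hypothetical, and the ratio is the heuristic quoted above rather than a statistical guarantee.

```python
ENTRIES_PER_FEATURE = 10  # the 1:10 rule of thumb: ten entries per feature


def min_entries(n_features: int, entries_per_feature: int = ENTRIES_PER_FEATURE) -> int:
    """Minimum number of data entries suggested for a given number of features."""
    return n_features * entries_per_feature


def max_features(n_entries: int, entries_per_feature: int = ENTRIES_PER_FEATURE) -> int:
    """Largest number of features the available entries can support."""
    return n_entries // entries_per_feature


print(min_entries(10))    # 100
print(min_entries(100))   # 1000
print(max_features(100))  # 10
```

Starting with more than 100 candidate features therefore implies at least 1,000 entries under this heuristic, or a feature selection step before the model is built.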

## Thoughts and Reflections

I think we need to be realistic in deciding whether to use external or internal models. Internal models do give us more accurate estimates of PD, EAD, and LGD, and hence better risk management (lower capital requirements haha?). But we need to understand the assumptions behind them.

If the probability of default is hard to estimate due to data availability, or is overestimated because we rely on external agencies, can we control EAD by lowering exposure, or LGD by having better facility structures?
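As a rough illustration of that trade-off, the standard relation EL = PD × LGD × EAD can be rearranged to find the largest exposure that keeps expected loss within a target. The function name and all figures below are hypothetical, and the sketch ignores everything outside expected loss, such as unexpected loss and capital rules.

```python
def max_exposure(target_el: float, prob_default: float, lgd: float) -> float:
    """Largest EAD that keeps expected loss at or below target_el."""
    if prob_default <= 0 or lgd <= 0:
        raise ValueError("prob_default and lgd must be positive")
    return target_el / (prob_default * lgd)


# Hypothetical case: a conservative (possibly overestimated) PD of 4%.
target_el = 9_000.0

# With a 45% LGD, exposure has to be capped to stay within the target.
cap_unsecured = max_exposure(target_el, prob_default=0.04, lgd=0.45)

# A better facility structure (e.g. more collateral) that halves LGD
# allows twice the exposure for the same expected loss.
cap_secured = max_exposure(target_el, prob_default=0.04, lgd=0.225)

print(f"Exposure cap at 45% LGD:   {cap_unsecured:,.0f}")
print(f"Exposure cap at 22.5% LGD: {cap_secured:,.0f}")
```

Under these made-up numbers the caps come out at roughly 500,000 and 1,000,000, which shows how EAD and LGD can compensate for an uncertain or conservative PD.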

Finally, data governance is vital. If data is not captured, there is no way we can develop our internal models. I believe that data governance, not limited to capturing data, should be considered in the product development stage. This provides the foundation to build our own models.