In #6, I talked about building a Minimum Viable Product (MVP). In this post, I am going to discuss the different metrics for evaluating a regression model and my choice for this problem.

NOTATIONS

• n is the number of test samples.
• y_i is the i-th predicted value.
• x_i is the i-th actual value.
• k is the number of features/predictors.

MEAN ABSOLUTE ERROR

$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - x_i|$
• Direction of error is not considered. (The absolute sign)
• All individual differences have equal weight. (1/n)
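A minimal sketch of MAE with numpy, using a small hypothetical set of actual and predicted values:

```python
import numpy as np

# Hypothetical actual (x) and predicted (y) values, for illustration only.
x = np.array([3.0, 5.0, 2.5, 7.0])  # actual values
y = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values

# MAE: mean of the absolute differences; every error gets equal weight,
# and the absolute value discards the direction of the error.
mae = np.mean(np.abs(y - x))
print(mae)  # 0.75
```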

MEAN SQUARED ERROR

$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - x_i)^2$
• Direction of error is not considered. (Squared)
• Gives a higher weight to large errors. (Differences are squared)
• Measured in units that are the square of the target variable.

ROOT MEAN SQUARED ERROR

$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - x_i)^2}$
• Direction of error is not considered. (Squared)
• Gives a higher weight to large errors. (Differences are squared)
• Measured in the same units as the target variable.
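MSE and RMSE differ only by the final square root, so they can be sketched together (same hypothetical values as before):

```python
import numpy as np

# Hypothetical actual (x) and predicted (y) values, for illustration only.
x = np.array([3.0, 5.0, 2.5, 7.0])  # actual values
y = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values

# MSE: squaring weights large errors more heavily; the result is in
# squared units of the target variable.
mse = np.mean((y - x) ** 2)

# RMSE: the square root brings the score back to the target's units.
rmse = np.sqrt(mse)
print(mse)  # 0.875
```

Note that both rank models the same way; RMSE is usually preferred for reporting because its units are interpretable.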

ROOT MEAN SQUARED LOGARITHMIC ERROR

$RMSLE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\log(1+y_i) - \log(1+x_i))^2}$
• RMSLE is effectively RMSE on the log-transformed predicted and target values.
• Penalizes underestimation of the actual value more severely than overestimation. (The shape of the logarithm curve)
• Only relative (percentage) differences matter. $\log(1+y_i) - \log(1+x_i) = \log(\frac{1+y_i}{1+x_i})$
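The asymmetry can be checked directly: with a hypothetical actual value of 100, an underestimate of 50 scores worse than an overestimate of the same absolute size.

```python
import numpy as np

def rmsle(y, x):
    # np.log1p computes log(1 + v), which also tolerates zero values.
    return np.sqrt(np.mean((np.log1p(y) - np.log1p(x)) ** 2))

actual = np.array([100.0, 100.0])
under = np.array([50.0, 50.0])   # underestimates by 50
over = np.array([150.0, 150.0])  # overestimates by 50

# The same absolute error is penalized more when it is an underestimate.
rmsle_under = rmsle(under, actual)
rmsle_over = rmsle(over, actual)
print(rmsle_under > rmsle_over)  # True
```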

MEAN PERCENTAGE ERROR

$MPE = \frac{100\%}{n}\sum_{i=1}^{n}\frac{x_i - y_i}{x_i}$
• Average percentage error.
• Direction matters.
• Undefined for zero actual values, and unstable for actual values near zero (division by $x_i$).

MEAN ABSOLUTE PERCENTAGE ERROR

$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}|\frac{x_i - y_i}{x_i}|$$

• Average percentage error.
• Direction doesn't matter.
• Undefined for zero actual values, and unstable for actual values near zero (division by $x_i$).
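A sketch of both percentage metrics on the same hypothetical values. MPE keeps the sign, so over- and underestimates can cancel; MAPE takes absolute values, so they accumulate:

```python
import numpy as np

# Hypothetical actual (x) and predicted (y) values; none are zero,
# since both metrics divide by the actual value.
x = np.array([3.0, 5.0, 2.5, 7.0])  # actual values
y = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values

# MPE: signed percentage errors, so direction matters and errors cancel.
mpe = 100.0 * np.mean((x - y) / x)

# MAPE: absolute percentage errors, so direction is ignored.
mape = 100.0 * np.mean(np.abs((x - y) / x))
```

Because of the cancellation, $|MPE| \le MAPE$ always holds, which is why MPE is mostly used to detect systematic bias rather than overall accuracy.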

R^2

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-x_i)^2}{\sum_{i=1}^{n}(x_i - x_{mean})^2}$$

• The baseline model predicts the average of the actual values; it scores an $R^2$ of exactly 0, so any useful model should beat it.
• Conveniently scaled: 1 is a perfect fit, 0 is the baseline. (A model worse than the baseline can score below 0, down to negative infinity.)

$$R^2_{adjusted} = 1 - [\frac{(1-R^2)(n-1)}{n-k-1}]$$
• Includes $k$, the number of features/predictors, so adding a predictor that does not improve the fit lowers the score.
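Both scores can be sketched from the sums of squares directly; the baseline check below confirms that always predicting the mean scores exactly 0. The values and $k$ are hypothetical.

```python
import numpy as np

# Hypothetical actual (x) and predicted (y) values, for illustration only.
x = np.array([3.0, 5.0, 2.5, 7.0])  # actual values
y = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values
n, k = len(x), 1                    # k: assumed number of predictors

ss_res = np.sum((y - x) ** 2)         # residual sum of squares
ss_tot = np.sum((x - x.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

# Adjusted R^2 shrinks toward 0 as predictors are added without benefit.
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Baseline model: always predict the mean of the actual values.
baseline = np.full_like(x, x.mean())
r2_baseline = 1 - np.sum((baseline - x) ** 2) / ss_tot
print(r2_baseline)  # 0.0
```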