In #6, I talked about building a Minimum Viable Product (MVP). In this post, I am going to discuss the different metrics for evaluating a regression model and explain my choice for this problem.

NOTATIONS

  • n is the number of test samples.
  • y_i is the i-th predicted value.
  • x_i is the i-th actual value.
  • k is the number of features/predictors.

MEAN ABSOLUTE ERROR

\[MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - x_i|\]
  • Direction of error is not considered. (The absolute value)
  • All individual differences have equal weight. (1/n)
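
A minimal NumPy sketch of MAE (the array names x_true and y_pred are my own, matching the notation above):

```python
import numpy as np

def mae(x_true, y_pred):
    # Mean of the absolute differences; the sign of each error is discarded.
    x_true, y_pred = np.asarray(x_true), np.asarray(y_pred)
    return np.mean(np.abs(y_pred - x_true))
```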

MEAN SQUARED ERROR

\[MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - x_i)^2\]
  • Direction of error is not considered. (Squared)
  • Gives a higher weight to large errors. (Differences are squared)
  • Measured in units that are the square of the target variable.
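
The same sketch adapted for MSE:

```python
import numpy as np

def mse(x_true, y_pred):
    # Squaring removes the sign and weights large errors more heavily.
    x_true, y_pred = np.asarray(x_true), np.asarray(y_pred)
    return np.mean((y_pred - x_true) ** 2)
```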

ROOT MEAN SQUARED ERROR

\[RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - x_i)^2}\]
  • Direction of error is not considered. (Squared)
  • Gives a higher weight to large errors. (Differences are squared)
  • Measured in the same units as the target variable.
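
RMSE is just the square root of MSE, so the sketch only adds np.sqrt:

```python
import numpy as np

def rmse(x_true, y_pred):
    # The square root brings the error back to the target variable's units.
    x_true, y_pred = np.asarray(x_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_pred - x_true) ** 2))
```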

ROOT MEAN SQUARED LOGARITHMIC ERROR

\[RMSLE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\log(1+y_i) - \log(1+x_i))^2}\]
  • RMSLE is effectively RMSE on the log-transformed predicted and target values.
  • Penalizes underestimation of the actual value more severely than overestimation of the same size. (The shape of the logarithm curve)
  • Only relative (percentage) differences matter, since \(\log(1+y_i) - \log(1+x_i) = \log\frac{1+y_i}{1+x_i}\).
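
A sketch of RMSLE; np.log1p(v) computes log(1 + v), which matches the formula and is numerically accurate for small values:

```python
import numpy as np

def rmsle(x_true, y_pred):
    # RMSE applied to log(1 + value); inputs must be non-negative.
    x_true, y_pred = np.asarray(x_true), np.asarray(y_pred)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(x_true)) ** 2))
```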

MEAN PERCENTAGE ERROR

\[MPE = \frac{100\%}{n}\sum_{i=1}^{n}\frac{x_i - y_i}{x_i}\]
  • Average percentage error.
  • Direction matters.
  • Undefined when an actual value is zero and unstable when actual values are near zero.
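
A sketch of MPE; note the division by the actual values, which is exactly where the zero/near-zero problem comes from:

```python
import numpy as np

def mpe(x_true, y_pred):
    # Signed percentage errors, so over- and under-predictions can cancel out.
    x_true, y_pred = np.asarray(x_true), np.asarray(y_pred)
    return 100.0 * np.mean((x_true - y_pred) / x_true)
```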

MEAN ABSOLUTE PERCENTAGE ERROR

$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x_i - y_i}{x_i}\right|$$

  • Average absolute percentage error.
  • Direction doesn’t matter.
  • Undefined when an actual value is zero and unstable when actual values are near zero.
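
MAPE only differs from MPE by the absolute value:

```python
import numpy as np

def mape(x_true, y_pred):
    # Absolute percentage errors, so errors no longer cancel out.
    x_true, y_pred = np.asarray(x_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((x_true - y_pred) / x_true))
```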

R^2

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - x_i)^2}{\sum_{i=1}^{n}(x_i - x_{mean})^2}$$

  • The baseline model always predicts the average of the actual values; such a model gets an R^2 of 0.
  • Conveniently scaled: 1 is a perfect fit and 0 matches the baseline. (It can go below 0, down to negative infinity, for models worse than the baseline.)
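
A sketch of R^2 built from the two sums of squares in the formula above:

```python
import numpy as np

def r2(x_true, y_pred):
    x_true, y_pred = np.asarray(x_true), np.asarray(y_pred)
    ss_res = np.sum((y_pred - x_true) ** 2)          # residual sum of squares
    ss_tot = np.sum((x_true - x_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot                     # 0 for the baseline model
```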

Adjusted R^2

$$R^2_{adjusted} = 1 - \frac{(1-R^2)(n-1)}{n-k-1}$$

  • Includes k, the number of features/predictors in the equation.
  • Decreases when you add predictors that do not improve the fit, and increases when you add useful ones.
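
Adjusted R^2 is a direct rearrangement of R^2, so the sketch only needs the sample and predictor counts:

```python
def adjusted_r2(r2_value, n, k):
    # n = number of test samples, k = number of features/predictors.
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - k - 1)
```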

MY CHOICE

Our problem contains no negative values, since a price cannot be negative. From the EDA (Exploratory Data Analysis), we found that the price is heavily skewed, with some outliers. Therefore, I don’t want a metric dominated by absolute errors on the largest prices; relative errors/percentage differences matter more.

Therefore, I will choose adjusted R^2 as my evaluation metric (plain R^2 also works, but adjusted R^2 takes k into consideration). RMSLE would also be a viable choice.
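
Putting it together, a sketch of how both chosen metrics could be computed with scikit-learn (the arrays and the predictor count k below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_log_error

x_true = np.array([100.0, 250.0, 80.0, 400.0, 150.0])  # actual prices (made up)
y_pred = np.array([120.0, 230.0, 90.0, 380.0, 160.0])  # predictions (made up)

n, k = len(x_true), 2  # k = 2 predictors is an assumed figure
r2 = r2_score(x_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
rmsle = np.sqrt(mean_squared_log_error(x_true, y_pred))
```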
