Cross Validation and Performance Measures in Machine Learning

Deciding which cross validation technique and performance measures to use with a particular machine learning technique is very important. After training our model on a dataset, we can’t say for sure that it will perform well on data it hasn’t seen before. The process of deciding whether the numerical results quantifying hypothesised relationships between variables are acceptable as descriptions of the data is known as validation. Based on the performance on unseen data, we can say whether the model is overfitted, underfitted or well generalized.

Cross Validation

Types of Cross Validation Techniques

1. Holdout Method: The dataset is split into two parts, a training set and a test set; the model is trained on the training set and evaluated once on the held-out test set.

2. K-Fold Cross Validation Method: It is a modification of the holdout method. The dataset is divided into k subsets (folds), and the value of k shouldn’t be too small or too large; ideally we choose 5 to 10 depending on the data size. A higher value of k leads to a less biased estimate, whereas a lower value of k makes the procedure similar to the holdout approach. We train the model on k-1 folds and validate and test it on the remaining kth fold, noting down the error. This process is repeated until every fold has served as the test set. The average of the recorded scores is then taken as the performance metric for the model (a short sketch using scikit-learn follows this list).

Advantage — It doesn’t matter how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k-1 times. The variance of the resulting estimate is reduced as k is increased.

Disadvantage — The training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation.

3. Leave-One-Out Cross Validation Method: It is K-fold cross validation taken to its logical extreme, with k equal to N, the number of data points in the set. That means that N separate times, the model is trained on all the data except for one point and a prediction is made for that point. As before, the average error is computed and used to evaluate the model. The evaluation given by the leave-one-out cross validation error (LOO-XVE) is good, but at first pass it seems very expensive to compute.
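To make both techniques concrete, here is a minimal sketch using scikit-learn; the synthetic dataset, the LogisticRegression model and the parameter values are illustrative assumptions rather than part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic dataset purely for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
model = LogisticRegression(max_iter=1000)

# K-Fold: train on k-1 folds, validate on the remaining fold, repeat k times
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
kfold_scores = cross_val_score(model, X, y, cv=kfold)
print("5-fold accuracies:", kfold_scores)
print("Mean 5-fold accuracy:", kfold_scores.mean())

# Leave-One-Out: k equals N, so the model is retrained N times (expensive for large N)
loo = LeaveOneOut()
loo_scores = cross_val_score(model, X, y, cv=loo)
print("LOO accuracy:", loo_scores.mean())
```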

Performance Measures

Classification Accuracy

It is the ratio of the number of correct predictions to the total number of input samples. It works well only if there are an equal number of samples belonging to each class. For example, suppose there are 95% samples of class A and 5% samples of class B in our training set. Then the model can easily get 95% training accuracy by simply predicting every training sample as belonging to class A. When the same model is tested on a test set with 55% samples of class A and 45% samples of class B, the test accuracy drops to 55%.
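The imbalance problem is easy to reproduce. Below is a minimal sketch, assuming scikit-learn’s accuracy_score and label arrays made up to match the percentages in the example above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Training labels: 95% class A (0), 5% class B (1)
y_train = np.array([0] * 95 + [1] * 5)
# Test labels: 55% class A, 45% class B
y_test = np.array([0] * 55 + [1] * 45)

# A "model" that always predicts class A
train_preds = np.zeros_like(y_train)
test_preds = np.zeros_like(y_test)

print("Training accuracy:", accuracy_score(y_train, train_preds))  # 0.95
print("Test accuracy:", accuracy_score(y_test, test_preds))        # 0.55
```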

Logarithmic Loss

For N samples and M classes it is defined as Log Loss = -(1/N) Σ_i Σ_j y_ij · log(p_ij), where y_ij indicates whether sample i belongs to class j or not and p_ij indicates the probability of sample i belonging to class j.

Log Loss has no upper bound and exists on the range [0, ∞). A Log Loss closer to 0 indicates higher accuracy, whereas a Log Loss far from 0 indicates lower accuracy.
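A minimal sketch of computing Log Loss, assuming scikit-learn’s log_loss and made-up probabilities for a two-class problem:

```python
from sklearn.metrics import log_loss

# True labels for 4 samples and predicted probabilities for classes [0, 1]
y_true = [0, 1, 1, 0]
y_prob = [
    [0.9, 0.1],  # confident and correct
    [0.2, 0.8],  # confident and correct
    [0.6, 0.4],  # wrong side of 0.5 for the true class
    [0.7, 0.3],  # correct
]

print("Log Loss:", log_loss(y_true, y_prob))
```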

Confusion Matrix

There are 4 important terms :

  • True Positives : The cases in which we predicted YES and the actual output was also YES.
  • True Negatives : The cases in which we predicted NO and the actual output was NO.
  • False Positives : The cases in which we predicted YES and the actual output was NO.
  • False Negatives : The cases in which we predicted NO and the actual output was YES.

Accuracy for the matrix can be calculated by summing the values lying across the main diagonal and dividing by the total number of samples, i.e. Accuracy = (TP + TN) / (TP + TN + FP + FN).
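A minimal sketch of building a confusion matrix and deriving accuracy from it, assuming scikit-learn’s confusion_matrix and made-up labels:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels (1 = YES, 0 = NO)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # predicted labels

# With labels [0, 1], rows are actual classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(cm)
print("Accuracy:", accuracy)  # 0.75 for these labels
```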

Area Under Curve

  • True Positive Rate (Sensitivity) : True Positive Rate is calculated as TP / (TP + FN). It is the proportion of positive data points that are correctly classified as positive, with respect to all positive data points. It has values in the range [0, 1].
  • False Positive Rate (1 − Specificity) : False Positive Rate is calculated as FP / (FP + TN). It is the proportion of negative data points that are mistakenly classified as positive, with respect to all negative data points. It has values in the range [0, 1].

AUC is the area under the ROC curve, which plots the True Positive Rate against the False Positive Rate as the classification threshold is varied over [0, 1].

AUC also has a range of [0, 1], and the greater the value, the better the performance of our model.
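A minimal sketch of computing the ROC curve and AUC, assuming scikit-learn’s roc_curve and roc_auc_score and made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
# Predicted probabilities of the positive class
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3])

# FPR and TPR at each threshold define the ROC curve; AUC is the area under it
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_scores))
```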

F1 Score

  • Precision : It is the number of correct positive results divided by the number of positive results predicted by the classifier, i.e. Precision = TP / (TP + FP).
  • Recall : It is the number of correct positive results divided by the number of all samples that should have been identified as positive, i.e. Recall = TP / (TP + FN).
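F1 Score is the harmonic mean of Precision and Recall: F1 = 2 · (Precision · Recall) / (Precision + Recall). A minimal sketch, assuming scikit-learn’s metric functions and made-up labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # 2 * P * R / (P + R)

print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
```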

Mean Absolute Error
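Mean Absolute Error is the average of the absolute differences between the predicted values and the actual values: MAE = (1/N) Σ_i |y_i − ŷ_i|. It measures how far the predictions are from the actual output, but it does not indicate the direction of the error.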

Mean Squared Error
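Mean Squared Error is similar to Mean Absolute Error, except that it averages the squares of the differences between the predicted and actual values: MSE = (1/N) Σ_i (y_i − ŷ_i)².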

Advantage — It is easier to compute the gradient of MSE, whereas MAE requires complicated linear programming tools to compute its gradient.
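A minimal sketch of computing both metrics, assuming scikit-learn and made-up regression targets:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values
y_pred = [2.5, 0.0, 2.0, 8.0]   # predicted values

print("MAE:", mean_absolute_error(y_true, y_pred))  # mean of |y - y_hat|
print("MSE:", mean_squared_error(y_true, y_pred))   # mean of (y - y_hat)^2
```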

Thank you for reading and looking forward to your feedback!
