Cross Validation and Performance Measures in Machine Learning

Cross Validation

Cross validation is a technique for evaluating a machine learning model by training it on a subset of the available data and then evaluating it on the remaining data. Put simply, we set a portion of the data aside, train the model on the rest, and then test and evaluate the model's performance on the portion that was set aside.

Types of Cross Validation Techniques

  1. Holdout Method: The holdout method is the simplest type of cross validation. The data set is divided into two sets, called the training set and the testing set. The model is fitted using the training set only. It is then asked to predict the output values for the data in the testing set, which it has never seen before. The model is evaluated using an appropriate performance measure, such as the mean absolute error on the test set. Advantage: it is preferable to the residual method and is cheap to compute. However, its evaluation can have high variance. The result depends entirely on which data points end up in the training set and which in the test set, so it can differ from one split to the next.
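The holdout split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `holdout_split` and the `test_fraction` and `seed` parameters are assumptions introduced here, and the data is a toy list of integers.

```python
import random


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and split them into (train, test) lists.

    `holdout_split`, `test_fraction` and `seed` are illustrative names,
    not part of any particular library.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # toy data set of 20 samples
train, test = holdout_split(data, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

Changing `seed` changes which points land in each set, which is exactly the source of the variance mentioned above.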

Performance Measures

Classification Accuracy

It is the ratio of the number of correct predictions to the total number of input samples.
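As a small sketch of this ratio, assuming the true and predicted labels are given as two equal-length lists (the function name `accuracy` and the toy labels are illustrative):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels: 3 of the 5 predictions are correct.
acc = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(acc)
```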

Logarithmic Loss

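Logarithmic Loss penalises false classifications and works well for multi-class classification. The classifier must assign a probability to each class for every sample. If there are N samples belonging to M classes, then the Log Loss is calculated as:

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$$

where y_{ij} is 1 if sample i belongs to class j and 0 otherwise, and p_{ij} is the predicted probability that sample i belongs to class j.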
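A minimal sketch of Log Loss in plain Python, assuming the standard definition: the negative mean of the log-probability the classifier assigned to each sample's true class. The function name `log_loss`, the clipping constant `eps`, and the example probabilities are assumptions made for illustration.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Negative mean log-probability assigned to the true class.

    y_true: list of true class indices (0 .. M-1)
    probs:  list of per-sample probability lists, one entry per class
    eps:    illustrative clipping constant that avoids log(0)
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Only the true-class term survives, since y_ij is 0 for other classes.
        p_true = min(max(p[label], eps), 1.0 - eps)
        total += math.log(p_true)
    return -total / len(y_true)


# Three samples, three classes (toy numbers).
y_true = [0, 2, 1]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6],
         [0.2, 0.5, 0.3]]
loss = log_loss(y_true, probs)
print(round(loss, 4))
```

A confident wrong prediction (a true-class probability near 0) contributes a very large term, which is how the metric penalises false classifications.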

Confusion Matrix

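A Confusion Matrix is a table of predicted versus actual classes that describes the complete performance of the model. For a binary classifier with classes YES and NO, it has four entries:

  • True Positives : The cases in which we predicted YES and the actual output was YES.
  • True Negatives : The cases in which we predicted NO and the actual output was NO.
  • False Positives : The cases in which we predicted YES and the actual output was NO.
  • False Negatives : The cases in which we predicted NO and the actual output was YES.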
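The entries of a binary confusion matrix can be counted directly from the labels. The sketch below assumes the labels are the strings "YES" and "NO" as in the text; the function name `confusion_counts` and the toy data are illustrative.

```python
def confusion_counts(y_true, y_pred, positive="YES"):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted YES, actually YES
            else:
                fp += 1  # predicted YES, actually NO
        else:
            if actual == positive:
                fn += 1  # predicted NO, actually YES
            else:
                tn += 1  # predicted NO, actually NO
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


actual = ["YES", "NO", "YES", "NO", "NO", "YES"]
predicted = ["YES", "NO", "NO", "YES", "NO", "YES"]
counts = confusion_counts(actual, predicted)
print(counts)
```

The four counts always sum to the number of samples, and accuracy is (TP + TN) divided by that total.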

Area Under Curve

Area Under Curve (AUC) is one of the most widely used metrics for evaluation. It is used for binary classification problems. The AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. To see how it is computed, let us first understand two basic terms :

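  • True Positive Rate (Sensitivity) : True Positive Rate is calculated as TP / (TP+FN), which means that it is the proportion of positive data points that are correctly considered as positive, with respect to all positive data points. It has values in the range [0, 1].
  • False Positive Rate (1 - Specificity) : False Positive Rate is calculated as FP / (FP+TN), which means that it is the proportion of negative data points that are mistakenly considered as positive, with respect to all negative data points. It has values in the range [0, 1].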
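The probabilistic reading of AUC given above can be computed directly by comparing every positive example with every negative one. The sketch below does that in plain Python; counting ties as half a win is a common convention assumed here, and the function names and toy scores are illustrative. It is quadratic in the number of samples, so it is meant for understanding rather than for large data sets.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # assumed convention: a tie counts as half
    return wins / (len(pos) * len(neg))


def false_positive_rate(fp, tn):
    """FP / (FP + TN): share of negatives wrongly flagged as positive."""
    return fp / (fp + tn)


y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]   # toy classifier scores
auc = roc_auc(y_true, scores)
print(auc)
```

In this toy example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.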

F1 Score

F1 Score is the harmonic mean (H.M.) of precision and recall. Its range is [0, 1]. It tells you how precise the classifier is (how many of the instances it labels as positive really are positive) as well as how robust it is (whether it misses a significant number of positive instances). The greater the F1 Score, the better the performance of the model.

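  • Precision : It is the number of correct positive results divided by the number of positive results predicted by the classifier.
  • Recall : It is the number of correct positive results divided by the number of all samples that should have been identified as positive.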
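A short sketch of how precision, recall and F1 follow from the confusion-matrix counts. Returning 0.0 when a denominator is zero is a convention assumed here, and the function name and example counts are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and their harmonic mean (F1) from counts."""
    # Assumed convention: return 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot reach a high F1 Score by maximising precision alone or recall alone.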

Mean Absolute Error

It is the average of the absolute differences between the original values and the predicted values. It measures how far the predictions are from the actual values, but it doesn't give us any idea of the direction of the error, i.e. whether the model is under-predicting or over-predicting the data.
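As a minimal sketch, Mean Absolute Error can be computed as follows (the function name and the toy values are illustrative):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mae = mean_absolute_error(y_true, y_pred)
print(mae)

# Over-predicting and under-predicting by the same amount give the same MAE,
# which is why the metric says nothing about the direction of the error.
print(mean_absolute_error([5.0], [6.0]) == mean_absolute_error([5.0], [4.0]))
```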

Mean Squared Error

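Mean Squared Error (MSE) is quite similar to Mean Absolute Error, with the difference that MSE takes the average of the squares of the differences between the original values and the predicted values. Because the differences are squared, larger errors are penalised more heavily than smaller ones.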
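A matching sketch for Mean Squared Error, again in plain Python with illustrative names and toy values:

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

In this example the single error of 1.0 contributes 1.0 to the sum while each error of 0.5 contributes only 0.25, so the largest error accounts for most of the result.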

Aditi Mittal

Machine Learning Enthusiast | Software Developer