Cross Validation and Performance Measures in Machine Learning

Cross Validation

Types of Cross Validation Techniques

  1. Holdout Method: The holdout method is the simplest form of cross validation. The data set is divided into two sets, the training set and the testing set, and the model is fitted using the training set only. The model is then asked to predict the output values for the data in the testing set, which it has never seen before, and it is evaluated with an appropriate performance measure such as the mean absolute test set error. Advantage: it is usually preferable to the residual method and takes less time to compute. However, its evaluation can have high variance. The result depends entirely on which data points end up in the training set and which in the test set, so the evaluation will differ depending on how the division is made.
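A minimal sketch of the holdout workflow, assuming scikit-learn; the synthetic data, 70/30 split, and linear model here are illustrative assumptions:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Illustrative data; in practice X and y come from your own dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out 30% of the data for testing; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)      # fit on the training set only

y_pred = model.predict(X_test)   # predict on the unseen test set
print("Mean absolute test set error:", mean_absolute_error(y_test, y_pred))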

Performance Measures

Classification Accuracy
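Classification accuracy is the ratio of correct predictions to the total number of predictions made. A minimal sketch, assuming scikit-learn and hypothetical labels:

from sklearn.metrics import accuracy_score

# Hypothetical labels: 6 of the 8 predictions match the actual values
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / total predictions
print("Classification Accuracy:", accuracy_score(y_actual, y_pred))  # 0.75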

Logarithmic Loss
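Logarithmic loss scores predicted probabilities rather than hard labels, penalising confident wrong predictions heavily; lower values are better. A minimal sketch, assuming scikit-learn and made-up probabilities:

from sklearn.metrics import log_loss

# Hypothetical actual labels and predicted probabilities of the positive class
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_proba  = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.95, 0.3]

# Log loss = -mean( y*log(p) + (1-y)*log(1-p) )
print("Logarithmic Loss:", log_loss(y_actual, y_proba))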

Confusion Matrix

  • True Positives : The cases in which we predicted YES and the actual output was also YES.
  • True Negatives : The cases in which we predicted NO and the actual output was NO.
  • False Positives : The cases in which we predicted YES and the actual output was NO.
  • False Negatives : The cases in which we predicted NO and the actual output was YES.
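A minimal sketch of extracting these four counts, assuming scikit-learn; the label arrays are hypothetical:

from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = YES, 0 = NO)
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels ordered [0, 1]
tn, fp, fn, tp = confusion_matrix(y_actual, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)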

Area Under Curve

  • True Positive Rate (Sensitivity) : True Positive Rate is calculated as TP / (TP + FN). It is the proportion of positive data points that are correctly classified as positive, out of all positive data points. It takes values in the range [0, 1].
  • False Positive Rate (1 - Specificity) : False Positive Rate is calculated as FP / (FP + TN). It is the proportion of negative data points that are mistakenly classified as positive, out of all negative data points. Specificity itself is TN / (TN + FP), so the False Positive Rate equals 1 - Specificity. It takes values in the range [0, 1].
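The ROC curve plots the True Positive Rate against the False Positive Rate across classification thresholds, and AUC is the area under that curve. A minimal sketch, assuming scikit-learn; the counts and score arrays below are hypothetical:

from sklearn.metrics import roc_auc_score

# Hypothetical confusion-matrix counts (e.g. from the sketch above)
tp, tn, fp, fn = 3, 3, 1, 1

tpr = tp / (tp + fn)   # sensitivity: fraction of actual positives correctly flagged
fpr = fp / (fp + tn)   # fraction of actual negatives mistakenly flagged as positive
print("True Positive Rate:", tpr, "False Positive Rate:", fpr)

# AUC needs predicted scores rather than hard labels; these arrays are made up
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.95, 0.3]
print("Area Under Curve:", roc_auc_score(y_actual, y_scores))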

F1 Score

  • Precision : It is the number of correct positive results divided by the number of positive results predicted by the classifier.
  • Recall : It is the number of correct positive results divided by the number of all samples that should have been identified as positive.
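A minimal sketch, assuming scikit-learn and the same hypothetical labels as above; F1 is the harmonic mean of Precision and Recall:

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels (1 = positive class)
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_actual, y_pred)   # TP / (TP + FP)
recall    = recall_score(y_actual, y_pred)      # TP / (TP + FN)
print("Precision:", precision, "Recall:", recall, "F1:", f1_score(y_actual, y_pred))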

Mean Absolute Error

Mean Squared Error
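A minimal sketch of both error metrics, assuming scikit-learn; the regression targets and predictions are made up. MAE averages the absolute differences between predictions and actual values, while MSE averages the squared differences and therefore penalises large errors more heavily:

from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical regression targets and predictions
y_actual = [3.0, -0.5, 2.0, 7.0]
y_pred   = [2.5,  0.0, 2.0, 8.0]

print("Mean Absolute Error:", mean_absolute_error(y_actual, y_pred))
print("Mean Squared Error:", mean_squared_error(y_actual, y_pred))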
