Earlier, datasets of labelled images were relatively small, and simple recognition tasks could be solved well with datasets of that size. But in real life, objects exhibit a large amount of variability, so to train a model for recognition it is important to use much larger training sets covering many attributes. Newer, larger datasets address this: LabelMe consists of hundreds of thousands of fully-segmented images, and ImageNet contains around 15 million labelled high-resolution images in over 22,000 categories. …

Even after the development of more complex and powerful ML algorithms in recent times, linear regression is still hard to beat because of its versatility, robustness, and simplicity.

Linear regression was originally developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but has been borrowed by ML. Now, it is classified both as a statistical algorithm and a machine learning algorithm.

When there is a single input variable, the model is called **simple linear regression**, and when there are multiple input variables, the method is known as **multiple linear regression**. …
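As a sketch of simple linear regression, here is an ordinary least-squares fit of one input variable in plain Python (the data points are made up for illustration):

```python
# Simple linear regression: fit y = slope * x + intercept by ordinary least squares.

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0 for this perfectly linear data
```

Multiple linear regression generalises this to a weight per input variable, usually solved with linear algebra rather than the closed form above.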

*Verbal 158, Quant 169 :)*

The Graduate Record Examination (GRE) is a standardized test administered by the Educational Testing Service (ETS). It is a computer-based test which you can **take at any time of year and retake if needed** (once every 21 days, up to five times in any rolling 12-month period). This test is an admission requirement for many graduate schools in the United States and Canada. It measures an individual's verbal reasoning, quantitative reasoning, analytical writing, and critical thinking skills. The cost to take the test is US $205, but you might get a promotional discount if you book it along with the TOEFL. The official scores are valid for five years from the test date.

The GRE has six sections drawn from three categories: Analytical Writing (AWA), Verbal Reasoning, and Quantitative Reasoning. The writing section is scored on a 0–6 scale, while Verbal and Quantitative Reasoning are each scored on a 130–170 scale. One of the sections is an unscored experimental section, and it is impossible to tell which one it is. Say you get two Quant sections and three Verbal sections: one of the Verbal sections is experimental and is not taken into consideration while calculating the score. The test is also section-adaptive, meaning that how well you do on the first Verbal or Quant section determines the difficulty of the second one. …

Clustering is an unsupervised learning technique used to group objects of a similar kind together. In clustering, we first partition the data into groups based on similarity and then assign labels to those groups. It also helps us find useful features that distinguish one group from another.

The most common categories of clustering methods are:

- Partitioning Method
- Hierarchical Method
- Density-based Method
- Grid-based Method
- Model-based Method

The partitioning method divides a set of n objects into k groups based on the features and similarity of the data. …
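K-means is the best-known partitioning method. A minimal sketch in plain Python, using 1-D points and a fixed number of iterations for brevity:

```python
# Minimal k-means on 1-D data: a sketch of the partitioning method.

def kmeans_1d(points, k, iterations=10):
    """Partition `points` into k clusters; return the final centroids, sorted."""
    # Initialise centroids with the first k points (assumed distinct).
    centroids = points[:k]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 1 and around 10.
print(kmeans_1d([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], k=2))  # centroids near 1.0 and 10.0
```

Real implementations add smarter initialisation (e.g. k-means++) and a convergence check instead of a fixed iteration count.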

A Generative Adversarial Network (GAN for short) is a class of neural networks used in unsupervised learning. GANs are generative models: they create new data instances that resemble the original training data.

A GAN works by pairing a generator (which learns to produce the target output) with a discriminator (which learns to distinguish true data from the output of the generator). The generator tries to fool the discriminator, and the discriminator tries to keep from being fooled.
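As a framework-agnostic sketch, this adversarial pairing can be written as two opposing binary cross-entropy losses. Here `d_real` and `d_fake` stand for the discriminator's estimated probability that a real and a generated sample, respectively, are real:

```python
# The GAN objective expressed as two opposing loss functions.
import math

def discriminator_loss(d_real, d_fake):
    """D wants d_real -> 1 and d_fake -> 0 (binary cross-entropy)."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """G wants the discriminator to score its fakes as real (d_fake -> 1)."""
    return -math.log(d_fake)

# A confident, correct discriminator has a low loss...
print(discriminator_loss(d_real=0.9, d_fake=0.1))
# ...and leaves the generator with a high loss -- exactly the pressure
# that pushes the generator to produce more realistic samples.
print(generator_loss(d_fake=0.1))
```

Training alternates gradient steps on the two losses: the discriminator descends its loss, then the generator descends its own, each move undoing some of the other's progress until the fakes are hard to tell apart.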

So the question is: why are GANs needed? It has been observed that neural networks can easily be fooled into misclassifying things by adding a small amount of noise to the original data. Strikingly, the model is often more confident in the wrong prediction on the noisy input than it was in the correct one. One reason is that most machine learning models learn from a limited amount of data, which is a huge drawback, as it makes them prone to overfitting. …

Learning can be defined as the acquisition of knowledge or skills through experience, study, or being taught. **Machine learning** is the phenomenon of a machine being taught, or even learning on its own.

Wikipedia defines **deep learning** as:

> Deep learning is a class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input.

For example, in image processing, lower layers may identify edges, while higher layers may identify concepts such as digits, letters, or faces.

In this article, we are going to discuss a few deep learning algorithms such as deep belief networks, Generative Adversarial Networks, Transformers, and Graph Neural Networks. …

Hadoop? The heartbeat of big data? Yeah, you read it right.

Hadoop is an open-source framework written in Java that is generally used to process and store big data in a distributed environment using simple programming models.

Before diving into Hadoop, let's discuss what big data is, exactly.

**Big data** is a collection of large datasets that cannot be processed using traditional computing techniques. But **what's the reason** for such large data? …
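The "simple programming models" Hadoop offers are chiefly MapReduce, and word count is its canonical example: a map step emits `(word, 1)` pairs, the framework groups pairs by key, and a reduce step sums each group. A sketch of the idea in plain Python (standing in for Hadoop's Java API):

```python
# Word count in the MapReduce style: map emits (key, value) pairs,
# the framework groups pairs by key (the "shuffle"), reduce aggregates each group.
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)

def word_count(lines):
    grouped = defaultdict(list)  # shuffle: group emitted values by key
    for line in lines:
        for word, count in map_phase(line):
            grouped[word].append(count)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

print(word_count(["big data big hadoop", "hadoop big"]))
# {'big': 3, 'data': 1, 'hadoop': 2}
```

The point of the model is that map and reduce are tiny, stateless functions, so the framework can run them in parallel across a cluster and on chunks of data far too large for one machine.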

The central limit theorem states that, for a dataset with an unknown distribution, the distribution of the sample means will approximate a normal distribution.

In other words, as the size of the samples increases, the distribution of the mean across multiple samples will approximate a Gaussian distribution. For the theorem to hold, though, the samples must be sufficiently large: the distribution of sample means, calculated from repeated sampling, tends to normality as the size of those samples grows.
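A quick simulation shows the theorem in action: draw many samples from a decidedly non-normal (uniform) population and look at the distribution of their means (the seed and sizes here are arbitrary choices):

```python
# Simulating the central limit theorem with a uniform(0, 1) population.
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

def sample_means(sample_size, num_samples):
    """Mean of `sample_size` uniform(0, 1) draws, repeated `num_samples` times."""
    return [statistics.mean(random.random() for _ in range(sample_size))
            for _ in range(num_samples)]

means = sample_means(sample_size=30, num_samples=2000)
# The sample means cluster tightly around the population mean (0.5),
# with a roughly bell-shaped spread of about sqrt(1/12) / sqrt(30) ≈ 0.053.
print(statistics.mean(means), statistics.stdev(means))
```

Plotting a histogram of `means` would show the familiar bell curve, even though every individual draw came from a flat distribution.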

To understand this theorem more clearly, let’s cover the basics first. …

Neural networks are the essence of deep learning. They are multi-layer networks of neurons that we use to classify things, make predictions, and so on. Any neural network has three parts:

- the input layer of the model
- the hidden layers of neurons
- the output layer of the model

The arrows that connect the dots show how the neurons are interconnected and how data travels from the input layer all the way through to the output layer.

Every neuron in a layer takes its inputs, multiplies them by weights, adds a bias, applies an activation function, and passes the result on to the next layer. …
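For a single neuron, that computation is only a couple of lines. A sketch with a sigmoid activation (the weights and inputs are made up for illustration):

```python
# One neuron's forward pass: weighted sum of inputs, plus bias, through an activation.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Multiply inputs by weights, add the bias, apply the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# z = 1*0.5 + 2*(-0.25) + 0.0 = 0.0, and sigmoid(0) = 0.5
print(neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0))  # 0.5
```

A layer is just many such neurons sharing the same inputs, and a network is layers of them chained together.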

Deciding what cross validation scheme and performance measures to use with a particular machine learning technique is very important. After training our model on a dataset, we can't say for sure that it will perform well on data it hasn't seen before. The process of deciding whether the numerical results quantifying hypothesised relationships between variables are acceptable as descriptions of the data is known as validation. Based on the performance on unseen data, we can say whether the model is overfitted, underfitted, or well generalized.

Cross validation is a technique used to evaluate a machine learning model by training it on a subset of the available data and then evaluating it on the remaining data. Put simply, we keep a portion of the data aside, train the model on the rest, and then test and evaluate the model's performance on the portion that was kept aside. …
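In k-fold cross validation, each portion of the data is held out exactly once. A sketch of just the splitting logic in plain Python (libraries such as scikit-learn provide this ready-made):

```python
# k-fold cross-validation splitting: each sample lands in the test set exactly once.

def kfold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k folds over n samples."""
    # Distribute n samples over k folds as evenly as possible.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 6 samples, 3 folds.
for train, test in kfold_indices(n=6, k=3):
    print(train, test)
```

The model is trained k times, once per (train, test) pair, and the k evaluation scores are averaged to give a more reliable estimate than a single held-out split.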
