
What is a Distributed System?

Have you ever wondered how Facebook stores and serves thousands of petabytes of user content such as photos, videos and likes? Have you ever been intrigued by how you can upload a photo or video to Instagram and your followers can view it instantly in their feeds? The answer to these questions is distributed systems!

So now you are wondering: what exactly is a distributed system? You must have heard these two words many times in MOOC videos, or read them in articles and books, without knowing what they actually mean.

Wikipedia defines distributed systems…


A cluster is a group of objects that have similar properties and belong to the same class.

What is Clustering?

Clustering is an unsupervised learning technique used to form clusters of objects, i.e. to put objects of a similar kind into the same group. In clustering, we first partition the data into groups based on similarity and then assign labels to those groups. It also helps us discover features that distinguish one group from another.
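To make the "partition by similarity, then label" idea concrete, here is a minimal sketch using k-means, which is one common clustering algorithm and is chosen here only for illustration. The function name `kmeans`, the sample points, and the choice to seed the centroids with the first k points are all assumptions of this sketch, not something the article prescribes.

```python
import math


def kmeans(points, k, iterations=10):
    """Toy k-means for 2-D points (illustrative only).

    Assumption: the first k points are used as the initial centroids.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point gets the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Two visually separate groups of made-up points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
labels, centroids = kmeans(points, k=2)
```

Points that sit close together end up with the same label, and each centroid summarises its group. No labels were supplied up front, which is what makes the technique unsupervised.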

Types of Clustering

The most common categories of clustering are:

  • Partitioning Method
  • Hierarchical Method
  • Density-based Method
  • Grid-based Method
  • Model-based Method

Partitioning Method

The partitioning method classifies a set of n objects into groups based on their features and…


Hadoop? The heartbeat of big data? Yeah, you read it right.

Hadoop is an open-source framework, written in Java, that is generally used to store and process big data in a distributed environment using simple programming models.

Before diving into Hadoop, let’s discuss what big data actually is.

What is Big Data?

Big data is a collection of large datasets that cannot be processed using traditional computing techniques. But what is the reason for such large volumes of data? …


Today I’ll be discussing what the central limit theorem (or CLT) is and why it is important for every data science enthusiast to know.

Formal Definition

The central limit theorem states that for a population with an unknown distribution (and finite variance), the distribution of the sample means will approximate a normal distribution as the sample size grows.

In other words, as the sample size increases, the distribution of the mean across many repeated samples approaches a Gaussian distribution, whatever the shape of the underlying population. For the approximation to be good, the samples must be sufficiently large.
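A quick simulation can make this tangible. The sketch below assumes a skewed population (an exponential distribution with mean 1.0) and compares the means of many small samples with the means of many larger ones. The helper names `sample_means` and `skewness`, the sample sizes 2 and 50, and the 2000 repeats are illustrative choices, not part of the theorem.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(sample_size, repeats=2000):
    """Draw `repeats` samples from an exponential population (mean 1.0)
    and return the mean of each sample."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(repeats)
    ]


def skewness(values):
    """Standardised third moment; roughly 0 for a symmetric, bell-shaped set."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / spread) ** 3 for v in values)


means_small = sample_means(2)    # means of tiny samples: still visibly skewed
means_large = sample_means(50)   # means of larger samples: closer to normal

print("skewness, n=2: ", round(skewness(means_small), 2))
print("skewness, n=50:", round(skewness(means_large), 2))
print("spread,   n=2: ", round(statistics.pstdev(means_small), 3))
print("spread,   n=50:", round(statistics.pstdev(means_large), 3))
```

With the larger sample size, the sample means should cluster more tightly around the population mean of 1.0 and their skewness should move toward zero, which is the behaviour the theorem describes.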

Let’s start with the basics

To understand this theorem more clearly, let’s cover the basics first. …


Neural networks are the core of deep learning. They are multi-layer networks of neurons that we use to classify things, make predictions, and so on. Any neural network has three parts:

  1. the input layer of the model
  2. the hidden layers of neurons
  3. the output layer of the model

The arrows that connect the dots show how all the neurons are interconnected and how data travels from the input layer all the way through to the output layer.

Learning Process

What is the learning process?

Every neuron in a layer takes its inputs, multiplies them by some weights, adds a bias, applies an activation function and passes the result on to the next…
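The per-neuron computation described above (weighted sum, plus bias, through an activation function) can be sketched in a few lines. The sigmoid activation, the function names `neuron` and `layer`, and all weights and inputs below are assumptions made up for illustration.

```python
import math


def sigmoid(z):
    """One possible activation function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Input layer (2 values) -> hidden layer (2 neurons) -> output layer (1 neuron).
hidden = layer([0.5, -1.0], [[0.4, 0.6], [-0.3, 0.8]], [0.1, 0.0])
output = neuron(hidden, [1.0, -1.0], 0.0)
```

Each hidden neuron's output becomes an input to the next layer, which is how data flows from the input layer through to the output layer.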


Deciding which cross validation strategy and performance measures to use with a particular machine learning technique is very important. After training a model on a dataset, we can’t say for sure that it will perform well on data it hasn’t seen before. The process of deciding whether the numerical results quantifying hypothesised relationships between variables are acceptable as descriptions of the data is known as validation. Based on its performance on unseen data, we can say whether the model is overfitted, underfitted or well generalized.

Cross Validation

Cross validation is a technique used to evaluate the machine…
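As a rough illustration of evaluating on unseen data, the sketch below shows k-fold splitting, one widely used form of cross validation. The excerpt above does not say which variant the article goes on to describe, so treat the choice of k-fold, the function name `k_fold_indices`, and the numbers used as assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Every sample lands in the test set exactly once; the model would be
    trained on the remaining folds each time.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled first, and a score is computed on each test fold and then averaged.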


Kubernetes is an open-source platform designed with Google’s experience in container orchestration and the best ideas from the community. Users expect web applications to be available 24/7, and developers expect to deploy new versions of these applications several times a day. Containerization serves these goals by enabling applications to be released and updated quickly and easily without downtime. Kubernetes makes sure these containerized applications run where and when required, and helps them find the resources and tools they need.

Create a cluster

Kubernetes coordinates a highly available cluster of computers that are connected to work…


Redis is an open-source key-value data store that holds its database entirely in memory and uses the disk only for persistence. Redis supports a relatively large number of data types compared with many other key-value data stores. It can be used as a database, a cache and a message broker.

Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

In order to achieve its outstanding performance, Redis works with an in-memory dataset. Depending on the use case, data can be…


What is a Neural Network?

Neural networks are a set of algorithms, loosely modeled on the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labelling or clustering raw input. The patterns they recognize are numerical and contained in vectors, into which all real-world data (images, sound, text or time series) must be translated. Artificial neural networks are composed of a large number of highly interconnected processing elements (neurons) working together to solve a problem.

An ANN usually involves a large number of processors operating in parallel and arranged in tiers. The first tier receives the raw input information —…


RabbitMQ is message broker software that implements the Advanced Message Queuing Protocol (AMQP). AMQP is an application layer protocol for message-oriented middleware.

RabbitMQ can be used to run background and asynchronous operations. It is also a way to exchange data between applications on different platforms.

A message can include any kind of information. The queue-manager software stores the messages until a receiving application connects and takes a message off the queue. The receiving application then processes the message in an appropriate manner.
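The store-until-consumed behaviour described above can be imitated with Python's standard-library `queue.Queue`. This is only an in-process stand-in for a broker: it does not use RabbitMQ or AMQP, and the message fields (`task_id`, `payload`) and the `None` stop marker are made up for illustration.

```python
import queue
import threading

message_queue = queue.Queue()  # stands in for the broker's queue
processed = []


def producer():
    """Sending application: publishes messages and moves on."""
    for i in range(3):
        message_queue.put({"task_id": i, "payload": f"job-{i}"})
    message_queue.put(None)  # hypothetical stop marker for this sketch


def consumer():
    """Receiving application: takes messages off the queue and processes them."""
    while True:
        message = message_queue.get()
        if message is None:
            break
        processed.append(message["payload"].upper())


# The producer finishes before any consumer exists; the queue keeps the messages.
producer_thread = threading.Thread(target=producer)
producer_thread.start()
producer_thread.join()
waiting = message_queue.qsize()  # messages held while nobody is listening

# Only now does a receiving application connect and drain the queue.
consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
consumer_thread.join()
```

The sender never waits for the receiver, which is what makes queues a good fit for background and asynchronous work.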

The receiving application can be a microservice or a standalone app. RabbitMQ supports multiple protocols; here are the protocols…

Aditi Mittal

Machine Learning Enthusiast | Software Developer
