# Types of Clustering

• Partitioning Method
• Hierarchical Method
• Density-based Method
• Grid-based Method
• Model-based Method

## Partitioning Method

The general problem is this: given ‘n’ objects, construct ‘k’ partitions of the data, where each partition represents a cluster and contains at least one object. There is an additional condition: each object can belong to only one cluster.

The partitioning method starts by creating an initial random partitioning. Then it iterates to improve the partitioning by moving the objects from one partition to another.

k-Means clustering follows the partitioning approach to cluster the data.

## Hierarchical Method

There are two approaches, depending on how we build the hierarchy:

• Agglomerative Approach
• Divisive Approach

### Agglomerative Approach

The agglomerative approach is a hierarchical method that uses a bottom-up strategy. We start with each object as a separate cluster and keep merging the clusters that are closest to one another. This continues until all of the groups are merged into one cluster or a termination condition holds.
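The bottom-up merging described above can be sketched with SciPy's hierarchical clustering routines (SciPy is not used elsewhere in this article; the data points here are illustrative). `linkage` performs the successive merges, and `fcluster` cuts the resulting tree when the desired number of clusters remains — the termination condition:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy data: two visually obvious groups (illustrative values)
points = np.array([[1, 1], [2, 1], [4, 3], [5, 4]])

# bottom-up strategy: 'single' linkage repeatedly merges the two
# closest clusters, starting from one cluster per point
Z = linkage(points, method='single')

# stop merging once only two clusters remain
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

Each row of `Z` records one merge, so the full merge history is available even after cutting the tree.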

### Divisive Approach

The divisive approach is a hierarchical method that uses a top-down strategy. We start with all objects in a single cluster. In each iteration, a cluster is split into smaller clusters. This continues until each object is in its own cluster or a termination condition holds. Hierarchical methods are rigid, i.e., once a merge or split is done, it can never be undone.
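One way to sketch the top-down strategy is a "bisecting" variant: start with everything in one cluster and repeatedly split the largest cluster in two until k clusters remain. This is an illustrative sketch, not the article's method; it borrows scikit-learn's `KMeans` for the 2-way splits, and the `divisive` helper and its data are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, k):
    """Top-down clustering sketch: repeatedly split the largest
    cluster with 2-means until k clusters remain."""
    labels = np.zeros(len(X), dtype=int)   # all points start in one cluster
    while labels.max() + 1 < k:
        # pick the cluster with the most points and split it in two
        largest = np.bincount(labels).argmax()
        mask = labels == largest
        halves = KMeans(n_clusters=2, n_init=10,
                        random_state=0).fit_predict(X[mask])
        new_label = labels.max() + 1
        labels[mask] = np.where(halves == 1, new_label, largest)
    return labels

# toy data: two obvious groups (illustrative values)
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
labels = divisive(X, 2)
print(labels)
```

Note the rigidity mentioned above: once a split is made, later iterations only subdivide further and never re-merge.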

# Writing K-means clustering code in Python from scratch

In this code, I am choosing a basic data set with 4 data points and 2 features.

Let’s start the code by importing the relevant libraries.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

Pandas is a Python library for data manipulation and analysis. It offers operations for manipulating numerical tables and time series. NumPy is another Python library which provides multi-dimensional array and matrix data structures. It can be used to perform a number of mathematical operations on arrays, such as trigonometric, statistical, and algebraic routines. Matplotlib.pyplot is used to plot different visualisations of the data, such as bar charts and scatter plots.

```python
df = pd.DataFrame({
    'x': [1, 2, 4, 5],
    'y': [1, 1, 3, 4]
})

# select the number of clusters
k = 2

# initialize the centroids of the two clusters
centroids = [[1, 1], [2, 1]]
```

In the above code segment we have created a small data frame with two features and 4 data points. Then we set the number of clusters, which must be less than or equal to the number of data points in the data set. Finally, we initialize the centroids; here, we have taken (1,1) and (2,1) as the centroids of the two clusters.

```python
# map each cluster to a color
colmap = {1: 'r', 2: 'g'}

def assignment(df, centroids):
    # distance of every point from each centroid
    for i in (1, 2):
        df['distance_from_{}'.format(i)] = np.sqrt(
            (df['x'] - centroids[i - 1][0]) ** 2
            + (df['y'] - centroids[i - 1][1]) ** 2
        )
    centroid_distance_cols = ['distance_from_{}'.format(i) for i in (1, 2)]
    # pick the centroid with the smallest distance
    df['closest'] = df.loc[:, centroid_distance_cols].idxmin(axis=1)
    df['closest'] = df['closest'].map(lambda x: int(x.lstrip('distance_from_')))
    df['color'] = df['closest'].map(lambda x: colmap[x])
    return df

# assign each data point to the cluster whose centroid is nearest
df = assignment(df, centroids)
print(df.head())
```

In the above code snippet, we calculate the distance of each point from both centroids and then assign each data point a color according to the cluster it is closer to. Next, we update the centroids to account for the points assigned to each cluster.

```python
def update(centroids):
    # move each centroid to the mean of the points assigned to its cluster
    for i in range(k):
        centroids[i] = [
            np.mean(df[df['closest'] == (i + 1)]['x']),
            np.mean(df[df['closest'] == (i + 1)]['y'])
        ]
    return centroids

# keep updating the centroids until the assignments stop changing
while True:
    closest_centroids = df['closest'].copy(deep=True)
    centroids = update(centroids)
    df = assignment(df, centroids)
    if closest_centroids.equals(df['closest']):
        break

fig = plt.figure(figsize=(5, 5))
plt.scatter(df['x'], df['y'], color=df['color'], alpha=0.5, edgecolor='k')
for i in range(k):
    plt.scatter(*centroids[i], color=colmap[i + 1])
print(df)
print(centroids)
plt.xlim(0, 7)
plt.ylim(0, 5)
plt.show()
```
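As a sanity check (a sketch assuming scikit-learn is available — it is not used in the article itself), the same four points can be clustered with `sklearn.cluster.KMeans`, which should converge to the same two groups as the from-scratch loop:

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({'x': [1, 2, 4, 5], 'y': [1, 1, 3, 4]})

# k=2, the same setting as the from-scratch version
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df[['x', 'y']])

print(km.labels_)           # cluster index of each point
print(km.cluster_centers_)  # final centroids
```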


# Wrapping up

One caveat: clusters in the k-means method are assumed to be spherical and evenly sized, which may reduce the accuracy of the results when the data violates those assumptions.

Aditi Mittal | Machine Learning Enthusiast | Software Developer
