Writing Python Code for Neural Networks from Scratch

5 min read · Apr 24, 2020

Neural networks are the core of deep learning. They are multi-layer networks of neurons that we use to classify things, make predictions, and so on. Any neural network has three parts:

input layer of our model
hidden layers of neurons
output layer of model

In a diagram of such a network, the arrows connecting the dots show how the neurons are interconnected and how data travels from the input layer all the way through to the output layer.

Learning Process

What is the learning process?

Every neuron in a layer takes its inputs, multiplies them by some weights, adds a bias, applies an activation function, and passes the result on to the next layer. The learning process can be described in the following steps:

Feed Forward: The inputs are fed into the network, pass through the hidden layers, and reach the output layer to produce an output. This process is called feedforward. It is used during training and also to make predictions after the network has finished training.
Cost Calculation: The outputs produced by feedforward are compared to the desired outputs. The cost measures how far the calculated outputs are from the desired values. In an ideal scenario, we would want the cost to be 0, or very close to 0.
Backpropagation: We move backward through the network and update the weights and biases in each layer. Gradient descent uses the derivative of the cost to decide how much to change each weight and bias. The updates are made so that the outputs of the network move closer to the desired outputs and the cost drops toward 0.
Repeat: The steps above are repeated for a fixed number of iterations called epochs. The number of epochs is chosen by looking at the cost. When the cost is close to 0 and the network is making accurate predictions, we can say that the network has learned.
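To make the description of a single neuron concrete, here is a minimal sketch of one neuron computing its output. The input, weight, and bias values are hypothetical and chosen only for illustration, and tanh is just one possible activation function.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
x = np.array([0.5, -1.0, 2.0])   # inputs to the neuron
w = np.array([0.1, 0.4, -0.2])   # one weight per input
b = 0.3                          # bias of the neuron

z = np.dot(w, x) + b             # weighted sum of the inputs plus the bias
a = np.tanh(z)                   # activation applied to the weighted sum

print(z, a)
```

The value a is what this neuron would pass on to the next layer.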

Creating a Layer Class

In this section, we will create a Layer class to represent each layer in our network. Alongside it, we define the activation functions, the loss function used to calculate the cost, and a learning rate. The constructor takes three parameters:

The number of inputs to this layer
The number of neurons in this layer
The activation function to use

Our weights are stored in a matrix whose number of rows equals the number of neurons in the layer and whose number of columns equals the number of inputs to the layer.

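The np.random.randn function is used to create a matrix of shape (neurons, inputs) with random values drawn from a standard normal distribution. It is important to initialise the weight matrix with random values: if all the weights started out equal, every neuron in a layer would compute the same output and receive the same update, so the network could not learn properly. Our bias is a column vector that contains one bias value for each neuron in the layer. It is initialised to 0 using the np.zeros function.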
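As a quick sanity check on these shapes, the sketch below builds a weight matrix and bias vector for a hypothetical layer with 2 inputs and 3 neurons and applies them to a batch of 4 samples. The layer sizes and the sample matrix X are assumptions made for the example.

```python
import numpy as np

inputs, neurons = 2, 3                      # hypothetical layer sizes

W = np.random.randn(neurons, inputs)        # shape (3, 2): one row per neuron
b = np.zeros((neurons, 1))                  # shape (3, 1): one bias per neuron

X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])  # shape (2, 4): 2 inputs, 4 samples

# b has a single column, so NumPy broadcasts it across all 4 sample columns
Z = np.dot(W, X) + b

print(W.shape, b.shape, Z.shape)
```

Keeping the bias as a column vector of shape (neurons, 1) is what lets the same bias be added to every sample in the batch.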

import numpy as np

# Activation functions and their derivatives
def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1 - np.square(np.tanh(x))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return (1 - sigmoid(x)) * sigmoid(x)

# Loss function and its derivative with respect to the prediction a
def logloss(y, a):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def d_logloss(y, a):
    return (a - y) / (a * (1 - a))

# The layer class
class Layer:
    # Maps an activation name to (function, derivative)
    activationFunctions = {
        'tanh': (tanh, d_tanh),
        'sigmoid': (sigmoid, d_sigmoid)
    }
    learning_rate = 0.1

    def __init__(self, inputs, neurons, activation):
        self.W = np.random.randn(neurons, inputs)
        self.b = np.zeros((neurons, 1))
        self.act, self.d_act = self.activationFunctions.get(activation)
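Since backpropagation relies on the derivative functions being correct, it can be worth checking them numerically. The sketch below compares d_sigmoid and d_tanh against a central finite-difference approximation. The helper numerical_derivative is a hypothetical name introduced for this example, and the step size and tolerance are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return (1 - sigmoid(x)) * sigmoid(x)

def d_tanh(x):
    return 1 - np.square(np.tanh(x))

def numerical_derivative(f, x, eps=1e-6):
    # Hypothetical helper: central difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.linspace(-3, 3, 13)  # a few test points

# Compare the analytic derivatives with the numerical approximation
sigmoid_ok = np.allclose(d_sigmoid(x), numerical_derivative(sigmoid, x), atol=1e-6)
tanh_ok = np.allclose(d_tanh(x), numerical_derivative(np.tanh, x), atol=1e-6)

print(sigmoid_ok, tanh_ok)
```

If either check came back False, the corresponding derivative would be the first place to look for a bug.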

Feed forward Function

Purpose

The feed forward function propagates the inputs through each layer of the network until it reaches the output layer and produces some output. The feed forward equations can be written as:

                         Z = W.A1 + b
                         A = activation(Z)

The A1 term is the output of the previous layer. For the first layer, this is equal to our input value X.
The operation between W and A1 is a dot product. Since both are matrices, their shapes must match up (the number of columns in W should be equal to the number of rows in A1).
The output of this layer is A. For the output layer, this is equal to the predicted output, Y_bar.

We are saving the values of A1, Z and A in our class to use them later during backpropagation.

# The layer class
class Layer:

    # same code as above
    ....

    def feedforward(self, A1):
        self.A1 = A1
        self.Z = np.dot(self.W, self.A1) + self.b
        self.A = self.act(self.Z)
        return self.A
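The shape requirement for the dot product is easy to get wrong, so here is a small sketch of what happens when it is violated. The matrices are filled with zeros because only their shapes matter here, and the names A1_ok and A1_bad are made up for the example.

```python
import numpy as np

W = np.zeros((3, 2))        # 3 neurons, 2 inputs
A1_ok = np.zeros((2, 4))    # 2 rows match the 2 columns of W
A1_bad = np.zeros((3, 4))   # 3 rows do not match the 2 columns of W

Z = np.dot(W, A1_ok)        # works, the result has shape (3, 4)

try:
    np.dot(W, A1_bad)
    mismatch_detected = False
except ValueError:
    # NumPy raises ValueError when the inner dimensions do not line up
    mismatch_detected = True

print(Z.shape, mismatch_detected)
```

When chaining layers, this means the number of inputs of each layer has to equal the number of neurons of the layer before it.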

Gradient Descent

Gradient descent calculates how much our weights and biases should be changed so that the cost moves toward its minimum, ideally close to 0. This is done using partial derivatives. It is based on the fact that, at a minimum of a function, its partial derivatives are equal to zero, and that the negative of the derivative points in the direction in which the function decreases. So for each layer, we find the derivative of the cost with respect to the weights and biases of that layer. This derivative, scaled by the learning rate, is subtracted from the current values of the weights and biases.

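                         W = W - alpha * dW
                         b = b - alpha * db

Here alpha is the learning rate, and dW and db are the derivatives of the cost with respect to W and b.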
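As a minimal sketch of the idea, we can repeatedly subtract the learning rate times the derivative on a function of a single variable. The function f(w) = (w - 3)^2, the starting point, and the number of steps are all assumptions made for illustration and are not part of the network.

```python
# Minimise f(w) = (w - 3)^2, whose derivative is 2 * (w - 3)
alpha = 0.1   # learning rate
w = 0.0       # hypothetical starting value

for step in range(100):
    grad = 2 * (w - 3)       # derivative of f at the current w
    w = w - alpha * grad     # step against the derivative

print(w)
```

Each step shrinks the distance between w and the minimum at 3, so w ends up very close to 3. In the network the same kind of update is applied to every weight and bias, with the derivative of the cost in place of the derivative of f.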

Backpropagation

The backpropagation algorithm goes backward through our network, calculating the partial derivatives in each layer and applying the learning equation. The cost can be calculated as:

                         C = (1/m) * sum over i = 1..m of L(Y_i, Y_bar_i)

Here, m is the number of samples in our training set. L is any loss function that calculates the error between the actual value Y and the predicted value Y_bar.
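To make the cost concrete, the sketch below averages the log loss over four samples for two sets of predictions. The arrays a_good and a_bad are hypothetical predictions made up for this example, not outputs of a trained network.

```python
import numpy as np

def logloss(y, a):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

y = np.array([[0, 1, 1, 0]])               # desired outputs
a_good = np.array([[0.1, 0.9, 0.9, 0.1]])  # hypothetical predictions close to y
a_bad = np.array([[0.9, 0.1, 0.1, 0.9]])   # hypothetical predictions far from y

m = y.shape[1]                             # number of samples

cost_good = 1 / m * np.sum(logloss(y, a_good))
cost_bad = 1 / m * np.sum(logloss(y, a_bad))

print(cost_good, cost_bad)
```

The predictions that are closer to the desired outputs give the smaller cost, which is the behaviour training relies on.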

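# The layer class
class Layer:

    # same as above
    ...

    def backprop(self, dA):
        # dA is the derivative of the cost with respect to this layer's output A
        dZ = np.multiply(self.d_act(self.Z), dA)
        # Average the gradients over the samples (the columns of dZ)
        dW = 1/dZ.shape[1] * np.dot(dZ, self.A1.T)
        db = 1/dZ.shape[1] * np.sum(dZ, axis=1, keepdims=True)
        # Derivative to hand to the previous layer, computed before W changes
        dA1 = np.dot(self.W.T, dZ)

        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db

        return dA1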
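One way to check the first line of backprop is to look at the output layer. When a sigmoid activation is combined with the log loss, multiplying the two derivatives together simplifies to A - Y. The sketch below verifies this numerically. The values of Z are hypothetical pre-activations chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return (1 - sigmoid(x)) * sigmoid(x)

def d_logloss(y, a):
    return (a - y) / (a * (1 - a))

Z = np.array([[-2.0, -0.5, 0.5, 2.0]])  # hypothetical pre-activations
Y = np.array([[0, 1, 1, 0]])            # desired outputs

A = sigmoid(Z)
dA = d_logloss(Y, A)                    # derivative of the loss w.r.t. A
dZ = np.multiply(d_sigmoid(Z), dA)      # same computation as in backprop

matches = np.allclose(dZ, A - Y)
print(matches)
```

The a * (1 - a) in the denominator of d_logloss cancels against the sigmoid derivative, which is why the product reduces to the difference between prediction and target.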

Training the Network

Code

x = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])  # dim x m
y = np.array([[0, 1, 1, 0]])  # 1 x m
m = 4
epochs = 1500

lyr = [Layer(2, 3, 'tanh'), Layer(3, 1, 'sigmoid')]
costs = []  # to plot graph

for epoch in range(epochs):
    # Feedforward
    A = x
    for layer in lyr:
        A = layer.feedforward(A)

    # Calculate cost to plot graph
    cost = 1/m * np.sum(logloss(y, A))
    costs.append(cost)

    # Backpropagation
    dA = d_logloss(y, A)
    for layer in reversed(lyr):
        dA = layer.backprop(dA)

# Making predictions
A = x
for layer in lyr:
    A = layer.feedforward(A)
print(A)

m is the number of samples.
epochs is the number of training iterations.
The first layer has 2 inputs and 3 neurons. The two inputs are the two binary values we are performing the XOR operation on. The second layer has 3 inputs because the previous layer produces 3 outputs, one per neuron. It has a single output, the result of the XOR.
For backpropagation, we iterate through the layers in reverse order. The value of dA is calculated and passed on to the previous layer.
After the loop has run for all the epochs, our network should be trained, i.e., all the weights and biases should be tuned.
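The network outputs values between 0 and 1 rather than exact 0s and 1s, so one common way to read them as XOR answers is to threshold at 0.5. The sketch below assumes this threshold, and the array A holds hypothetical outputs made up for illustration, not results from an actual training run.

```python
import numpy as np

# Hypothetical network outputs, made up for illustration
A = np.array([[0.03, 0.97, 0.96, 0.05]])
y = np.array([[0, 1, 1, 0]])            # desired XOR outputs

predictions = (A > 0.5).astype(int)     # 1 where the output exceeds 0.5
accuracy = np.mean(predictions == y)    # fraction of correct predictions

print(predictions, accuracy)
```

Because the weights start out random, the outputs differ from run to run, so it is worth looking at the recorded costs as well as the final predictions.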

Thanks for reading!