Writing Python Code for Neural Networks from Scratch

  1. hidden layers of neurons
  2. output layer of model

Learning Process

What Is the Learning Process?

Every neuron in a layer takes its inputs, multiplies them by some weights, adds a bias, applies an activation function and passes the result on to the next layer. This forward pass is called feedforward. The learning process can be described in the following steps:

  1. Cost Calculation: The outputs produced by feedforward are compared to the desired outputs, and we calculate how far apart they are. This difference is the cost. Ideally, we want the cost to be 0, or very close to 0.
  2. Backpropagation: We move backward through the network and update the weights and biases in each layer. Gradient descent uses the derivative of the cost to decide how much to change each weight and bias. The updates are chosen so that the network's outputs move closer to the desired outputs and the cost drops toward 0.
  3. Repeat these steps for a fixed number of iterations, called epochs. The number of epochs is chosen by watching the cost. When the cost is close to 0 and the network is making accurate predictions, we can say that the network has learned.
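
As a minimal sketch of the pieces described above, here is a single neuron processing one input, followed by the cost of its output. The weights, bias and input values are made up for illustration. The sigmoid activation and log loss are the ones used later in this article.

```python
import numpy as np

# Illustrative values only: one neuron with two inputs
x = np.array([1.0, 0.0])      # inputs
w = np.array([0.5, -0.5])     # weights (made up)
b = 0.0                       # bias (made up)
y = 1.0                       # desired output

z = np.dot(w, x) + b          # weighted sum of the inputs plus the bias
a = 1 / (1 + np.exp(-z))      # sigmoid activation

# Log loss: small when a is close to y, large when it is far away
cost = -(y * np.log(a) + (1 - y) * np.log(1 - a))
print(a, cost)
```

Learning then consists of nudging w and b so that this cost shrinks.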

Creating a Layer Class

In this section, we create a Layer class to represent each layer in our network. Before the class, we define the activation functions and the loss function used to calculate the cost, each with its derivative. The class also defines a learning rate. Its constructor takes three parameters:

  • The number of inputs to this layer
  • The number of neurons in this layer
  • The activation function to use

    import numpy as np

    # Activation functions and their derivatives
    def tanh(x):
        return np.tanh(x)

    def d_tanh(x):
        return 1 - np.square(np.tanh(x))

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def d_sigmoid(x):
        return (1 - sigmoid(x)) * sigmoid(x)

    # Loss function (log loss) and its derivative with respect to a
    def logloss(y, a):
        return -(y * np.log(a) + (1 - y) * np.log(1 - a))

    def d_logloss(y, a):
        return (a - y) / (a * (1 - a))

    # The layer class
    class Layer:
        # Maps an activation name to (function, derivative)
        activationFns = {
            'tanh': (tanh, d_tanh),
            'sigmoid': (sigmoid, d_sigmoid)
        }
        learning_rate = 0.1

        def __init__(self, inputs, neurons, activation):
            # One row of weights per neuron, one column per input
            self.W = np.random.randn(neurons, inputs)
            self.b = np.zeros((neurons, 1))
            self.act, self.d_act = self.activationFns[activation]
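
One way to sanity-check derivative functions such as d_tanh and d_sigmoid is to compare them with a finite-difference approximation. The sketch below redefines the functions so that it runs on its own. numerical_derivative is a hypothetical helper introduced for this check, not part of the article's code.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return (1 - sigmoid(x)) * sigmoid(x)

def d_tanh(x):
    return 1 - np.square(np.tanh(x))

def numerical_derivative(f, x, eps=1e-6):
    # Central-difference approximation of f'(x) (hypothetical helper)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

xs = np.linspace(-3, 3, 7)

# Compare each analytic derivative with its numerical approximation
sigmoid_ok = np.allclose(d_sigmoid(xs), numerical_derivative(sigmoid, xs), atol=1e-6)
tanh_ok = np.allclose(d_tanh(xs), numerical_derivative(np.tanh, xs), atol=1e-6)
print(sigmoid_ok, tanh_ok)
```

If either check fails, the derivative function is probably wrong, and backpropagation would quietly push the weights in the wrong direction.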

Feedforward Function

The feedforward function propagates the inputs through each layer of the network until they reach the output layer and produce an output. For a layer with input A1, the feedforward equations are:

    Z = W · A1 + b
    A = activation(Z)

  • The operation between W and A1 is a dot product. Since both are matrices, their shapes must match: the number of columns in W must equal the number of rows in A1.
  • A1 is the input to this layer, i.e., the output of the previous layer (or the network input for the first layer), and A is this layer's output. For the output layer, A is the predicted output, Y_bar.

    # The layer class
    class Layer:

        # same code as above

        def feedforward(self, A1):
            # Cache the input and Z; backprop needs both
            self.A1 = A1
            self.Z = np.dot(self.W, self.A1) + self.b
            self.A = self.act(self.Z)
            return self.A
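
To make the shape requirement concrete, the sketch below traces the shapes through one feedforward step for a layer with 2 inputs and 3 neurons applied to 4 examples. The random weights and the fixed seed are illustrative choices, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, for repeatability only

W = rng.standard_normal((3, 2))       # 3 neurons, 2 inputs each
b = np.zeros((3, 1))                  # one bias per neuron
A1 = np.array([[0, 0, 1, 1],
               [0, 1, 0, 1]])         # 2 inputs x 4 examples

# (3, 2) . (2, 4) -> (3, 4); b has shape (3, 1) and broadcasts across the 4 columns
Z = np.dot(W, A1) + b
A = np.tanh(Z)

print(Z.shape, A.shape)  # prints (3, 4) (3, 4)
```

Each column of A is the layer's output for one example, which is why a whole batch can be processed in a single matrix product.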

Gradient Descent

Gradient descent calculates how much our weights and biases should be updated so that the cost moves toward its minimum, ideally close to 0. This is done using partial derivatives, based on the fact that at a minimum of a function its partial derivatives are equal to zero. So for each layer, we find the derivative of the cost with respect to that layer's weights and biases. Each weight and bias is then updated by subtracting its derivative multiplied by the learning rate.
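
The update rule is easiest to see on a function of a single variable. The sketch below applies it to a toy cost, (w - 3)^2, chosen purely for illustration because its minimum is known to be at w = 3. The function names and the step count are assumptions, not part of the article's code.

```python
def cost(w):
    # Toy cost with its minimum at w = 3
    return (w - 3.0) ** 2

def d_cost(w):
    # Derivative of the toy cost with respect to w
    return 2.0 * (w - 3.0)

learning_rate = 0.1
w = 0.0                                  # arbitrary starting point

for step in range(100):
    w = w - learning_rate * d_cost(w)    # step against the derivative

print(w, cost(w))
```

Each step shrinks the distance to the minimum by a constant factor here, so w ends up very close to 3. The network does the same thing, with one such update per weight and bias.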


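Backpropagation moves backward through the network, computing the partial derivatives in each layer and applying the update equation. With log loss, the cost over m training examples is:

    cost = -(1/m) * Σ [ y·log(a) + (1 - y)·log(1 - a) ]

Each layer's backprop method takes dA, the derivative of the cost with respect to that layer's output, and returns dA1, the derivative with respect to its input: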

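    # The layer class
    class Layer:

        # same as above

        def backprop(self, dA):
            # dA: derivative of the cost with respect to this layer's output A
            dZ = np.multiply(self.d_act(self.Z), dA)
            # Average the gradients over the m examples (columns of dZ)
            dW = 1 / dZ.shape[1] * np.dot(dZ, self.A1.T)
            db = 1 / dZ.shape[1] * np.sum(dZ, axis=1, keepdims=True)
            # Gradient to pass back to the previous layer
            dA1 = np.dot(self.W.T, dZ)

            self.W = self.W - self.learning_rate * dW
            self.b = self.b - self.learning_rate * db

            return dA1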
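
A common way to gain confidence in backpropagation code is a gradient check: compute dW with the same formulas as backprop, then compare it with a numerical estimate obtained by perturbing each weight. The sketch below does this for a single sigmoid layer on the XOR inputs. It is a standalone illustration under those assumptions, and the helper cost and the variable names are introduced here rather than taken from the article.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logloss(y, a):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
W = rng.standard_normal((1, 2))            # one sigmoid neuron, two inputs
b = np.zeros((1, 1))
A1 = np.array([[0., 0., 1., 1.],
               [0., 1., 0., 1.]])          # 2 x m
y = np.array([[0., 1., 1., 0.]])           # 1 x m
m = y.shape[1]

def cost(W):
    # Average log loss of the layer's output for the given weights
    A = sigmoid(np.dot(W, A1) + b)
    return np.sum(logloss(y, A)) / m

# Analytic gradient, following the same steps as backprop
Z = np.dot(W, A1) + b
A = sigmoid(Z)
dA = (A - y) / (A * (1 - A))               # derivative of log loss
dZ = (1 - sigmoid(Z)) * sigmoid(Z) * dA    # times derivative of sigmoid
dW = np.dot(dZ, A1.T) / m

# Numerical gradient: perturb one weight at a time (central difference)
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        dW_num[i, j] = (cost(W + step) - cost(W - step)) / (2 * eps)

gradients_match = np.allclose(dW, dW_num, atol=1e-6)
print(gradients_match)
```

A check like this is usually run once while developing and then removed, since the numerical estimate is far slower than backpropagation.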

Training the Network


    x = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])  # dim x m
    y = np.array([[0, 1, 1, 0]])  # 1 x m
    m = 4
    epochs = 1500

    lyr = [Layer(2, 3, 'tanh'), Layer(3, 1, 'sigmoid')]
    costs = []  # to plot graph

    for epoch in range(epochs):
        # Feedforward
        A = x
        for layer in lyr:
            A = layer.feedforward(A)

        # Calculate cost to plot graph
        cost = 1 / m * np.sum(logloss(y, A))
        costs.append(cost)

        # Backpropagation
        dA = d_logloss(y, A)
        for layer in reversed(lyr):
            dA = layer.backprop(dA)

    # Making predictions
    A = x
    for layer in lyr:
        A = layer.feedforward(A)

  • epochs is the number of training iterations.
  • The first layer takes 2 inputs and has 3 neurons. The two inputs are the two binary values we are performing the XOR operation on. The second layer takes 3 inputs, because the previous layer produces 3 outputs from its 3 neurons, and has a single output, the result of the XOR.
  • For backpropagation, we iterate through the layers in reverse. Each layer computes dA1 and passes it back to the layer before it, where it is used as that layer's dA.
  • After the loop has run for all the epochs, our network should be trained, i.e., the weights and biases should be tuned.
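
The final A holds sigmoid outputs between 0 and 1 rather than hard 0/1 answers. One common way to turn them into class labels, assumed here rather than taken from the article, is to threshold at 0.5. The values of A below are made up for illustration; in practice they would come from the prediction loop.

```python
import numpy as np

# Hypothetical network outputs for the four XOR inputs (made-up values)
A = np.array([[0.03, 0.97, 0.96, 0.05]])
y = np.array([[0, 1, 1, 0]])

# Outputs above 0.5 count as 1, everything else as 0 (threshold is an assumption)
predictions = (A > 0.5).astype(int)

# Fraction of examples where the prediction matches the desired output
accuracy = np.mean(predictions == y)

print(predictions, accuracy)
```

Because the weights start out random, a run that has not converged will show up here as an accuracy below 1.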



Aditi Mittal


Machine Learning Enthusiast | Software Developer