Writing Python Code for Neural Networks from Scratch

  1. Input layer of our model
  2. Hidden layers of neurons
  3. Output layer of our model

Learning Process

What is the learning process?

  1. Feedforward: The inputs are fed into the network, pass through the hidden layers, and reach the output layer to produce an output. This process is called feedforward. It is used during training and also to make predictions once the network has been trained.
  2. Cost Calculation: The outputs produced by feedforward are compared to the desired outputs, and we calculate how far apart they are. The cost measures how much the calculated outputs differ from the desired values. Ideally, we want the cost to be 0, or very close to 0.
  3. Backpropagation: We move backward through the network and update the weights and biases in each layer. The gradient of the cost with respect to each weight and bias tells gradient descent how much to change it. The weights and biases are updated so that the outputs of our network move closer to the desired outputs and the cost drops toward 0.
  4. Repeat: The steps above are repeated for a fixed number of iterations called epochs. The number of epochs is chosen by looking at the cost. When the cost is close to 0 and the network is making accurate predictions, we can say that our network has learned.
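
To make the four steps concrete, here is a minimal sketch of the same loop on the smallest possible "network": a single sigmoid neuron. The OR data set, the learning rate of 0.5, the seed, and the 5000 epochs are assumptions chosen for illustration; they are not the settings used later in this article.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data (assumed for illustration): the OR function, one column per sample.
x = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])  # shape (2, 4)
y = np.array([[0, 1, 1, 1]])                # shape (1, 4)
m = x.shape[1]

rng = np.random.default_rng(0)
w = rng.standard_normal((1, 2))  # one neuron with two inputs
b = np.zeros((1, 1))
learning_rate = 0.5
costs = []

for epoch in range(5000):
    # 1. Feedforward
    a = sigmoid(w @ x + b)
    # 2. Cost calculation (log loss averaged over the samples)
    cost = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    costs.append(cost)
    # 3. Backpropagation: for sigmoid + log loss, the per-sample
    #    gradient with respect to z simplifies to (a - y)
    dz = a - y
    w -= learning_rate * (dz @ x.T) / m
    b -= learning_rate * np.sum(dz, axis=1, keepdims=True) / m
    # 4. Repeat for the next epoch

predictions = (sigmoid(w @ x + b) > 0.5).astype(int)
print(costs[0], costs[-1])
print(predictions)
```

The cost recorded in the last epoch should be well below the cost in the first one, which is the signal we look for when deciding whether the number of epochs was enough.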

Creating a Layer Class

Each layer is defined by three values:

  • The number of inputs to this layer
  • The number of neurons in this layer
  • The activation function to use

import numpy as np

# Activation functions and their derivatives
def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1 - np.square(np.tanh(x))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return (1 - sigmoid(x)) * sigmoid(x)

# Loss function (log loss) and its derivative with respect to a
def logloss(y, a):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def d_logloss(y, a):
    return (a - y) / (a * (1 - a))

# The layer class
class Layer:
    # Maps an activation name to (function, derivative)
    activationFns = {
        'tanh': (tanh, d_tanh),
        'sigmoid': (sigmoid, d_sigmoid)
    }
    learning_rate = 0.1

    def __init__(self, inputs, neurons, activation):
        self.W = np.random.randn(neurons, inputs)  # weights, shape (neurons, inputs)
        self.b = np.zeros((neurons, 1))            # biases, shape (neurons, 1)
        self.act, self.d_act = self.activationFns.get(activation)
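
The derivative functions are easy to get subtly wrong, and backpropagation depends on them. One way to gain some confidence is to compare them against a finite-difference approximation. The sketch below repeats the four activation functions so it runs on its own; `numerical_derivative`, the step size `eps`, and the test points are hypothetical choices made for this check only.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1 - np.square(np.tanh(x))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return (1 - sigmoid(x)) * sigmoid(x)

def numerical_derivative(f, x, eps=1e-6):
    # Central-difference approximation of f'(x); a helper for this check only.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.linspace(-3, 3, 13)  # a handful of test points
tanh_ok = np.allclose(d_tanh(x), numerical_derivative(tanh, x), atol=1e-6)
sigmoid_ok = np.allclose(d_sigmoid(x), numerical_derivative(sigmoid, x), atol=1e-6)
print(tanh_ok, sigmoid_ok)
```

If either flag comes out False, the analytic derivative and the function it belongs to disagree, and training would follow the wrong gradient.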

Feedforward Function


    Z = W · A1 + b
    A = activation(Z)

  • A1 is the output from the previous layer. For the first layer, this is equal to our input value X.
  • The operation between W and A1 is a dot product (matrix multiplication). Since both are matrices, their shapes must match up: the number of columns in W must equal the number of rows in A1.
  • The output of this layer is A. For the output layer, this is equal to the predicted output, Y_bar.
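
The shape rule is easiest to see with concrete numbers. The sketch below assumes a layer with 2 inputs and 3 neurons and a batch of 4 samples; these sizes, the seed, and the use of tanh are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs, neurons, m = 2, 3, 4  # assumed sizes for illustration

W = rng.standard_normal((neurons, inputs))  # shape (3, 2)
b = np.zeros((neurons, 1))                  # shape (3, 1)
A1 = rng.standard_normal((inputs, m))       # shape (2, 4): one column per sample

Z = np.dot(W, A1) + b  # (3, 2) dot (2, 4) gives (3, 4); b is broadcast over the 4 columns
A = np.tanh(Z)         # the activation is applied element-wise, so the shape is unchanged

print(Z.shape, A.shape)  # (3, 4) (3, 4)
```

Each column of A is the layer's output for one sample, which is why the whole batch can be pushed through the layer in a single call.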

# The layer class
class Layer:

    # same code as above

    def feedforward(self, A1):
        self.A1 = A1                               # cache the input for backprop
        self.Z = np.dot(self.W, self.A1) + self.b  # linear step
        self.A = self.act(self.Z)                  # apply the activation
        return self.A

Gradient Descent


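# The layer class
class Layer:

    # same as above

    def backprop(self, dA):
        # dA is the gradient of the cost with respect to this layer's output A
        dZ = np.multiply(self.d_act(self.Z), dA)
        dW = 1 / dZ.shape[1] * np.dot(dZ, self.A1.T)
        db = 1 / dZ.shape[1] * np.sum(dZ, axis=1, keepdims=True)
        # Gradient passed to the previous layer; computed before W is updated
        dA1 = np.dot(self.W.T, dZ)

        # Gradient descent step
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db

        return dA1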
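
Written as equations, the backprop method applies the chain rule to Z = W · A1 + b and A = activation(Z). The notation below is an assumption made for readability: g is the layer's activation function, ⊙ is the element-wise product, m is the number of samples (dZ.shape[1] in the code), the superscript (i) picks the i-th column, and α is the learning rate.

```latex
\begin{aligned}
dZ   &= dA \odot g'(Z) \\
dW   &= \frac{1}{m}\, dZ \, A_1^{\top} \\
db   &= \frac{1}{m} \sum_{i=1}^{m} dZ^{(i)} \\
dA_1 &= W^{\top} dZ \\
W &\leftarrow W - \alpha \, dW, \qquad b \leftarrow b - \alpha \, db
\end{aligned}
```

The order matters for dA1: it uses W from before the update, which is why the code computes it ahead of the two gradient descent lines.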

Training the Network


x = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])  # dim x m
y = np.array([[0, 1, 1, 0]])                # 1 x m
m = 4
epochs = 1500

lyr = [Layer(2, 3, 'tanh'), Layer(3, 1, 'sigmoid')]
costs = []  # to plot graph

for epoch in range(epochs):
    # Feedforward
    A = x
    for layer in lyr:
        A = layer.feedforward(A)

    # Calculate cost to plot graph
    cost = 1 / m * np.sum(logloss(y, A))
    costs.append(cost)

    # Backpropagation
    dA = d_logloss(y, A)
    for layer in reversed(lyr):
        dA = layer.backprop(dA)

# Making predictions
A = x
for layer in lyr:
    A = layer.feedforward(A)
print(A)

  • m is the number of samples.
  • epochs is the number of training iterations.
  • The first layer takes 2 inputs and has 3 neurons. The two inputs are the two binary values we are performing the XOR operation on. The second layer takes 3 inputs because the previous layer produces 3 outputs, one per neuron. It has a single neuron, whose output is the answer of the XOR.
  • For backpropagation, we iterate through the layers in reverse. Each layer receives dA, updates its own weights and biases, and returns the dA for the layer before it.
  • After the loop has run for all the epochs, our network should be trained, i.e., all the weights and biases should be tuned.
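
The sigmoid output layer produces values between 0 and 1 rather than exact 0s and 1s, so a final step is to threshold them. The sketch below shows one common way to do that, with a cutoff of 0.5. The values in `A` are hypothetical and only illustrate the idea; they are not results from an actual training run.

```python
import numpy as np

# Hypothetical network outputs for the four XOR inputs (illustrative values,
# not taken from a real run of the training loop).
A = np.array([[0.03, 0.97, 0.96, 0.05]])
y = np.array([[0, 1, 1, 0]])

predictions = (A > 0.5).astype(int)   # outputs above 0.5 count as 1
accuracy = np.mean(predictions == y)  # fraction of samples predicted correctly

print(predictions)  # [[0 1 1 0]]
print(accuracy)     # 1.0
```

Because the weights start from random values, a real run may need more epochs or a different starting point before its outputs separate this cleanly.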




Machine Learning Enthusiast | Software Developer

Aditi Mittal
