Introduction to U-Net and Res-Net for Image Segmentation

Human beings see images by capturing reflected light rays, which is a very complex task. So how can machines be programmed to perform a similar task? A computer sees an image as a matrix of numbers, which has to be processed to extract any meaning from it.

Image segmentation is the task of partitioning an image into segments, with each segment corresponding to a different entity. Plain Convolutional Neural Networks work well on simpler images but haven’t given good results on complex ones. This is where architectures like U-Net and Res-Net, which build on CNNs, come into play.

Background — Convolutional Neural Network (CNN)

U-Net Architecture

U-Net is built from convolution, max pooling, ReLU activation, concatenation, and up-sampling layers, arranged in three sections: contraction, bottleneck, and expansion. The contraction section has 4 contraction blocks. Every contraction block takes an input, applies two 3x3 convolution layers, each followed by a ReLU, and then a 2x2 max pooling. The number of feature maps doubles after each pooling layer. The bottleneck uses two 3x3 convolution layers and a 2x2 up-convolution layer.

The expansion section consists of several expansion blocks, with each block passing its input to two 3x3 convolution layers and a 2x2 up-sampling layer that halves the number of feature channels. Each block also concatenates its input with the correspondingly cropped feature map from the contracting path, so the features learned while contracting the image are reused to reconstruct it. At the end, a 1x1 convolution layer makes the number of feature maps equal to the number of segments desired in the output.

U-Net computes the loss at every pixel of the image, which helps in identifying individual cells within the segmentation map. Softmax is applied at each pixel, followed by a loss function. This converts the segmentation problem into a classification problem in which each pixel has to be assigned to one of the classes.
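To make the block structure above concrete, here is a minimal sketch that traces channel counts and feature map sizes through the network without building any layers. The function name `unet_shapes`, the 572x572 input, the 64 starting channels, and the use of unpadded 3x3 convolutions are all assumptions borrowed from the original U-Net paper, not something this article specifies.

```python
def unet_shapes(size=572, base_channels=64, depth=4, n_classes=2):
    """Trace (stage, channels, height/width) through a U-Net.

    Assumes square inputs and unpadded 3x3 convolutions, so every
    convolution trims one pixel from each border.
    """
    trace = []
    skips = []  # channel count and size of each contraction block's output
    channels = base_channels

    # Contraction: two 3x3 convs, then 2x2 max pooling.
    for i in range(depth):
        size -= 4  # two unpadded 3x3 convs remove 2 pixels each
        trace.append((f"contract{i + 1}", channels, size))
        skips.append((channels, size))
        size //= 2      # 2x2 max pooling halves the spatial size
        channels *= 2   # feature maps double for the next block

    # Bottleneck: two 3x3 convs at the lowest resolution.
    size -= 4
    trace.append(("bottleneck", channels, size))

    # Expansion: up-convolution, concatenation with cropped skip, two 3x3 convs.
    for i in range(depth):
        size *= 2       # 2x2 up-convolution doubles the spatial size
        channels //= 2  # ...and halves the feature channels
        skip_channels, skip_size = skips.pop()
        # The skip map is larger, so it is center-cropped before concatenation.
        assert skip_channels == channels and skip_size >= size
        size -= 4  # two unpadded 3x3 convs on the concatenated maps
        trace.append((f"expand{i + 1}", channels, size))

    # Final 1x1 conv: one feature map per segment class, size unchanged.
    trace.append(("output", n_classes, size))
    return trace


for stage, channels, size in unet_shapes():
    print(f"{stage:<11} {channels:>5} channels  {size}x{size}")
```

Under these assumptions the bottleneck holds 1024 channels at 28x28, and the output is 388x388 with one map per class. The output is smaller than the input because unpadded convolutions shrink the maps, which is also why the skip feature maps have to be cropped before concatenation.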

Residual Networks (Res-Net)

Res-Net Architecture

The components of the network include 3x3 convolution filters, convolutional down-sampling layers with stride 2, a global average pooling layer, and a 1000-way fully connected layer with softmax at the end.
ResNet uses skip connections, in which the original input of a convolution block is added to the block's output. This helps with the vanishing gradient problem by giving the gradient an alternative path to flow through. The skip connection is an identity function, so a block can always fall back to passing its input through unchanged, which helps a deeper layer perform at least as well as a shallower one, and not worse.
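A toy calculation can illustrate why the extra path helps. The sketch below is a simplified scalar model, not an actual ResNet: it assumes each layer contributes a single local derivative (the hypothetical value 0.1) and compares the gradient that survives a stack of plain layers with the gradient through layers of the form y = x + F(x), whose derivative is 1 + F'(x).

```python
def gradient_through_stack(local_grads, residual):
    """Multiply per-layer derivatives along the backward pass.

    Plain layer:    y = F(x)      ->  dy/dx = F'(x)
    Residual layer: y = x + F(x)  ->  dy/dx = 1 + F'(x)
    """
    grad = 1.0
    for g in local_grads:
        grad *= (1.0 + g) if residual else g
    return grad


# Hypothetical stack: 20 layers, each with a small local derivative.
local_grads = [0.1] * 20

plain = gradient_through_stack(local_grads, residual=False)
skip = gradient_through_stack(local_grads, residual=True)

print(f"plain stack:    {plain:.3e}")  # shrinks towards zero
print(f"residual stack: {skip:.3e}")   # stays at a usable scale
```

In the plain stack the product of small derivatives collapses towards zero, while the "+1" contributed by each skip connection keeps the residual stack's gradient from vanishing. Real networks use matrices rather than scalars, so this only conveys the intuition.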

Residual Block

In traditional neural networks, each layer feeds only into the next layer. In a network with residual blocks, each layer feeds into the next layer and also directly into a layer a few hops further on.

Consider a neural network block whose input is x, and suppose we would like it to learn the true underlying mapping H(x). The residual between the output and the input can be denoted as:

R(x) = Output - Input = H(x) - x

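The layers in a traditional network learn the true output H(x), whereas the layers in a residual network learn the residual R(x). The block then recovers the output by adding the input back through the skip connection: H(x) = R(x) + x.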
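The relationship between H(x), R(x), and x can be sketched in a few lines of plain Python. The names `residual_block`, `zero_residual`, and `small_correction` are hypothetical, and the residual functions stand in for the learned convolution layers of a real block.

```python
def residual_block(x, residual_fn):
    """Return H(x) = R(x) + x, where residual_fn plays the role of R."""
    r = residual_fn(x)
    return [ri + xi for ri, xi in zip(r, x)]


def zero_residual(x):
    """A block that has learned nothing: R(x) = 0."""
    return [0.0 for _ in x]


def small_correction(x):
    """A block that has learned a small adjustment: R(x) = 0.1 * x."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]

# With a zero residual the block reduces to the identity function.
print(residual_block(x, zero_residual))

# Otherwise the block outputs its input plus the learned correction.
print(residual_block(x, small_correction))
```

The first call shows why an extra residual block should not make a network worse: if the layers learn R(x) = 0, the block simply passes its input through.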

Hopefully, this article was a useful introduction to Res-Nets and U-Nets. Thanks for reading! If there are other points or concepts I should have covered, add them in the comments below!


