Skip to content

Latest commit

 

History

History
190 lines (126 loc) · 10.4 KB

cnn_py.MD

File metadata and controls

190 lines (126 loc) · 10.4 KB

Computer Vision with OpenCV [opencv 4 map]

Computer Vision (CV) is a multidisciplinary field that enables computers to interpret and understand visual information from the world around them, much like human vision. It encompasses techniques and methodologies for acquiring, processing, analyzing, and interpreting images and videos. The ultimate goal of computer vision is to replicate and even enhance human visual perception capabilities using artificial intelligence and computational methods.

Key components of computer vision include:

  • Image Acquisition: The process of capturing visual data from various sources such as cameras, sensors, or satellite imagery.
  • Image Processing: Techniques for manipulating and enhancing digital images to improve their quality, reduce noise, and extract useful information.
  • Feature Extraction: Identifying and extracting relevant features or patterns from images, such as edges, corners, textures, or shapes.
  • Object Recognition and Detection: Identifying and localizing objects within images or videos, often through the use of machine learning algorithms.
  • Semantic Segmentation: Assigning semantic labels to each pixel in an image, allowing for detailed understanding of object boundaries and spatial relationships.
  • Scene Understanding: Interpreting the content and context of a scene, including the relationships between objects and their interactions.
  • Tracking and Motion Analysis: Monitoring the movement of objects or individuals over time, often crucial for applications like surveillance or human-computer interaction.
  • 3D Reconstruction: Inferring the three-dimensional structure of objects or scenes from multiple 2D images or depth sensors.

My Computer Vision theory notes are at @github/cv-theory, SLAM notes are @/visual-slam and OpenCV C++ notes are @github/cpp-opencv. Some important and popular computer vision tasks are:

Calibrating a Camera : calibrate.cpp, CameraCalibrator.cpp

Recovering camera pose: cameraPose.cpp

Reconstructing a 3D scene from calibrated cameras : estimateE.cpp, triangulate.h, triangulate.cpp

Computing depth from stereo : stereoMatcher.cpp

Extracting the Foreground Objects in Video : BGFGSegmentor.h, foreground.cpp

Tracing feature points in a video : featuretracker.h, traker.cpp

Estimating the optical flow : flow.cpp

Tracking an object in a video : visualtracker.h, oTracker.cpp

Recognizing faces using nearest neighbors of local binary patterns : recognizeFace.cpp

Finding objects and faces with a cascade of Haar features : detectObjects.cpp

Detecting objects and peoples with Support Vector Machines and histograms of oriented gradients : trainSVM.cpp

Face Recognition using Eigenfaces or Fisherfaces (code)

Android Camera Calibration and AR using the ARUco (code)

I have Android app development experience @github/applied-cs

Web Computer Vision with OpenCV.js (code)

Convolutional Neural Network (CNN) in PyTorch

Convolutional Neural Network (CNN) is the extended version of artificial neural networks (ANN) which is predominantly used to extract the feature from the grid-like matrix dataset. For example visual datasets like images or videos where data patterns play an extensive role. Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer, Pooling layer, and fully connected layers.

  • Input Layers: It’s the layer in which we give input to our model. In CNN, Generally, the input will be an image or a sequence of images. This layer holds the raw input of the image.
  • Convolutional Layers: This is the layer, which is used to extract the feature from the input dataset. It applies a set of learnable filters known as the kernels to the input images. The filters/kernels are smaller matrices usually 2×2, 3×3, or 5×5 shape. it slides over the input image data and computes the dot product between kernel weight and the corresponding input image patch. The output of this layer is referred ad feature maps.
  • Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. it will apply an element-wise activation function to the output of the convolution layer. Some common activation functions are RELU: max(0, x), Tanh, Leaky RELU, etc.
  • Pooling layer: This layer is periodically inserted in the covnets and its main function is to reduce the size of volume which makes the computation fast reduces memory and also prevents overfitting. Two common types of pooling layers are max pooling and average pooling.
  • Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a completely linked layer for categorization or regression.
  • Fully Connected Layers: It takes the input from the previous layer and computes the final classification or regression task.

Semantic Segmentation : U-Net Architecture:

U-Net's distinctive architecture can be summarized as follows:

  • Contracting Path: The architecture features a contracting path that employs convolutional layers and max-pooling to progressively reduce spatial dimensions while capturing high-level features.
  • Expansive Path: The expansive path uses transposed convolutions and skip connections to recover spatial information and localize features.
  • Skip Connections: The innovative inclusion of skip connections enables the network to fuse high-resolution features from the contracting path with the expansive path, facilitating precise segmentation.

U-Net architecture for semantic segmentation with PyTorch:

import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class DoubleConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(DoubleConv, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)

class UNET(nn.Module):
    def __init__(
            self, in_channels=3, out_channels=1, features=[64, 128, 256, 512],
    ):
        super(UNET, self).__init__()
        self.ups = nn.ModuleList()
        self.downs = nn.ModuleList()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Down part of UNET
        for feature in features:
            self.downs.append(DoubleConv(in_channels, feature))
            in_channels = feature

        # Up part of UNET
        for feature in reversed(features):
            self.ups.append(
                nn.ConvTranspose2d(
                    feature*2, feature, kernel_size=2, stride=2,
                )
            )
            self.ups.append(DoubleConv(feature*2, feature))

        self.bottleneck = DoubleConv(features[-1], features[-1]*2)
        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)

    def forward(self, x):
        skip_connections = []

        for down in self.downs:
            x = down(x)
            skip_connections.append(x)
            x = self.pool(x)

        x = self.bottleneck(x)
        skip_connections = skip_connections[::-1]

        for idx in range(0, len(self.ups), 2):
            x = self.ups[idx](x)
            skip_connection = skip_connections[idx//2]

            if x.shape != skip_connection.shape:
                x = TF.resize(x, size=skip_connection.shape[2:])

            concat_skip = torch.cat((skip_connection, x), dim=1)
            x = self.ups[idx+1](concat_skip)

        return self.final_conv(x)

def test():
    x = torch.randn((3, 1, 161, 161))
    model = UNET(in_channels=1, out_channels=1)
    preds = model(x)
    assert preds.shape == x.shape

if __name__ == "__main__":
    test()

resources : PyTorch Computer Vision, I really liked adeshpande's cnn blog series meant for beginners : CNN 1, CNN 2, CNN 3, Writing CNNs from Scratch in PyTorch, from AlexNet to EfficientNet, Image Classification with Convolutional Neural Networks, Convolutional Neural Nets Explained and Implemented in Python (PyTorch), Vision Transformers (ViT) Explained + Fine-tuning in Python, Deep Learning for Computer Vision with Python and TensorFlow , PyTorch Image Segmentation Tutorial with U-NET, Building a Convolutional Neural Network in PyTorch.