GitHub - AugustinJose1221/FPGA-Build: A novel architectural design for stitching video streams in real-time on an FPGA.

FPGA Architecture for Real-time Video Stitching

A novel architectural design for stitching video streams in real-time on an FPGA.

Table of Contents

About The Project
- Algorithm
- Top Level Design
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact

About The Project

The designed architecture generates a video with a wider field of view by stitching two input video streams based on features and keypoints. In simple terms, the output is a panorama, but as video. The architecture is optimized so that the output can be produced in real time.

Algorithm

The figure below illustrates the block diagram of the system depicting each step of the algorithm.

The system can be broadly divided into three subsystems:

Preprocessing
SIFT Based Feature Extraction
Frame Stitching

Preprocessing

The input video stream for the system is in 8-bit RGB format, as shown in the figure. Each frame of the video stream has three channels corresponding to red, green and blue. The colour information in the video frames does not enhance feature detection, and computation on a three-channel 8-bit image takes more time than on a single-channel 8-bit image. Therefore, each RGB video frame is converted to an 8-bit grayscale image. The generated grayscale images have less noise and more detail in the shadows, and are cheaper to process, as shown in the figure.


Input image	Grayscale image
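The conversion step can be sketched in Python as follows. This is only an illustration: the README does not state which channel weights `Grayscaler.v` uses, so the sketch assumes the common BT.601 luma weights scaled to integers, and the function names are hypothetical.

```python
def rgb_to_gray(r, g, b):
    """Convert one 8-bit RGB pixel to an 8-bit grayscale value.

    Assumption: BT.601 luma weights (0.299, 0.587, 0.114) scaled by 256 to
    the integers 77, 150 and 29, so the division becomes a right shift.
    """
    return (77 * r + 150 * g + 29 * b) >> 8


def frame_to_gray(frame):
    """Convert a frame given as rows of (r, g, b) tuples to rows of gray values."""
    return [[rgb_to_gray(r, g, b) for (r, g, b) in row] for row in frame]
```

Integer weights that sum to 256 keep the result inside the 8-bit range and avoid floating-point arithmetic, which is one plausible reason to prefer them in a hardware design.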

SIFT Based Feature Extraction

Feature extraction from the grayscale images is done using the SIFT algorithm, which can be separated into two main steps:

Keypoint Detection

SIFT begins with discrete convolution of the input image with different Gaussian filters. A Gaussian filter is a widely used image smoothing filter defined as:

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

In the above equation, G is the Gaussian kernel at the point (x, y) and σ is the Gaussian parameter. Using a larger value of σ produces a greater smoothing effect on the image. Discrete convolution of the image with a Gaussian kernel generates an image with less noise and less detail. In SIFT, discrete convolution with the Gaussian kernel is done with four different values of σ. Progressively higher values of σ are used to generate a set of blurred images, called an octave.


Input image	Sigma = 1.6	Sigma = 2.26	Sigma = 3.2	Sigma = 4.5

For a given value of σ, the coefficients of the convolution kernel should sum to unity. Because a larger σ spreads the Gaussian over more pixels, the size of the kernel increases as the value of σ increases.
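As a rough illustration of the normalisation and of how kernel size grows with σ, the sketch below builds one kernel per σ value listed in the figure captions. The `ceil(3σ)` radius is a common rule of thumb and an assumption here, not necessarily what the hardware filter uses; `gaussian_kernel` is a hypothetical helper name.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalised (2*radius + 1) x (2*radius + 1) Gaussian kernel.

    Assumption: radius defaults to ceil(3 * sigma), a common rule of thumb.
    """
    if radius is None:
        radius = math.ceil(3 * sigma)
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-radius, radius + 1)]
           for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in raw)
    # Dividing by the total makes the coefficients sum to one.
    return [[value / total for value in row] for row in raw]


SIGMAS = [1.6, 2.26, 3.2, 4.5]  # values from the figure captions above
kernels = [gaussian_kernel(s) for s in SIGMAS]
```

Under this assumption the four kernels are 11, 15, 21 and 29 pixels wide. A fixed-size kernel can be obtained by passing `radius` explicitly, for example `gaussian_kernel(1.6, radius=2)` for a 5x5 kernel.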

Once the octave is generated, a DoG space is built from the four images in the octave. DoG stands for difference of Gaussians, and it is a computationally efficient approximation of the Laplacian of Gaussian (LoG). The DoG space is built by computing the difference between two adjacent Gaussian scale images, pixel by pixel. The DoG space of the four images in the octave therefore has three levels.


Top level DoG	Middle level DoG	Bottom level DoG
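The pixel-by-pixel subtraction can be sketched as below, assuming each image in the octave is a list of rows of equal size. The sign convention (more blurred minus less blurred) and the function name are assumptions.

```python
def difference_of_gaussians(octave):
    """Return the pixel-wise differences between adjacent blurred images.

    An octave of N images yields N - 1 DoG levels.
    """
    dog = []
    for lower, higher in zip(octave, octave[1:]):
        level = [[h - l for l, h in zip(row_l, row_h)]
                 for row_l, row_h in zip(lower, higher)]
        dog.append(level)
    return dog
```

With the four images of one octave as input, the function returns three levels, matching the three DoG images shown above.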

Keypoints are extracted from the DoG space by finding local maxima or minima. A pixel is considered a keypoint if it is a local maximum or minimum within a 26-pixel neighbourhood consisting of 9 pixels in the top level, 8 pixels in the middle level and 9 pixels in the bottom level.
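The 26-neighbour test can be sketched as follows for a three-level DoG stack stored as nested lists. This sketch uses a strict comparison and omits the contrast and edge-response filtering that full SIFT implementations usually add; both function names are hypothetical.

```python
def is_extremum(dog, level, y, x):
    """True if dog[level][y][x] is strictly greater or strictly smaller than
    all 26 neighbours in the 3x3x3 cube centred on it."""
    centre = dog[level][y][x]
    neighbours = [dog[level + dl][y + dy][x + dx]
                  for dl in (-1, 0, 1)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)
                  if (dl, dy, dx) != (0, 0, 0)]
    return centre > max(neighbours) or centre < min(neighbours)


def find_keypoints(dog):
    """Scan the middle level of a three-level DoG stack, skipping border pixels."""
    height, width = len(dog[1]), len(dog[1][0])
    return [(y, x)
            for y in range(1, height - 1)
            for x in range(1, width - 1)
            if is_extremum(dog, 1, y, x)]
```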

Keypoints


Keypoints using OpenCV sift function	Keypoints using SIFT implementation in Python	Keypoint generated by the FPGA design

Descriptor Generation

A keypoint descriptor is a unique identifier for a particular keypoint. SIFT uses the gradient magnitude and direction at the keypoint as the basis for the descriptor. The gradient magnitude and direction at a point can be calculated by discrete convolution of the image with Sobel filters.

Sobel convolution output
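A minimal sketch of the gradient computation at one interior pixel is shown below. It applies the 3x3 Sobel kernels as a correlation (no kernel flip), which only changes the sign convention, and uses floating-point `atan2`; a hardware implementation would likely approximate both. The function name is hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient(image, y, x):
    """Return (magnitude, direction) at an interior pixel.

    The direction is in degrees, in the range [0, 360).
    """
    gx = gy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            pixel = image[y + dy][x + dx]
            gx += SOBEL_X[dy + 1][dx + 1] * pixel
            gy += SOBEL_Y[dy + 1][dx + 1] * pixel
    magnitude = math.hypot(gx, gy)
    direction = math.degrees(math.atan2(gy, gx)) % 360
    return magnitude, direction
```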

To generate the keypoint descriptor, the gradient magnitude and direction of every point inside a 16x16 window around each keypoint are calculated. The gradient magnitudes of the 16x16 window are convolved with a Gaussian kernel. The window is then divided into a 4x4 grid of cells, each covering 4x4 pixels, and the gradients within each cell are combined. Finally, the gradient directions in each of the 16 cells are accumulated into eight bins. This yields a 16 x 8 = 128 element vector, which acts as the keypoint descriptor.
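The 16 cells x 8 bins layout can be sketched as below. The sketch takes precomputed 16x16 grids of magnitudes and directions, applies a per-pixel Gaussian weight (with an assumed σ of 8 pixels) rather than a full convolution, and skips the normalisation step used in reference SIFT implementations. The function name and the bin ordering are assumptions.

```python
import math


def build_descriptor(magnitudes, directions, sigma=8.0):
    """Build a 128-element descriptor from 16x16 gradient magnitudes and
    directions (degrees in [0, 360)) sampled around one keypoint."""
    descriptor = [0.0] * 128
    for y in range(16):
        for x in range(16):
            # Gaussian weight centred on the window (sigma is an assumed value).
            dy, dx = y - 7.5, x - 7.5
            weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            cell = (y // 4) * 4 + (x // 4)               # one of 16 cells
            bin_index = int(directions[y][x] // 45) % 8  # 45-degree bins
            descriptor[cell * 8 + bin_index] += weight * magnitudes[y][x]
    return descriptor
```

Each cell contributes eight consecutive entries, so entries `cell * 8` to `cell * 8 + 7` hold the orientation histogram of that cell.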

Frame Stitching

Frame stitching is the process of combining two frames into a single image. Frame stitching is done in two steps:

Keypoint Matching

The keypoint descriptors of keypoints in the video frames from both camera sensors are compared. If the difference between the descriptors of two keypoints, one from each camera sensor, is below an error threshold, they are considered a keypoint pair. The keypoint pair with the smallest difference between their descriptors is taken as the reference keypoints.
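The matching rule can be sketched as a brute-force comparison. The README does not say how the difference between descriptors is measured, so the sum of absolute differences used here is an assumption, as are the function names and the `(keypoint, descriptor)` data layout.

```python
def descriptor_distance(a, b):
    """Sum of absolute differences between two descriptors (assumed metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_keypoints(left, right, threshold):
    """Match keypoints from two frames.

    left and right are lists of (keypoint, descriptor) tuples. Returns
    (pairs, reference): pairs is a list of (distance, left_kp, right_kp)
    with distance below the threshold, and reference is the pair with the
    smallest distance, or None if there are no pairs.
    """
    pairs = []
    for kp_l, desc_l in left:
        for kp_r, desc_r in right:
            distance = descriptor_distance(desc_l, desc_r)
            if distance < threshold:
                pairs.append((distance, kp_l, kp_r))
    reference = min(pairs, key=lambda p: p[0]) if pairs else None
    return pairs, reference
```

The reference pair gives the offset between the two frames, which the blending step can use to locate the overlapped region.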

Input image from left camera	Input image from right camera

Image Blending

A weighted average method is used to blend the two frames into a single image. The value of each pixel in the overlapped region is the weighted average of the corresponding pixels in both frames. The weights are chosen based on the distance between the overlapped pixel and the border of the corresponding frame.
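A minimal sketch of distance-based weighting for a single row of grayscale pixels is given below. It assumes a purely horizontal overlap of known width and a linear weight ramp; the actual weighting in `stitcher.v` may differ, and `blend_rows` is a hypothetical name.

```python
def blend_rows(left_row, right_row, overlap):
    """Stitch two rows whose last/first `overlap` pixels cover the same
    scene region, using linearly varying weights inside the overlap."""
    if overlap == 0:
        return left_row + right_row
    split = len(left_row) - overlap
    blended = []
    for i in range(overlap):
        # The left frame's weight falls as the pixel nears its right border.
        w_left = (overlap - i) / (overlap + 1)
        l = left_row[split + i]
        r = right_row[i]
        blended.append(round(w_left * l + (1 - w_left) * r))
    return left_row[:split] + blended + right_row[overlap:]
```

For example, blending `[100, 100, 100, 100]` with `[0, 0, 0, 0]` over a three-pixel overlap gives `[100, 75, 50, 25, 0]`, a smooth ramp instead of a hard seam.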

Stitched image

Top Level Design

The top-level block schematic of the architecture is shown in the figure below.

Block Schematic

The top level design is divided into five stages:

Getting Started

Prerequisites

The following packages need to be installed on the Linux system before running the source code.

Icarus Verilog
```
apt-get install iverilog
```
Python
```
apt-get install python3
```
OpenCV
```
pip3 install opencv-contrib-python
```
numpy
```
pip3 install numpy
```
PIL (Python Imaging Library)
```
pip3 install pillow
```

Installation

Clone the repo

git clone https://github.com/AugustinJose1221/FPGA-Build.git

Change working directory
```
cd FPGA-Build/make
```
Compile the design
```
make create
```
To view the RTL waveform
```
make simulate
```
Generate output image
```
python3 hexToImage.py
```

Usage

Project Tree

Templates
vcd
design
- README.md
- Controller.v
- matcher.v
- display.v
- camera.v
- descriptor.v
- filter5x5.v
- Grayscaler.v
- image.v
- image2.v
- keypoints.v
- RWM_1.v
- RWM_2.v
- sobel_filter.v
- stitcher.v
res
make
- Makefile
outfiles
- display
- FILTER
- Gaussian
- image
- interface
- output.bin
testbenches
img
README.md
tree.sh

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Twitter: @augustinjose121
Gmail: [email protected]
Discuss: GitHub Discussions
