
Computer Vision Calculator

Using the MNIST dataset for the digit model and the A-Z handwritten-alphabet dataset (https://www.kaggle.com/datasets/sachinpatel21/az-handwritten-alphabets-in-csv-format) for the letter model, I train my own implementation of a deep neural network to recognize the individual elements of a system of linear equations and then solve it.

1. Requirements

In order for this application to work, the photo must follow the format below:

  1. A big square / rectangle must surround the equations.
  2. The equations have to be visually separated from one another.
  3. The characters from each equation have to be visually separated from one another.
  4. The equations must follow the format number, letter, algebraic symbol, even if the number is a 0.

(Figure: example input photo following the required format)

2. Process

Using the data/perspective.py file, we extract the equation block in order to identify each equation's characters. Here is the process:

2.1 Segmentation of the block

Using some computer vision techniques (such as Otsu thresholding, dilation, and erosion), we obtain the area in which the equations are written. That is why the equations have to be surrounded by a big box.

(Figure: segmented equation block)
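A minimal sketch of this step, assuming an input photo loaded from a hypothetical path; the kernel sizes and iteration counts are illustrative guesses, not the values used in data/perspective.py:

```python
import cv2
import numpy as np

image = cv2.imread("photo.jpg")                      # hypothetical input path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu thresholding: ink becomes white (255) on a black background.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilation followed by erosion closes small gaps in the surrounding box.
kernel = np.ones((5, 5), np.uint8)
closed = cv2.erode(cv2.dilate(binary, kernel, iterations=2), kernel, iterations=1)

# The surrounding box should be the largest external contour.
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
block = max(contours, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(block)
block_area = binary[y:y + h, x:x + w]                # region containing the equations
```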

2.2 Vertices

If the image is rotated, we need to obtain the vertices of the big box so we can transform the perspective:

(Figure: detected vertices of the block)
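A hedged sketch of one way to do this, reusing the `block` contour and `gray` image from the previous sketch: approximate the contour to four vertices and warp them to an axis-aligned rectangle (the output size is an arbitrary choice here):

```python
import cv2
import numpy as np

peri = cv2.arcLength(block, True)
approx = cv2.approxPolyDP(block, 0.02 * peri, True)  # ideally the 4 corner points
corners = approx.reshape(-1, 2).astype(np.float32)

# Order the corners: top-left, top-right, bottom-right, bottom-left.
s = corners.sum(axis=1)
d = np.diff(corners, axis=1).ravel()
ordered = np.array([corners[np.argmin(s)], corners[np.argmin(d)],
                    corners[np.argmax(s)], corners[np.argmax(d)]], dtype=np.float32)

width, height = 800, 600                             # arbitrary output size
target = np.array([[0, 0], [width - 1, 0],
                   [width - 1, height - 1], [0, height - 1]], dtype=np.float32)
M = cv2.getPerspectiveTransform(ordered, target)
warped = cv2.warpPerspective(gray, M, (width, height))
```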

2.3 Getting the individual equations

Using only erosion and dilation over the area obtained in Section 2.1, we obtain the individual equations:

(Figure: individual equations extracted from the block)
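One possible sketch of this split, assuming each equation sits on its own line in the `warped` block from the previous step: dilating with a wide, flat kernel merges the characters of a line into a single horizontal blob, and each blob's bounding box gives one equation.

```python
import cv2

_, block_bin = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
line_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 3))   # wide and short
lines = cv2.dilate(cv2.erode(block_bin, None, iterations=1), line_kernel, iterations=2)

contours, _ = cv2.findContours(lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Sort the equation rows from top to bottom by their y coordinate.
equation_boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[1])
equations = [block_bin[y:y + h, x:x + w] for x, y, w, h in equation_boxes]
```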

2.4 Single Characters

Once we have each equation (and its corresponding coordinates), we can again apply erosion and dilation over that equation's area to obtain the single characters that will then be fed to the neural network:

(Figure: single characters extracted from an equation)
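A sketch of the character split under the assumption that characters in an equation do not touch each other: a light vertical dilation joins the two strokes of an "=" into one blob before taking external contours, and the boxes are read left to right. The kernel size and noise threshold are illustrative.

```python
import cv2
import numpy as np

char_kernel = np.ones((9, 3), np.uint8)              # tall kernel joins the "=" strokes
characters = []
for eq in equations:
    merged = cv2.dilate(eq, char_kernel, iterations=1)
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])
    characters.append([eq[y:y + h, x:x + w] for x, y, w, h in boxes
                       if w * h > 20])                # drop tiny noise blobs
```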

3. Prediction

Once we have the window containing each single character of an equation, we have to format these images so we can feed them to the different neural network models to make predictions. Both the MNIST digits and the letter images have shape (28, 28), with pixel values in [0, 1]. To get a clean image of each single character we apply the steps below (a sketch follows the list):

  1. Bounding box. To center the character.
  2. Padding. To leave some margin at the borders of the character.
  3. Boolean values. The pixels of the padded image can only take values 0 or 1.
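A rough sketch of this formatting, assuming a binary character crop as input; the margin value is an assumption, not the repository's:

```python
import cv2
import numpy as np

def to_model_input(char_img, margin=4):
    ys, xs = np.nonzero(char_img)
    crop = char_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]   # 1. bounding box

    side = max(crop.shape) + 2 * margin                             # 2. pad to a square
    padded = np.zeros((side, side), dtype=np.uint8)
    y0 = (side - crop.shape[0]) // 2
    x0 = (side - crop.shape[1]) // 2
    padded[y0:y0 + crop.shape[0], x0:x0 + crop.shape[1]] = crop

    resized = cv2.resize(padded, (28, 28), interpolation=cv2.INTER_AREA)
    return (resized > 0).astype(np.float32)                         # 3. values in {0, 1}
```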

The result of this process:

(Figure: character image after preprocessing)

If we look at the image in Section 2.4, we can see a huge difference between the number 1 detected in that image and the processed one. This is the image we will feed into our models. The same applies to the letter images.

3.1 Algebraic predictions

In order to make predictions over the algebraic symbols, we use a function that counts the number of connected components in the image (a sketch follows the list below).

  1. Equal sign. The number of labels (without counting the background) is 2.
  2. Plus sign and minus sign. The number of labels (without counting the background) is 1. Because of the preprocessing that formats the images for the models, the plus sign keeps its original proportions, whereas the minus sign ends up much bigger. That is why we define an area threshold to distinguish one from the other.
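One way to implement this rule, assuming the symbol image is a binary {0, 1} array in the processed 28x28 format; the area threshold is an illustrative value, not the repository's:

```python
import cv2
import numpy as np

def classify_symbol(symbol_img, minus_area_threshold=150):
    img = (symbol_img > 0).astype(np.uint8)
    n_labels, _, stats, _ = cv2.connectedComponentsWithStats(img, connectivity=8)
    n_components = n_labels - 1                       # drop the background label
    if n_components == 2:
        return "="
    # One component: a large ink area suggests the stretched minus sign.
    ink_area = int(stats[1:, cv2.CC_STAT_AREA].sum())
    return "-" if ink_area > minus_area_threshold else "+"
```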

(Figure: algebraic symbols after preprocessing)

4. Results

(Figure: final results)
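For illustration, once the recognized tokens of each equation (which follow the required "number, letter, symbol" format) have been parsed into coefficients, the 3x3 system can be solved with NumPy. The numbers below are hypothetical, not taken from the repository:

```python
import numpy as np

# Hypothetical recognized system:
#   2x + 1y - 3z = 4
#   1x - 2y + 1z = 0
#   3x + 1y + 2z = 5
A = np.array([[2, 1, -3],
              [1, -2, 1],
              [3, 1, 2]], dtype=float)
b = np.array([4, 0, 5], dtype=float)

solution = np.linalg.solve(A, b)
print(dict(zip("xyz", solution.round(3))))
```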

Even though the results are not very accurate for a task like this (with a Bayes error of approximately 0 %), we can draw some conclusions:

  1. The clarity of the single elements is fundamental in order to predict correctly. The MNIST dataset is one of the cleanest datasets out there, and the characters we extract from each equation do not come from the same distribution.

  2. Our models work, so the only thing we need to worry about is getting data from the same distribution.

  3. The letter model works perfectly. There is a simple explanation: since we knew beforehand that the system of linear equations would have 3 equations (3 unknown variables), we did not need to train the neural network over the whole alphabet dataset, only on the letters that appear in our equations. If the system had 6 different letters, the neural network should be trained on those letters to improve efficiency and accuracy.
