Implementation of VQA in Keras #1

Open: wants to merge 64 commits into base: master
fa0567a
Create README.md
feziodoshi May 23, 2017
d834d30
Create readme
feziodoshi May 23, 2017
0d631b6
Create readme
feziodoshi May 23, 2017
7738add
Add files via upload
feziodoshi May 23, 2017
ad5d0a9
Create readme
feziodoshi May 23, 2017
4fbf891
Create readme
feziodoshi May 23, 2017
286f37e
Delete readme
feziodoshi May 23, 2017
9c6c483
Create readme.md
feziodoshi May 23, 2017
d946af0
Add files via upload
feziodoshi May 23, 2017
b7d7a56
Update readme.md
feziodoshi May 23, 2017
0b549a9
Create readme.md
feziodoshi May 23, 2017
3f88929
Add files via upload
feziodoshi May 23, 2017
3776f31
Create readme.md
feziodoshi May 23, 2017
cd61a0f
Delete readme
feziodoshi May 23, 2017
42eccd5
Delete normal_2_lstm_nodistributed_2hidden.json
feziodoshi May 26, 2017
cc5acc2
Delete readme.md
feziodoshi May 26, 2017
0309380
Delete normal_1_lstm_timedistributed_2hidden.json
feziodoshi May 26, 2017
438db28
Delete readme.md
feziodoshi May 26, 2017
f6835c8
Delete readme.md
feziodoshi May 26, 2017
ae836ce
Delete create_normal_gru.py
feziodoshi May 26, 2017
960673f
Delete create_normal_lstm.py
feziodoshi May 26, 2017
b10ddde
Delete create_timedistributedGRU.py
feziodoshi May 26, 2017
4d7ddb5
Delete create_timedistributed_lstm.py
feziodoshi May 26, 2017
841a750
Delete readme
feziodoshi May 26, 2017
5ee0da0
Create readme.md
feziodoshi May 26, 2017
fad4f4e
Add files via upload
feziodoshi May 26, 2017
65b400a
Create readme.md
feziodoshi May 26, 2017
e3d3fd3
Add files via upload
feziodoshi May 26, 2017
f895c47
Delete vgg_19.py
feziodoshi May 26, 2017
c56015c
Update readme.md
feziodoshi May 26, 2017
90d74d1
Create readme.md
feziodoshi May 26, 2017
876019e
Add files via upload
feziodoshi May 26, 2017
9bff556
Create readme.md
feziodoshi May 26, 2017
8b6637b
Add files via upload
feziodoshi May 26, 2017
99fc98f
Create readme.md
feziodoshi May 26, 2017
4ae251c
Create readme.md
feziodoshi May 26, 2017
d54956c
Create readme.md
feziodoshi May 26, 2017
297c0ea
Update readme.md
feziodoshi May 26, 2017
9c944f8
Update readme.md
feziodoshi May 26, 2017
db66e11
Update readme.md
feziodoshi May 26, 2017
0f9c7f9
Delete workspace_2.py
feziodoshi May 26, 2017
49f817d
Add files via upload
feziodoshi May 26, 2017
2f72707
Add files via upload
feziodoshi May 26, 2017
72142df
Delete readme
feziodoshi May 26, 2017
1b8199e
Create readme.md
feziodoshi May 26, 2017
3e239cd
Update and rename README.md to README.md
feziodoshi May 26, 2017
f5933ee
Update readme.md
feziodoshi May 26, 2017
b76ae25
Create readme.md
feziodoshi May 26, 2017
c24fdd8
Add files via upload
feziodoshi May 26, 2017
a73342d
Add files via upload
feziodoshi May 26, 2017
899c087
Update readme.md
feziodoshi May 26, 2017
7285a6c
Update readme.md
feziodoshi May 26, 2017
93e2d48
Add files via upload
feziodoshi May 26, 2017
88f54fa
Add files via upload
feziodoshi May 26, 2017
910f275
Update readme.md
feziodoshi May 26, 2017
b6b2f33
Update readme.md
feziodoshi May 26, 2017
e21f25e
Add files via upload
feziodoshi May 26, 2017
c8ac39a
Update README.md
feziodoshi May 26, 2017
1b6d3b1
Update README.md
feziodoshi May 26, 2017
e882341
Update readme.md
feziodoshi May 28, 2017
0145618
Update workspace_2.py
feziodoshi May 28, 2017
24b2540
Update workspace_2.py
feziodoshi May 28, 2017
bacc01d
Update readme.md
feziodoshi May 28, 2017
943ee33
Update README.md
feziodoshi May 28, 2017
53 changes: 53 additions & 0 deletions keras implementation/README.md
@@ -0,0 +1,53 @@
## Visual Question Answering in Keras with TensorFlow Backend (by Fenil Doshi)

Implementation of the [VQA paper](https://arxiv.org/pdf/1505.00468v6.pdf). Visual Question Answering website: http://visualqa.org/

## Problem Description

Given an image and a natural-language question about it, the task is to produce a natural-language answer. The image is encoded as a 4096-dimensional vector by passing it through a pretrained VGG model and dropping its final classification layers, so that the output is the activation of the last fully connected hidden layer. The question can be encoded in two ways:
- Bag of Words
- Using Recurrent Neural Networks

Once the question is encoded, the encoded image and encoded question are merged and passed through a deep feed-forward network. Finally, the answer is predicted as one of 1000 classes, since only the 1000 most frequently occurring answers are taken into account.
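To make the data flow above concrete, here is a minimal NumPy sketch of one forward pass. It is not the repository's Keras code: the hidden sizes (512 for the question encoder, 1024 for the feed-forward layer), the 300-dimensional word vectors, the single-layer GRU question encoder, the tanh activations and merging by concatenation are all assumptions chosen for illustration (element-wise multiplication is another common way to merge). The weights are random, so the predicted class is meaningless until the model is trained.

```python
import numpy as np

# Illustrative sizes (assumptions): 4096-d VGG features, 300-d word vectors,
# 512-d question encoding, 1000 answer classes.
IMG_DIM, WORD_DIM, HID, N_ANSWERS = 4096, 300, 512, 1000
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init(*shape):
    # Small random weights; a real model learns these with SGD or RMSProp.
    return rng.normal(0.0, 0.01, size=shape)

# GRU parameters: update gate z, reset gate r, candidate state h_tilde.
Wz, Uz, bz = init(WORD_DIM, HID), init(HID, HID), np.zeros(HID)
Wr, Ur, br = init(WORD_DIM, HID), init(HID, HID), np.zeros(HID)
Wh, Uh, bh = init(WORD_DIM, HID), init(HID, HID), np.zeros(HID)

def encode_question(word_vectors):
    """Run a single-layer GRU over a (T, WORD_DIM) array and return the last hidden state."""
    h = np.zeros(HID)
    for x in word_vectors:
        z = sigmoid(x @ Wz + h @ Uz + bz)
        r = sigmoid(x @ Wr + h @ Ur + br)
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)
        h = (1.0 - z) * h + z * h_tilde
    return h

# Feed-forward head applied to the merged (concatenated) image + question vector.
W1, b1 = init(IMG_DIM + HID, 1024), np.zeros(1024)
W2, b2 = init(1024, N_ANSWERS), np.zeros(N_ANSWERS)

def predict(image_features, word_vectors):
    merged = np.concatenate([image_features, encode_question(word_vectors)])
    hidden = np.tanh(merged @ W1 + b1)
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()               # probabilities over the 1000 answer classes

# Random stand-ins for a VGG feature vector and an 8-word question.
probs = predict(rng.normal(size=IMG_DIM), rng.normal(size=(8, WORD_DIM)))
print(probs.shape, int(probs.argmax()))
```

Training would minimise the cross-entropy between these probabilities and the index of the correct answer.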

![](https://github.com/feziodoshi/VQA/blob/master/keras%20implementation/data/vqa_image.png)

## Requirements
- TensorFlow
- Keras
- SciPy
- spaCy
- scikit-learn, NumPy
- nltk
- NVIDIA CUDA

Download the English GloVe word vectors (used through spaCy) from https://nlp.stanford.edu/projects/glove/
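The GloVe files are plain text: each line holds a word followed by its vector components. The sketch below parses that format and builds the two question encodings listed under Problem Description: a bag-of-words vector (here the sum of the word vectors, one common variant) and a zero-padded sequence of word vectors to feed an RNN. The helper names `load_glove`, `tokenize`, `bag_of_words_vector` and `sequence_matrix` are hypothetical, and the whitespace tokenizer is a simplification of what spaCy or nltk would do.

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe text format: a word followed by its space-separated vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def tokenize(question):
    # Hypothetical minimal tokenizer: lowercase, drop '?', split on whitespace.
    return question.lower().replace("?", " ").split()

def bag_of_words_vector(question, vectors, dim):
    """Bag-of-words encoding: sum of word vectors (word order is discarded)."""
    out = np.zeros(dim, dtype=np.float32)
    for tok in tokenize(question):
        if tok in vectors:
            out += vectors[tok]
    return out

def sequence_matrix(question, vectors, dim, max_len):
    """RNN input: one word vector per timestep, zero-padded or truncated to max_len."""
    out = np.zeros((max_len, dim), dtype=np.float32)
    for i, tok in enumerate(tokenize(question)[:max_len]):
        if tok in vectors:
            out[i] = vectors[tok]
    return out

# Toy 2-d "GloVe" file; a real file would be opened with open(path, encoding="utf-8").
toy = load_glove(io.StringIO("what 1 0\ncolor 0 1\nis 0.5 0.5\n"))
print(bag_of_words_vector("What color is it?", toy, dim=2))  # [1.5 1.5]
```

Out-of-vocabulary words (here "it") contribute a zero vector in both encodings.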

## Dataset
Dataset download link: http://visualqa.org/download.html

For more information on data preprocessing, check out the data folder in this directory.

## To get started
Instructions are in the readme file in each folder.

## Results
The two-layer stacked GRU + CNN model converged faster than the corresponding LSTM model on the same training set. I also found that the SGD optimizer worked better than RMSProp for the plain LSTM/GRU models, whereas RMSProp worked better for the models with a time-distributed layer. The GRU with the time-distributed layer gave very low accuracy.

The models could be improved considerably by training them on the entire dataset for more than 100 epochs on a more powerful GPU (e.g. a Tesla or a GTX 1080). Overfitting could be further reduced with dropout and regularization.


1. Validation Accuracy of LSTM + CNN = 33.77 %
2. Validation Accuracy of GRU + CNN = 34.4 %
3. Validation Accuracy of LSTM + Time Distributed Layer + CNN = 34.3 %

These accuracies were obtained by training on only 10,000 examples. Accuracy should improve by training on a larger set each epoch, which a faster GPU would make practical. The models were trained on an NVIDIA GTX 960M, which took about 3 hours to train on the 10,000 images for 100 epochs.
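The time-distributed layer mentioned in the results applies one layer, with shared weights, independently at every timestep of a sequence; in Keras this is the `TimeDistributed` wrapper, e.g. `TimeDistributed(Dense(64))`. The NumPy sketch below shows that behaviour for a Dense layer. The function name and sizes are illustrative assumptions and do not describe where or how the layer sits in this repository's models.

```python
import numpy as np

def time_distributed_dense(sequence, W, b):
    """Apply the same Dense weights (W, b) separately to each timestep of a (T, D_in) sequence."""
    return np.stack([step @ W + b for step in sequence])

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 300))          # e.g. 5 word vectors of size 300 (illustrative)
W, b = rng.normal(size=(300, 64)), np.zeros(64)
out = time_distributed_dense(seq, W, b)
print(out.shape)  # (5, 64)

# Because the weights are shared across timesteps, this equals one batched matrix product.
assert np.allclose(out, seq @ W + b)
```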

## Some Improvements that can be made
- Better hyperparameter tuning
- Dropout
- Regularization
- Using an RNN decoder to generate answers word by word, so that answers can have sequential (temporal) structure instead of being a single class
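Dropout and regularization from the list above can be sketched in NumPy as follows; the function names are hypothetical. In Keras the same ideas correspond to the `Dropout` layer and to passing `kernel_regularizer=regularizers.l2(...)` to a layer. The dropout rate and penalty strength here are placeholders that would need tuning.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x  # at inference time the layer is the identity
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def l2_penalty(weights, lam):
    """L2 regularization term lam * sum(w ** 2), added to the training loss."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

rng = np.random.default_rng(0)
hidden = np.ones((2, 4))
print(dropout(hidden, rate=0.5, rng=rng))  # entries are 0.0 or 2.0
```

Rescaling during training keeps the expected activation unchanged, so no adjustment is needed at test time.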

## References
- https://arxiv.org/pdf/1505.00468v6.pdf
- https://github.com/avisingh599/visual-qa
- https://github.com/anantzoid/VQA-Keras-Visual-Question-Answering
9 changes: 9 additions & 0 deletions keras implementation/data/readme.md
@@ -0,0 +1,9 @@
Dataset download link: http://visualqa.org/download.html

- Download and unzip the dataset
- Then run [dumpText.py](https://github.com/avisingh599/visual-qa/blob/master/scripts/dumpText.py) to convert the data to .txt format, writing the output files to this folder

Refer to https://github.com/avisingh599/visual-qa/tree/master/data for a better understanding of the data preprocessing.
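The main README restricts training to the 1000 most frequent answers. This sketch assumes the preprocessed questions, answers and image ids sit in parallel text files with one example per line (an assumption about the dumped output; the file name in the comment is hypothetical). It keeps only examples whose answer is among the k most frequent and maps each answer to a class index.

```python
from collections import Counter

def read_lines(path):
    """Read one example per line, e.g. read_lines("questions_train.txt") (hypothetical file name)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def filter_top_answers(questions, answers, image_ids, k=1000):
    """Keep examples whose answer is among the k most frequent answers; map answers to class ids."""
    top_answers = [a for a, _ in Counter(answers).most_common(k)]
    label_of = {a: i for i, a in enumerate(top_answers)}
    kept = [
        (q, label_of[a], img)
        for q, a, img in zip(questions, answers, image_ids)
        if a in label_of
    ]
    return kept, top_answers

# Toy usage with k=2 instead of 1000.
qs = ["q1", "q2", "q3", "q4", "q5", "q6"]
ans = ["yes", "no", "yes", "2", "no", "yes"]
imgs = [1, 2, 3, 4, 5, 6]
kept, vocab = filter_top_answers(qs, ans, imgs, k=2)
print(vocab)      # ['yes', 'no']
print(len(kept))  # 5
```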


[Dataset Statistics](https://www.youtube.com/watch?v=nMr_sSAMpkE)
Binary file added keras implementation/data/vqa_image.png