Multimodal-CoT incorporates vision features in a decoupled training framework. The framework consists of two training stages: (i) rationale generation and (ii) answer inference. Both stages share the same model architecture but differ in the input and output.
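As a rough illustration of the decoupled pipeline, the sketch below shows the two stages chained at inference time. The function and argument names are illustrative only and do not correspond to the repo's actual API: stage (i) maps the language input fused with vision features to a rationale, and stage (ii) appends that rationale to the language input and produces the answer with the same architecture.

# Conceptual sketch of the two-stage pipeline; names are illustrative, not the repo's API.
def multimodal_cot_inference(question, context, options, vision_features,
                             rationale_model, answer_model):
    # Stage (i): rationale generation from question/context/options plus vision features.
    rationale = rationale_model.generate(
        text=f"{question}\n{context}\n{options}",
        vision=vision_features,
    )
    # Stage (ii): answer inference; same architecture, but the generated rationale
    # is appended to the language input and the target is the answer.
    answer = answer_model.generate(
        text=f"{question}\n{context}\n{options}\nSolution: {rationale}",
        vision=vision_features,
    )
    return rationale, answer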
Install all required Python dependencies:
pip install -r requirements.txt
Download the ScienceQA dataset from the following repository:
https://github.com/lupantech/ScienceQA/tree/main/data
The following instructions show how we obtain the vision features.
Download the image files from Google Drive and unzip all the images (train, dev, test) into the same folder (e.g., images, the directory passed as --data_root below). Then run:
python extract_features.py --data_root images --output_dir vision_features --img_type vit
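For orientation, a minimal sketch of ViT feature extraction with timm is given below. The backbone name, preprocessing, file layout, and output format are assumptions and may differ from what extract_features.py actually does.

# Minimal sketch of per-image ViT feature extraction (assumed backbone and file layout).
import os
import numpy as np
import timm
import torch
from PIL import Image
from timm.data import create_transform, resolve_data_config

model = timm.create_model("vit_large_patch32_384", pretrained=True)  # assumed backbone
model.eval()
transform = create_transform(**resolve_data_config({}, model=model))

features = []
for pid in sorted(os.listdir("images")):              # ordering is illustrative only
    path = os.path.join("images", pid, "image.png")   # assumed: one subfolder per problem id
    if not os.path.exists(path):
        continue
    x = transform(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = model.forward_features(x)              # patch-token features, shape (1, tokens, hidden)
    features.append(feat.squeeze(0).cpu().numpy())

os.makedirs("vision_features", exist_ok=True)
np.save(os.path.join("vision_features", "vit.npy"), np.stack(features))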
The processed captions for ScienceQA are available in data/instruct_captions.json.
The following instructions show how we obtain those captions.
Install lavis and prepare the Vicuna weights to use InstructBLIP for caption extraction.
Assume that the images are stored in the images folder, then run:
python extract_caption.py
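A minimal sketch of InstructBLIP captioning with lavis follows. The model size, prompt, file layout, and output format are assumptions; the actual extract_caption.py may differ.

# Minimal sketch of InstructBLIP captioning via LAVIS (assumed model size, prompt, and layout).
import json
import os
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b",  # requires the prepared Vicuna weights
    is_eval=True, device=device,
)

captions = {}
for pid in sorted(os.listdir("images")):
    path = os.path.join("images", pid, "image.png")   # assumed: one subfolder per problem id
    if not os.path.exists(path):
        continue
    image = vis_processors["eval"](Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    captions[pid] = model.generate({"image": image, "prompt": "Describe the image in detail."})[0]

with open("data/instruct_captions.json", "w") as f:
    json.dump(captions, f, indent=2)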
# rationale generation
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
--data_root data/ScienceQA/data \
--caption_file data/instruct_captions.json \
--model declare-lab/flan-alpaca-large \
--user_msg rationale --img_type vit \
--bs 2 --eval_bs 4 --epoch 50 --lr 5e-5 --output_len 512 \
--use_caption --use_generate --prompt_format QCM-E \
--output_dir experiments
# answer inference
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_central.py \
--data_root data/ScienceQA/data \
--caption_file data/instruct_captions.json \
--model declare-lab/flan-alpaca-large \
--user_msg answer --img_type vit \
--bs 4 --eval_bs 8 --epoch 50 --lr 5e-5 --output_len 64 \
--use_caption --use_generate --prompt_format QCMG-A \
--output_dir experiments \
--eval_le experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50/predictions_ans_eval.json \
--test_le experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50/predictions_ans_test.json
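For reference, the prompt formats name the fields concatenated into the input and the field to be generated: QCM-E feeds the Question, Context (caption), and Multiple options and generates the Explanation (rationale), while QCMG-A additionally appends the Generated rationale from stage one (the predictions loaded via --eval_le/--test_le) and generates the Answer. The sketch below illustrates how such input/target pairs could be assembled; the field names and templates are assumptions, not the repo's exact ones.

# Hypothetical construction of QCM-E and QCMG-A training pairs
# (the repo's exact templates and field names may differ).
def format_options(choices):
    return " ".join(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices))

def build_qcm_e(example):
    # Stage (i): Question + Context + Multiple options -> Explanation (rationale).
    source = (
        f"Question: {example['question']}\n"
        f"Context: {example['caption']}\n"
        f"Options: {format_options(example['choices'])}\n"
        "Solution:"
    )
    return source, example["rationale"]

def build_qcmg_a(example, generated_rationale):
    # Stage (ii): Question + Context + Multiple options + Generated rationale -> Answer.
    source = (
        f"Question: {example['question']}\n"
        f"Context: {example['caption']}\n"
        f"Options: {format_options(example['choices'])}\n"
        f"Solution: {generated_rationale}\n"
        "Answer:"
    )
    return source, f"The answer is ({example['answer']})."  # assumed answer template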
Parts of our code are adapted from mm-cot, ScienceQA, Transformers, and pytorch-image-models.