File structure
```
\trainer
    dataloader.py
    trainer.py
\model
    base_model.py
    crosslingual.py
    lossfn.py
    ...
\evaluate
preprocess.py
args.py
```
Import models:

```python
from vi_clip.model import CLIP, SigLIP, LiT, SigLiT, CrossLingual, mCLIP, BaselineCLIP, BaselineSigLIP
from vi_clip.args import model_args

model = SigLiT(**model_args)
```
Import trainers:

```python
from vi_clip.trainer import Trainer, CrossLingualTrainer, ddp_train
from vi_clip.args import training_args, model_args

# Training with a single GPU or DataParallel (pick one train_type)
training_args['train_type'] = 'single'  # single GPU
# training_args['train_type'] = 'dp'    # DataParallel
trainer = Trainer(model_args, training_args)
trainer.train()

# Training with DistributedDataParallel
training_args['train_type'] = 'ddp'
ddp_train(model_args, training_args)
```
Every model has these attributes:
- text_model
- tokenizer
- vision_model
- processor
And these methods (see the usage sketch below):
- encode_image(): accepts str, list[str] (for an image dir), Image, list[Image], np.ndarray, or torch.Tensor
- encode_text(): accepts str or list[str]
- forward(images, texts): accepts the same input types as the two methods above
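A minimal usage sketch. The model choice, `model_args`, file paths, captions, and the assumption that these methods return embedding tensors are all illustrative, not confirmed by the repo:

```python
from PIL import Image
from vi_clip.model import SigLiT
from vi_clip.args import model_args

model = SigLiT(**model_args)

# encode_image() accepts any of the input types listed above
img_emb = model.encode_image("image/1.jpg")                    # single path
img_embs = model.encode_image(["image/1.jpg", "image/2.jpg"])  # list of paths
img_emb = model.encode_image(Image.open("image/1.jpg"))        # PIL image

# encode_text() accepts a single caption or a batch
txt_emb = model.encode_text("giám đốc")
txt_embs = model.encode_text(["giám đốc", "nhân viên"])

# forward() pairs images with texts; returning both embedding batches
# is an assumption
img_embs, txt_embs = model(["image/1.jpg", "image/2.jpg"],
                           ["giám đốc", "nhân viên"])
```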
Need changing:
- Only include the text encoder.
- Add a text projection layer (nn.Linear) even if the ViT and BERT embeddings share the same dimension.
- Implement freezing BERT and training only the projection layer for some early epochs, with a different learning rate (see the sketch after this list).
- Test the Evaluate class: pre-embedding, upload, and download scripts (files must be in some order, to be decided).
- Only change the embedding layer with the new vocab (which dataset should the vocab come from?).
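A minimal sketch of the projection-plus-freezing idea. The `text_projection` attribute, dimensions, epoch threshold, and learning rates are assumptions for illustration, not the repo's actual implementation:

```python
import torch
import torch.nn as nn
from vi_clip.model import SigLiT
from vi_clip.args import model_args

model = SigLiT(**model_args)

# Assumed: attach a projection head even when the dimensions already match,
# so the text tower always has a trainable layer of its own
text_dim = embed_dim = 768  # illustrative dimensions
model.text_projection = nn.Linear(text_dim, embed_dim)

def make_text_optimizer(model, epoch, freeze_until=2, proj_lr=1e-3, bert_lr=1e-5):
    """Freeze BERT for the first `freeze_until` epochs; afterwards train it
    with a smaller learning rate than the projection layer."""
    freeze_bert = epoch < freeze_until
    for p in model.text_model.parameters():
        p.requires_grad = not freeze_bert

    # The projection layer always trains; BERT joins later at a lower LR
    param_groups = [{'params': model.text_projection.parameters(), 'lr': proj_lr}]
    if not freeze_bert:
        param_groups.append({'params': model.text_model.parameters(),
                             'lr': bert_lr})
    return torch.optim.AdamW(param_groups)
```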
Import download_and_extract_batches from vi_clip.preprocess. Pass a Hugging Face repo and a local dir, and it will download the repo (including images) in this structure (there are still some bugs at the moment):

```
\dataset_name
    dataset_caption.parquet
    \image
        1.jpg
        2.jpg
```
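A call sketch; only the function name and module come from above, while the parameter order, repo id, and local dir are assumptions:

```python
from vi_clip.preprocess import download_and_extract_batches

# repo id and local dir are illustrative; the exact signature is an assumption
download_and_extract_batches("username/dataset_name", "./dataset_name")
```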
Data format

The parquet file should look like this:

| image_id | image | text_id | caption |
|---|---|---|---|
| 000001 | name.jpg | 005 | giám đốc |
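A quick way to inspect the file with pandas; the path follows the directory structure above, and the columns follow the table:

```python
import pandas as pd

# columns per the table above: image_id, image, text_id, caption
df = pd.read_parquet("dataset_name/dataset_caption.parquet")
print(df.head())
```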