🚀🚀🚀 Official implementation of SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models.
Prepare your captions in JSON format. Here’s an example of how your JSON should look:
```json
[
  {
    "id": 1,
    "caption": "The bus in the image is white and red. The back of the bus features an advertisement. The bus is driving down the street, which is crowded with people and other vehicles."
  },
  {
    "id": 2,
    "caption": "The dog in the image is brown with a red collar. It sits behind a window, looking out, which gives it a sense of longing for the outdoors or something it sees."
  }
]
```
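Before generation, it can help to sanity-check the caption file. Below is a minimal Python sketch, assuming the file is named `captions.json` (the actual path your `run.sh` expects may differ):

```python
import json

# Load the caption file; the filename "captions.json" is an assumption.
with open("captions.json", encoding="utf-8") as f:
    captions = json.load(f)

# Each entry should carry an integer "id" and a string "caption".
for entry in captions:
    assert isinstance(entry["id"], int), f"bad id in entry: {entry}"
    assert isinstance(entry["caption"], str), f"bad caption in entry: {entry}"

print(f"Loaded {len(captions)} captions")
```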
To generate images, run the `run.sh` script. The settings can be adjusted as follows:
- Resolution: Defaults to 1024x1024 pixels. Modify `width` and `height` in the script to customize.
- Repetitions: By default, each caption generates one image. Increase the `repeat` parameter to generate multiple images per caption, from which the best-quality image is selected (a selection sketch follows below).
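For orientation, here is a hedged sketch of how these settings could map onto a generation call with the `diffusers` library. The checkpoint ID, the CLIP-based selection criterion, and the `generate_best` helper are illustrative assumptions, not the repository's actual implementation:

```python
import torch
from diffusers import StableDiffusion3Pipeline
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint IDs; run.sh may pin different ones. A CUDA GPU is assumed.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def generate_best(caption: str, width: int = 1024, height: int = 1024, repeat: int = 4):
    """Generate `repeat` candidates and keep the one scoring highest against
    the caption. CLIP similarity is used here as an illustrative quality
    criterion; it is an assumption, not necessarily the score run.sh uses."""
    images = pipe(
        prompt=caption, width=width, height=height, num_images_per_prompt=repeat
    ).images
    inputs = clip_proc(text=[caption], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        scores = clip(**inputs).logits_per_image.squeeze(1)  # one score per candidate
    return images[scores.argmax().item()]
```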
The process supports two diffusion models:
- Stable Diffusion 3: Default model, optimized for a balance between speed and quality.
- Stable Diffusion XL: Use this model for faster image generation.
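Switching models is, in essence, a matter of loading a different pipeline class. A minimal sketch, with checkpoint names as assumptions (substitute whatever `run.sh` actually references):

```python
import torch
from diffusers import StableDiffusion3Pipeline, StableDiffusionXLPipeline

def load_pipeline(model: str = "sd3"):
    """Return the requested pipeline; checkpoint IDs are assumptions."""
    if model == "sd3":
        return StableDiffusion3Pipeline.from_pretrained(
            "stabilityai/stable-diffusion-3-medium-diffusers",
            torch_dtype=torch.float16,
        )
    return StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    )
```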
Usage and License Notices: The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is licensed under CC BY-NC 4.0 (allowing only non-commercial use), and models trained on the dataset should not be used outside of research purposes.