We propose training efficient VLA models based on SLMs such as Qwen2 with a non-autoregressive objective. Our early results show that these models exhibit training characteristics similar to those of much larger counterparts. This repository is a direct fork of Prismatic VLMs and OpenVLA. You can train from scratch, finetune, or test our pre-trained models. See our blog or our report for more details about the architecture.
conda create --name evla python=3.10
conda activate evla
cd evla
pip install -e .
Now add your HF token under `.hf_token` to run models such as Llama 2/3 or Qwen2.
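For example, assuming the token is read as plain text from a `.hf_token` file in the repository root:

```bash
# Write your Hugging Face access token to .hf_token
# (assumed to be a plain-text file at the root of the repo)
echo "<your_hf_token>" > .hf_token
```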
You can either train your own model from scratch or finetune a model on your own dataset. We recommend first running the debug mode to verify that everything works.
CUDA_VISIBLE_DEVICES=0 LOCAL_RANK=0 MASTER_ADDR=localhost MASTER_PORT=1235 python vla-scripts/test.py \
--vla.type "debug" \
--data_root_dir DATA_ROOT_DIR \
--run_root_dir RUN_ROOT_DIR
Full-scale training can be run with the `evla` config from `prismatic/conf/vla.py`.
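A multi-GPU launch might look like the following sketch; the `vla-scripts/train.py` entry point and the GPU count are assumptions, so adapt them to your setup:

```bash
# Hypothetical full-scale run with the `evla` config on 8 GPUs of a single node
torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/train.py \
  --vla.type "evla" \
  --data_root_dir DATA_ROOT_DIR \
  --run_root_dir RUN_ROOT_DIR
```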
- Remove the hardcoded attention setup.
- Export the model to the HF format.
- Add support for LoRA.
@article{kscale2024evla,
title={EdgeVLA: Efficient Vision-Language-Action Models},
author={Paweł Budzianowski and Wesley Maa and Matthew Freed and Jingxiang Mo and Aaron Xie and Viraj Tipnis and Benjamin Bolte},
year={2024}
}