SLAM-LLM

SLAM-LLM is a deep learning toolkit that allows researchers and developers to train custom multimodal large language models (MLLMs), focusing on speech, language, audio, and music processing. We provide detailed recipes for training and high-performance checkpoints for inference.

News

[Update May 22, 2024] Please join our Slack or WeChat group, where we share updates and answer questions.
[Update May 21, 2024] Recipes for Spatial Audio Understanding are now supported.
[Update May 20, 2024] Recipes for music caption (MC) are now supported.
[Update May 8, 2024] Recipes for visual speech recognition (VSR) are now supported.
[Update May 4, 2024] Recipes for zero-shot text-to-speech (TTS) are now supported.
[Update Apr. 28, 2024] Recipes for automated audio captioning (AAC) are now supported.
[Update Mar. 31, 2024] Recipes for automatic speech recognition (ASR) are now supported.

Installation

git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout tags/v4.35.2
pip install -e .
cd ..
git clone https://github.com/huggingface/peft.git
cd peft
git checkout tags/v0.6.0
pip install -e .
cd ..
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/ddlBoJack/SLAM-LLM.git
cd SLAM-LLM
pip install -e .
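
The commands above pin specific versions of torch, torchvision, torchaudio, transformers, and peft. As a quick sanity check after installation, the following minimal sketch compares the installed versions against those pins. It is not part of SLAM-LLM; the helper name `check_versions` is hypothetical, and the pins are copied from the commands above.

```python
from importlib import metadata

# Version pins copied from the installation commands above.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "transformers": "4.35.2",
    "peft": "0.6.0",
}


def check_versions(expected=EXPECTED):
    """Return a dict mapping each package name to 'ok', 'missing', or a mismatch note.

    Hypothetical helper for illustration; not part of the SLAM-LLM code base.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Drop a local build tag such as "+cu118" before comparing.
        base = installed.split("+", 1)[0]
        if base == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {installed}, expected {wanted})"
    return report


if __name__ == "__main__":
    for name, status in check_versions().items():
        print(f"{name}: {status}")
```

A "mismatch" entry does not necessarily mean a recipe will fail; it only signals that the environment differs from the one described here.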

Some examples also require fairseq, which can be installed as follows:

# you need to install fairseq before SLAM-LLM
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

We also provide a Docker image for convenience:

# build docker image
docker build -t slam-llm:latest .

# run docker image with gpu
docker run -it --gpus all --name slam --shm-size=256g slam-llm:latest /bin/bash

Usage

List of Recipes

We provide reference implementations of various LLM-based speech, audio, and music tasks:

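Speech Task
- Automatic Speech Recognition (ASR)
- Zero-shot Text-to-Speech (TTS)
- Visual Speech Recognition (VSR)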
Audio Task
- Automated Audio Captioning (AAC)
- Spatial Audio Understanding
Music Task
- Music Caption (MC)

Configuration Priority

Configuration is hierarchical. When the same option is set in more than one place, the value is taken from the source with the highest priority, in the following order:

command-line (shell file) > Hydra configuration (yaml file) > dataclass configuration (Python file)
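
To illustrate the priority order above, here is a minimal standard-library sketch of layered overrides. It does not use Hydra and is not SLAM-LLM's actual implementation; the `TrainConfig` fields, the `resolve` helper, and the example values are all hypothetical. Plain dicts stand in for the parsed yaml file and the command-line overrides.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainConfig:
    # Hypothetical fields; these defaults play the role of the
    # dataclass configuration (lowest priority).
    lr: float = 1e-4
    batch_size: int = 4
    use_fp16: bool = False


def resolve(defaults, yaml_values, cli_overrides):
    """Merge three layers so that later layers win: dataclass < yaml < command line."""
    merged = asdict(defaults)
    for layer in (yaml_values, cli_overrides):
        for key, value in layer.items():
            if key not in merged:
                raise KeyError(f"unknown config key: {key}")
            merged[key] = value
    return TrainConfig(**merged)


# Stand-in for values read from a yaml file (middle priority).
yaml_values = {"lr": 5e-5, "batch_size": 8}
# Stand-in for overrides passed on the command line (highest priority).
cli_overrides = {"batch_size": 16}

cfg = resolve(TrainConfig(), yaml_values, cli_overrides)
print(cfg)  # TrainConfig(lr=5e-05, batch_size=16, use_fp16=False)
```

Here `batch_size` is set in all three layers and the command-line value wins, `lr` comes from the yaml layer, and `use_fp16` falls back to the dataclass default.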

Features

Easy to extend to new models and tasks.
Detailed recipes for training and high-performance checkpoints for inference.
Mixed-precision training, which trains faster and uses less GPU memory on NVIDIA Tensor Cores.
Multi-GPU training with data and model parallelism, supporting DDP, FSDP, and DeepSpeed (still to be improved).
Flexible configuration based on Hydra and dataclasses, allowing a combination of code-based, command-line, and file-based configuration.

Acknowledgements

We borrow code from Llama-Recipes for the training process.
We borrow code from Fairseq for DeepSpeed configuration.
We thank the contributors for providing diverse recipes.

