
A flexible and efficient training framework for large-scale alignment

English | 中文

Latest News 🔥

[2024/8] We officially released ChatLearn! Check out our documentation.
[ongoing] We are continuously hiring and welcome you to contact us or submit your resume to [email protected].

ChatLearn is a large-scale alignment training framework developed by the Alibaba Cloud PAI platform.

ChatLearn has the following advantages:

User-friendly programming interface: Users can focus on programming individual models by wrapping a few functions, while the system takes care of resource scheduling, data and control flow transmission, and distributed execution.
Highly Scalable Training Methodology: ChatLearn offers alignment training such as RLHF, DPO, OnlineDPO and GRPO, while also supporting user-defined execution flows for models, enabling a highly convenient and customizable training process.
Diverse Distributed Acceleration Engines: Users can leverage various computational backends for model construction, such as Megatron-LM, DeepSpeed, vLLM, and others. For instance, we can use Megatron-LM for training and vLLM to expedite inference.
Flexible Parallel Strategies and Resource Allocation: ChatLearn supports different parallel strategies for various model configurations, enabling the formulation of distinct parallel approaches tailored to each model's computational, memory, and communication characteristics. Additionally, ChatLearn features a flexible resource scheduling mechanism that accommodates exclusive or shared use of resources across models. Through its system scheduling policies, it facilitates efficient serial/parallel execution and optimized GPU memory sharing, enhancing overall performance and efficiency.
High performance: Compared to current state-of-the-art (SOTA) systems, ChatLearn achieves a 52% performance improvement at the 7B+7B (Policy+Reward) scale and a 137% improvement at the 70B+70B scale. ChatLearn also supports larger-scale alignment training, such as 300B+300B.

By providing a comprehensive and efficient framework, ChatLearn empowers researchers and practitioners to train large-scale alignment models with ease, scalability, and improved performance.
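
As a rough illustration of the programming model described above, where each model is wrapped in a few functions and the system handles the flow between them, here is a minimal, self-contained Python sketch. The names `ModelWrapper`, `setup`, `forward_step`, and `run_flow` are hypothetical and chosen only for illustration; they are not ChatLearn's actual API. The sketch also runs everything serially in one process, whereas the framework handles resource scheduling and distributed execution.

```python
from typing import Any, Dict, List


class ModelWrapper:
    """Hypothetical base class: a model exposes a few functions to the framework."""

    def setup(self) -> None:
        """Build or load the model (no-op by default in this sketch)."""

    def forward_step(self, data: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class ToyPolicy(ModelWrapper):
    def forward_step(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in for generation: echo the prompt with a fixed suffix.
        return {**data, "response": data["prompt"] + " -> answer"}


class ToyReward(ModelWrapper):
    def forward_step(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in for scoring: the reward is simply the response length.
        return {**data, "reward": float(len(data["response"]))}


def run_flow(models: List[ModelWrapper], batch: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical serial 'engine': set up every model, then chain their outputs."""
    for model in models:
        model.setup()
    for model in models:
        batch = model.forward_step(batch)
    return batch


result = run_flow([ToyPolicy(), ToyReward()], {"prompt": "hi"})
```

The user code here only defines what each model does with a batch; the order of execution and the passing of data between models live in `run_flow`, which stands in for the part the framework would own.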

Quick Start

Please refer to the documentation for a quick start.

Performance

We compared the RLHF training throughput of models with different parameter scales, adopting an N+N model configuration where both the Policy model and the Reward model have the same number of parameters. We benchmarked against DeepSpeed-Chat and OpenRLHF with 7B and 70B model configurations. For the 8 GPU setup with a 7B+7B scale, we achieved a 115% speedup; for the 32 GPU setup with a 70B+70B scale, the speedup was 208%. The larger the scale, the more pronounced the acceleration effect becomes. Additionally, ChatLearn can support even larger-scale alignment training, such as at a 300B+300B scale.

Note: The performance of DeepSpeed-Chat and OpenRLHF has already been optimized.
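
The text above does not define how the speedup percentages are computed. As a sketch, assuming that an "X% speedup" means throughput is (1 + X/100) times the baseline throughput, the figures can be converted to throughput ratios as follows. The function names are illustrative and the definition itself is an assumption, not something stated by the benchmark.

```python
def speedup_percent(new_throughput: float, baseline_throughput: float) -> float:
    """Relative throughput gain over the baseline, in percent (assumed definition)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline_throughput must be positive")
    return (new_throughput / baseline_throughput - 1.0) * 100.0


def throughput_ratio(speedup_pct: float) -> float:
    """Inverse of speedup_percent: how many times the baseline throughput."""
    return 1.0 + speedup_pct / 100.0


# Figures quoted above: 115% at 7B+7B on 8 GPUs, 208% at 70B+70B on 32 GPUs.
ratio_7b = throughput_ratio(115.0)
ratio_70b = throughput_ratio(208.0)
```

Under this assumed definition, the quoted figures would correspond to roughly 2.15 times and 3.08 times the baseline throughput, respectively.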

Feature List

Supports RLHF, DPO, OnlineDPO, GRPO, and user-defined alignment training methods.
Supports Megatron-LM as the backend for training or inference, and vLLM as the backend for inference.
Supports independent configuration of parallel strategies for different models, and efficient parameter synchronization between models.
Supports EMS (Efficient Memory Sharing) functionality, enabling efficient memory sharing between models.
Supports both GPU and CPU resource types for models, for example a purely CPU-based math reward model.
Supports models in Megatron-Core format.
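
To make the idea of per-model parallel strategies and resource types more concrete, here is a small sketch of how independent placements might be described and checked. The `ModelPlacement` class, its fields, and the model names are hypothetical and do not reflect ChatLearn's configuration format. The sketch relies only on the standard relationship that the number of devices equals tensor-parallel size times pipeline-parallel size times data-parallel size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPlacement:
    """Hypothetical per-model resource and parallelism description."""

    name: str
    device: str  # "gpu" or "cpu"
    num_devices: int
    tensor_parallel: int = 1
    pipeline_parallel: int = 1

    def data_parallel(self) -> int:
        """Derive the data-parallel size from the remaining devices."""
        model_parallel = self.tensor_parallel * self.pipeline_parallel
        if self.num_devices % model_parallel != 0:
            raise ValueError(
                f"{self.name}: {self.num_devices} devices not divisible by "
                f"tensor_parallel * pipeline_parallel = {model_parallel}"
            )
        return self.num_devices // model_parallel


# Each model gets its own strategy; the reward model runs on CPU only.
placements = [
    ModelPlacement("policy_trainer", "gpu", 8, tensor_parallel=4, pipeline_parallel=2),
    ModelPlacement("policy_inference", "gpu", 8, tensor_parallel=2),
    ModelPlacement("math_reward", "cpu", 2),
]

dp_sizes = {p.name: p.data_parallel() for p in placements}
```

In this example the training and inference copies of the policy use different strategies on the same number of GPUs, which is the situation in which parameter synchronization between models becomes necessary.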

Roadmap

The upcoming features for ChatLearn include:

Support for alignment training of MoE (Mixture of Experts) models
Integration with DeepSpeed as a training backend
Support for more models
Performance optimization
Support for more alignment algorithms

We welcome community partners to collaborate and contribute to development. You are also welcome to join the DingTalk group (98090003312) to take part in the discussion.
