Releases: hpcaitech/ColossalAI
Version v0.4.8 Release Today!
What's Changed
Release
- [release] update version (#6195) by Hongxin Liu
Doc
- [doc] DeepSeek V3/R1 news (#6199) by binmakeswell
Application
- [application] add lora sft example data (#6198) by Hongxin Liu
- [application] Update README (#6196) by Tong Li
- [application] add lora sft example (#6192) by Hongxin Liu
Checkpointio
- [checkpointio] fix for async io (#6189) by flybird11111
- [checkpointio] fix checkpoint for 3d (#6187) by flybird11111
- [checkpointio] gather tensor before unpad it if the tensor is both padded and distributed (#6168) by Lemon Qin
- [checkpointio] support load-pin overlap (#6177) by Hongxin Liu
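The async I/O and load-pin changes above are exercised through the regular Booster checkpoint calls. Below is a minimal sketch of saving and reloading a model with the asynchronous path enabled; the `use_async` keyword is an assumption inferred from the PR titles (#6131, #6189), so verify the exact signature against your installed version.

```python
# Minimal sketch: asynchronous checkpoint save/load via the Booster API.
# Assumption: `use_async=True` is the flag added by the async checkpoint I/O PRs;
# check colossalai.booster.Booster.save_model in your version before relying on it.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()  # run under `colossalai run --nproc_per_node N this_script.py`

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

booster = Booster(plugin=LowLevelZeroPlugin(stage=1, precision="bf16"))
model, optimizer, *_ = booster.boost(model, optimizer)

# With async I/O the save may return before the file is fully flushed to disk;
# the checkpoint is written out in the background.
booster.save_model(model, "model.safetensors", use_async=True)
booster.save_optimizer(optimizer, "optim_ckpt", shard=True)

# Later (or in a new run): load the checkpoint back; pinned-memory loading and
# load-pin overlap (#6177) are handled inside the plugin's checkpoint I/O.
booster.load_model(model, "model.safetensors")
```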
Hotfix
- [hotfix] fix zero optim save (#6191) by Hongxin Liu
- [hotfix] fix hybrid checkpointio for sp+dp (#6184) by flybird11111
Shardformer
- [shardformer] support pipeline for deepseek v3 and optimize lora save (#6188) by Hongxin Liu
- [shardformer] support ep for deepseek v3 (#6185) by Hongxin Liu
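Expert parallelism and pipeline parallelism for DeepSeek V3 are driven through the MoE hybrid parallel plugin. A rough sketch of such a configuration is below; the sizes are illustrative and assume a matching number of GPUs, so treat it as a starting point rather than the exact recipe used in the PRs.

```python
# Sketch: expert parallelism (#6185) plus pipeline parallelism (#6188) for a MoE
# model such as DeepSeek V3. Sizes are illustrative; the launched world size must
# be compatible with tp_size * pp_size and the chosen ep_size.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import MoeHybridParallelPlugin

colossalai.launch_from_torch()

plugin = MoeHybridParallelPlugin(
    tp_size=1,        # no tensor parallelism in this sketch
    pp_size=2,        # pipeline parallelism across 2 stages
    ep_size=4,        # each expert group is sharded over 4 ranks
    zero_stage=1,
    precision="bf16",
)
booster = Booster(plugin=plugin)
# model, optimizer, *_ = booster.boost(model, optimizer)  # e.g. a DeepSeek V3 causal LM
```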
Ci
- [CI] Cleanup Dist Optim tests with shared helper funcs (#6125) by Wenxuan Tan
Issue template
- [Issue template] Add checkbox asking for details to reproduce error (#6104) by Wenxuan Tan
Inference
- [Inference]Fix example in readme (#6178) by Guangyao Zhang
Full Changelog: v0.4.7...v0.4.8
Version v0.4.7 Release Today!
What's Changed
Release
- [release] update version (#6174) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6113) by pre-commit-ci[bot]
Shardformer
- [Shardformer] Support zbv in Shardformer Policy (#6150) by duanjunwen
Checkpointio
- [checkpointio] support non blocking pin load (#6172) by Hongxin Liu
- [checkpointio]support asyncio for 3d (#6152) by flybird11111
- [checkpointio] fix async io (#6155) by flybird11111
- [checkpointio] support debug log (#6153) by Hongxin Liu
- [checkpointio] fix zero optimizer async save memory (#6151) by Hongxin Liu
- Merge pull request #6149 from ver217/hotfix/ckpt by Wang Binluo
- [checkpointio] disable buffering by ver217
- [checkpointio] fix pinned state dict by ver217
- [checkpointio] fix size compute by ver217
- [checkpointio] fix performance issue (#6139) by Hongxin Liu
- [checkpointio] support async model save (#6131) by Hongxin Liu
News
- [news] release colossalai for sora (#6166) by binmakeswell
Hotfix
- [hotfix] improve compatibility (#6165) by Hongxin Liu
- [Hotfix] hotfix normalization (#6163) by duanjunwen
- [hotfix] fix zero comm buffer init (#6154) by Hongxin Liu
- [hotfix] fix flash attn window_size err (#6132) by duanjunwen
Doc
- [doc] add bonus event (#6164) by binmakeswell
- [doc] update cloud link (#6148) by Sze-qq
- [doc] add hpc cloud intro (#6147) by Sze-qq
Device
- [Device]Support npu (#6159) by flybird11111
Fix
- [fix] fix bug caused by perf version (#6156) by duanjunwen
- [fix] multi-node backward slowdown (#6134) by Hanks
Optim
- [optim] hotfix adam load (#6146) by Hongxin Liu
Zerobubble
- [Zerobubble] merge main. (#6142) by duanjunwen
Async io
- [async io] support async io (#6137) by flybird11111
Ckpt
- [ckpt] Add async ckpt api (#6136) by Wang Binluo
Cli
- [cli] support run as module option (#6135) by Hongxin Liu
Zero
- [zero] support extra dp (#6123) by Hongxin Liu
Plugin
- [plugin] support get_grad_norm (#6115) by Hongxin Liu
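The `get_grad_norm` addition is meant to expose the gradient norm computed during clipping. A sketch of how it might be queried in a training step is below; the method name and its placement on the boosted optimizer are assumptions taken from the PR title (#6115), so confirm against the plugin docs.

```python
# Sketch: reading the gradient norm after backward/step. Assumption: #6115 adds a
# `get_grad_norm()` method to the optimizer wrapper returned by `booster.boost`,
# populated when gradient clipping (max_norm) is enabled.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()
model = torch.nn.Linear(64, 64).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
booster = Booster(plugin=LowLevelZeroPlugin(stage=2, precision="bf16", max_norm=1.0))
model, optimizer, *_ = booster.boost(model, optimizer)

loss = model(torch.randn(8, 64, device="cuda")).mean()
booster.backward(loss, optimizer)
optimizer.step()
print("grad norm:", optimizer.get_grad_norm())  # assumed API from #6115
```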
Full Changelog: v0.4.6...v0.4.7
Version v0.4.6 Release Today!
What's Changed
Release
- [release] update version (#6109) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6078) by pre-commit-ci[bot]
Checkpointio
- [checkpointio] fix hybrid plugin model save (#6106) by Hongxin Liu
Doc
- [doc] sora solution news (#6100) by binmakeswell
Extension
- [extension] hotfix compile check (#6099) by Hongxin Liu
Full Changelog: v0.4.5...v0.4.6
Version v0.4.5 Release Today!
What's Changed
Release
- [release] update version (#6094) by Hongxin Liu
Misc
- [misc] fit torch API upgrade and remove legacy import (#6093) by Hongxin Liu
Fp8
- [fp8] add fallback and make compile option configurable (#6092) by Hongxin Liu
Chore
- [chore] refactor by botbw
Ckpt
- [ckpt] add safetensors util by botbw
Pipeline
- [pipeline] hotfix backward for multiple outputs (#6090) by Hongxin Liu
Ring attention
- [Ring Attention] Improve comments (#6085) by Wenxuan Tan
- Merge pull request #6071 from wangbluo/ring_attention by Wang Binluo
Shardformer
- [shardformer] optimize seq parallelism (#6086) by Hongxin Liu
- [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) by Hongxin Liu
Full Changelog: v0.4.4...v0.4.5
Version v0.4.4 Release Today!
What's Changed
Release
- [release] update version (#6062) by Hongxin Liu
Colossaleval
- [ColossalEval] support for vllm (#6056) by Camille Zhong
Sp
- Merge pull request #6064 from wangbluo/fix_attn by Wang Binluo
- Merge pull request #6061 from wangbluo/sp_fix by Wang Binluo
Doc
- [doc] FP8 training and communication document (#6050) by Guangyao Zhang
- [doc] update sp doc (#6055) by flybird11111
Fp8
- [fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059) by Guangyao Zhang
- [fp8] fix missing fp8_comm flag in mixtral (#6057) by botbw
- [fp8] hotfix backward hook (#6053) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Feature
- [Feature] Split cross-entropy computation in SP (#5959) by Wenxuan Tan
Full Changelog: v0.4.3...v0.4.4
Version v0.4.3 Release Today!
What's Changed
Release
- [release] update version (#6041) by Hongxin Liu
Fp8
- [fp8] disable all_to_all_fp8 in intranode (#6045) by Hanks
- [fp8] fix linear hook (#6046) by Hongxin Liu
- [fp8] optimize all-gather (#6043) by Hongxin Liu
- [FP8] unsqueeze scale to make it compatible with torch.compile (#6040) by Guangyao Zhang
- Merge pull request #6012 from hpcaitech/feature/fp8_comm by Hongxin Liu
- Merge pull request #6033 from wangbluo/fix by Wang Binluo
- Merge pull request #6024 from wangbluo/fix_merge by Wang Binluo
- Merge pull request #6023 from wangbluo/fp8_merge by Wang Binluo
- [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) by Wang Binluo
- [fp8] zero support fp8 linear. (#6006) by flybird11111
- [fp8] add use_fp8 option for MoeHybridParallelPlugin (#6009) by Wang Binluo
- [fp8]update reduce-scatter test (#6002) by flybird11111
- [fp8] linear perf enhancement by botbw
- [fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) by botbw
- [fp8] support asynchronous FP8 communication (#5997) by flybird11111
- [fp8] refactor fp8 linear with compile (#5993) by Hongxin Liu
- [fp8] support hybrid parallel plugin (#5982) by Wang Binluo
- [fp8]Moe support fp8 communication (#5977) by flybird11111
- [fp8] use torch compile (torch >= 2.3.0) (#5979) by botbw
- [fp8] support gemini plugin (#5978) by Hongxin Liu
- [fp8] support fp8 amp for hybrid parallel plugin (#5975) by Hongxin Liu
- [fp8] add fp8 linear (#5967) by Hongxin Liu
- [fp8]support all2all fp8 (#5953) by flybird11111
- [FP8] rebase main (#5963) by flybird11111
- Merge pull request #5961 from ver217/feature/zeor-fp8 by Hanks
- [fp8] add fp8 comm for low level zero by ver217
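The FP8 work above is surfaced through plugin flags rather than new user-facing APIs. A hedged sketch of enabling it is below; `use_fp8` and `fp8_communication` are the flag names suggested by the PR titles (#6009, #6016), but double-check them against the FP8 document released in v0.4.4 (#6050).

```python
# Sketch: FP8 linear layers and FP8 collective communication with the hybrid
# parallel plugin. Flag names (`use_fp8`, `fp8_communication`) are taken from the
# PR titles above and should be verified against your installed version.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    precision="bf16",
    use_fp8=True,            # FP8 GEMMs for eligible linear layers (torch.compile path needs a recent torch)
    fp8_communication=True,  # FP8 all-gather / reduce-scatter / all-to-all
)
booster = Booster(plugin=plugin)
# model, optimizer, *_ = booster.boost(model, optimizer)
```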
Hotfix
- [Hotfix] Remove deprecated install (#6042) by Tong Li
- [Hotfix] Fix llama fwd replacement bug (#6031) by Wenxuan Tan
- [Hotfix] Avoid fused RMSnorm import error without apex (#5985) by Edenzzzz
- [Hotfix] README link (#5966) by Tong Li
- [hotfix] Remove unused plan section (#5957) by Tong Li
Colossalai/checkpoint_io/...
- [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan
Plugin
- [plugin] hotfix zero plugin (#6036) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) (#6022) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) by Hongxin Liu
Ci
- [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5995) by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Misc
- [misc] Use dist logger in plugins (#6011) by Edenzzzz
- [misc] update compatibility (#6008) by Hongxin Liu
- [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
- [misc] remove useless condition by haze188
- [misc] fix ci failure: change default value to false in moe plugin by haze188
- [misc] remove incompatible test config by haze188
- [misc] remove debug/print code by haze188
- [misc] skip redundant test by haze188
- [misc] solve booster hang by rename the variable by haze188
Feature
- [Feature] Zigzag Ring attention (#5905) by Edenzzzz
- [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) by Hanks
- [Feature] llama shardformer fp8 support (#5938) by Guangyao Zhang
- [Feature] MoE Ulysses Support (#5918) by Haze188
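Zigzag ring attention (#5905) is enabled through the sequence-parallelism knobs of the hybrid parallel plugin. The sketch below shows the shape of such a configuration; the `"ring_attn"` mode string and the `sp_size` argument are assumptions based on the feature name and the sequence-parallelism docs, so confirm them locally.

```python
# Sketch: zigzag ring-attention sequence parallelism (#5905). The mode string
# "ring_attn" and the sp_size/enable_sequence_parallelism arguments are assumed
# from the feature name and SP documentation; verify before use.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=1,
    sp_size=4,                              # sequence-parallel group size
    enable_sequence_parallelism=True,
    sequence_parallelism_mode="ring_attn",  # zigzag ring attention
    enable_flash_attention=True,
    precision="bf16",
)
booster = Booster(plugin=plugin)
```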
Chat
- [Chat] fix readme (#5989) by YeAnbang
- Merge pull request #5962 from hpcaitech/colossalchat by YeAnbang
- [Chat] Fix lora (#5946) by YeAnbang
Test ci
- [test ci]Feature/fp8 comm (#5981) by flybird11111
Docs
- [Docs] clarify launch port by Edenzzzz
Test
- [test] add zero fp8 test case by ver217
- [test] add check by hxwang
- [test] fix test: test_zero1_2 by hxwang
- [test] add mixtral modelling test by botbw
- [test] pass mixtral shardformer test by botbw
- [test] mixtral pp shard test by hxwang
- [test] add mixtral transformer test by hxwang
- [test] add mixtral for sequence classification by hxwang
Lora
- [lora] lora support hybrid parallel plugin (#5956) by Wang Binluo
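LoRA support under the hybrid parallel plugin (#5956) follows the usual Booster flow of injecting adapters before boosting. The sketch below assumes `Booster.enable_lora` accepts a peft `LoraConfig` and is called before `boost`; treat both as assumptions and cross-check the LoRA example in the repository.

```python
# Sketch: LoRA fine-tuning with the hybrid parallel plugin (#5956). Assumptions:
# `booster.enable_lora(model, lora_config=...)` exists and is called before `boost`.
import torch
import colossalai
from peft import LoraConfig
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj_in = torch.nn.Linear(128, 128)
        self.proj_out = torch.nn.Linear(128, 128)

    def forward(self, x):
        return self.proj_out(torch.relu(self.proj_in(x)))


colossalai.launch_from_torch()
model = TinyModel().cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

booster = Booster(plugin=HybridParallelPlugin(tp_size=1, pp_size=1, zero_stage=1))
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["proj_in", "proj_out"])
model = booster.enable_lora(model, lora_config=lora_config)  # inject adapters first
model, optimizer, *_ = booster.boost(model, optimizer)       # then boost as usual
```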
Chore
- [chore] remove redundant test case, print string & reduce test tokens by botbw
- [chore] docstring by hxwang
- [chore] change moe_pg_mesh to private by hxwang
- [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
- [chore] minor fix after rebase by hxwang
- [chore] minor fix by hxwang
- [chore] arg pass & remove drop token by hxwang
- [chore] trivial fix by botbw
- [chore] manually revert unintended commit by botbw
- [chore] handle non member group by hxwang
Moe
- [moe] solve dp axis issue by botbw
- [moe] remove force_overlap_comm flag and add warning instead by hxwang
- Revert "[moe] implement submesh initialization" by hxwang
- [moe] refactor mesh assignment by hxwang
- [moe] deepseek moe sp support by haze188
- [moe] remove ops by hxwang
- [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
- [moe] finalize test (no pp) by hxwang
- [moe] init moe plugin comm setting with sp by hxwang
- [moe] clean legacy code by hxwang
- [moe] test deepseek by hxwang
- [moe] implement tp by botbw
- [moe] add mixtral dp grad scaling when not all experts are activated by botbw...
Version v0.4.2 Release Today!
What's Changed
Release
- [release] update version (#5952) by Hongxin Liu
Zero
- [zero] hotfix update master params (#5951) by Hongxin Liu
Shardformer
- [shardformer] hotfix attn mask (#5947) by Hongxin Liu
- [shardformer] hotfix attn mask (#5945) by Hongxin Liu
Feature
- [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua
Fix bug
- [FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
- [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua
Plugin
- [plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu
Full Changelog: v0.4.1...v0.4.2
Version v0.4.1 Release Today!
What's Changed
Release
- [release] update version (#5912) by Hongxin Liu
Misc
- [misc] support torch2.3 (#5893) by Hongxin Liu
Compatibility
- [compatibility] support torch 2.2 (#5875) by Guangyao Zhang
Chat
- Merge pull request #5901 from hpcaitech/colossalchat by YeAnbang
- Merge pull request #5850 from hpcaitech/rlhf_SimPO by YeAnbang
Shardformer
- [ShardFormer] fix qwen2 sp (#5903) by Guangyao Zhang
- [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) by Guangyao Zhang
- [shardformer] DeepseekMoE support (#5871) by Haze188
- [shardformer] fix the moe (#5883) by Wang Binluo
- [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) by Jianghai
- [shardformer]delete xformers (#5859) by flybird11111
Auto parallel
- [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö
Zero
- [zero] support all-gather overlap (#5898) by Hongxin Liu
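All-gather overlap for ZeRO (#5898, with the hybrid-parallel counterpart #5919 in v0.4.2) is exposed as a plugin flag. A minimal sketch follows; the `overlap_allgather` name is inferred from these PRs, so verify it in the plugin signature.

```python
# Sketch: overlapping ZeRO parameter all-gather with compute (#5898).
# Assumption: the flag is named `overlap_allgather` on LowLevelZeroPlugin.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()

plugin = LowLevelZeroPlugin(
    stage=2,
    precision="bf16",
    overlap_allgather=True,  # overlap parameter all-gather with other work (flag name assumed)
)
booster = Booster(plugin=plugin)
```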
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5878) by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5572) by pre-commit-ci[bot]
Hotfix
- [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
- [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
- [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188
- [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz
Quant
- [quant] fix bitsandbytes version check (#5882) by Hongxin Liu
Doc
- [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz
Full Changelog: v0.4.0...v0.4.1
Version v0.4.0 Release Today!
What's Changed
Release
- [release] update version (#5864) by Hongxin Liu
Shardformer
- [shardformer] Support the T5ForTokenClassification model (#5816) by Guangyao Zhang
Zero
- [zero] use bucket during allgather (#5860) by Hongxin Liu
Doc
- [doc] add GPU cloud playground (#5851) by binmakeswell
- [doc] fix open sora model weight link (#5848) by binmakeswell
- [doc] opensora v1.2 news (#5846) by binmakeswell
Full Changelog: v0.3.9...v0.4.0
Version v0.3.9 Release Today!
What's Changed
Release
- [release] update version (#5833) by Hongxin Liu
Fix
- [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao
Shardformer
- [shardformer] Change atol in test command-r weight-check to pass pytest (#5835) by Guangyao Zhang
- Merge pull request #5818 from GuangyaoZhang/command-r by Guangyao Zhang
- [shardformer] upgrade transformers to 4.39.3 (#5815) by flybird11111
- [shardformer] fix modeling of bloom and falcon (#5796) by Hongxin Liu
- [shardformer] fix import (#5788) by Hongxin Liu
Devops
- [devops] Remove building on PR when edited to avoid skip issue (#5836) by Guangyao Zhang
- [devops] fix docker ci (#5780) by Hongxin Liu
Misc
- [misc] Add dist optim to doc sidebar (#5806) by Edenzzzz
- [misc] update requirements (#5787) by Hongxin Liu
- [misc] fix dist logger (#5782) by Hongxin Liu
- [misc] Accelerate CI for zero and dist optim (#5758) by Edenzzzz
- [misc] update dockerfile (#5776) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Gemini
- [gemini] quick fix on possible async operation (#5803) by botbw
- [Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
- [gemini] optimize reduce scatter d2h copy (#5760) by botbw
Inference
- [Inference] Fix flash-attn import and add model test (#5794) by Li Xingjian
- [Inference]refactor baichuan (#5791) by Runyu Lu
- Merge pull request #5771 from char-1ee/refactor/modeling by Li Xingjian
- [Inference]Add Streaming LLM (#5745) by yuehuayingxueluo
Test
- [test] fix qwen2 pytest distLarge (#5797) by Guangyao Zhang
- [test] fix chatglm test kit (#5793) by Hongxin Liu
- [test] Fix/fix testcase (#5770) by duanjunwen
Install
- [install]fix setup (#5786) by flybird11111
Hotfix
- [hotfix] fix testcase in test_fx/test_tracer (#5779) by duanjunwen
- [hotfix] fix llama flash attention forward (#5777) by flybird11111
- [Hotfix] Add missing init file in inference.executor (#5774) by Yuanheng Zhao
Full Changelog: v0.3.8...v0.3.9