Releases: hpcaitech/ColossalAI
Version v0.4.8 Release Today!
What's Changed
Release
- [release] update version (#6195) by Hongxin Liu
Doc
- [doc] DeepSeek V3/R1 news (#6199) by binmakeswell
Application
- [application] add lora sft example data (#6198) by Hongxin Liu
- [application] Update README (#6196) by Tong Li
- [application] add lora sft example (#6192) by Hongxin Liu
Checkpointio
- [checkpointio] fix for async io (#6189) by flybird11111
- [checkpointio] fix checkpoint for 3d (#6187) by flybird11111
- [checkpointio] gather tensor before unpad it if the tensor is both padded and distributed (#6168) by Lemon Qin
- [checkpointio] support load-pin overlap (#6177) by Hongxin Liu
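The async I/O and load-pin changes above are exercised through the regular Booster checkpoint calls. Below is a minimal sketch of saving and reloading a model with the asynchronous path enabled; the `use_async` keyword is an assumption inferred from the PR titles (#6131, #6189), so verify the exact signature against your installed version.

```python
# Minimal sketch: asynchronous checkpoint save/load via the Booster API.
# Assumption: `use_async=True` is the flag added by the async checkpoint I/O PRs;
# check colossalai.booster.Booster.save_model in your version before relying on it.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()  # run under `colossalai run --nproc_per_node N this_script.py`

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

booster = Booster(plugin=LowLevelZeroPlugin(stage=1, precision="bf16"))
model, optimizer, *_ = booster.boost(model, optimizer)

# With async I/O the save may return before the file is fully flushed to disk;
# the checkpoint is written out in the background.
booster.save_model(model, "model.safetensors", use_async=True)
booster.save_optimizer(optimizer, "optim_ckpt", shard=True)

# Later (or in a new run): load the checkpoint back; pinned-memory loading and
# load-pin overlap (#6177) are handled inside the plugin's checkpoint I/O.
booster.load_model(model, "model.safetensors")
```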
Hotfix
- [hotfix] fix zero optim save (#6191) by Hongxin Liu
- [hotfix] fix hybrid checkpointio for sp+dp (#6184) by flybird11111
Shardformer
- [shardformer] support pipeline for deepseek v3 and optimize lora save (#6188) by Hongxin Liu
- [shardformer] support ep for deepseek v3 (#6185) by Hongxin Liu
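Expert parallelism and pipeline parallelism for DeepSeek V3 are driven through the MoE hybrid parallel plugin. A rough sketch of such a configuration is below; the sizes are illustrative and assume a matching number of GPUs, so treat it as a starting point rather than the exact recipe used in the PRs.

```python
# Sketch: expert parallelism (#6185) plus pipeline parallelism (#6188) for a MoE
# model such as DeepSeek V3. Sizes are illustrative; the launched world size must
# be compatible with tp_size * pp_size and the chosen ep_size.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import MoeHybridParallelPlugin

colossalai.launch_from_torch()

plugin = MoeHybridParallelPlugin(
    tp_size=1,        # no tensor parallelism in this sketch
    pp_size=2,        # pipeline parallelism across 2 stages
    ep_size=4,        # each expert group is sharded over 4 ranks
    zero_stage=1,
    precision="bf16",
)
booster = Booster(plugin=plugin)
# model, optimizer, *_ = booster.boost(model, optimizer)  # e.g. a DeepSeek V3 causal LM
```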
Ci
- [CI] Cleanup Dist Optim tests with shared helper funcs (#6125) by Wenxuan Tan
Issue template
- [Issue template] Add checkbox asking for details to reproduce error (#6104) by Wenxuan Tan
Inference
- [Inference]Fix example in readme (#6178) by Guangyao Zhang
Full Changelog: v0.4.7...v0.4.8
Version v0.4.7 Release Today!
What's Changed
Release
- [release] update version (#6174) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6113) by pre-commit-ci[bot]
Shardformer
- [Shardformer] Support zbv in Shardformer Policy (#6150) by duanjunwen
Checkpointio
- [checkpointio] support non blocking pin load (#6172) by Hongxin Liu
- [checkpointio]support asyncio for 3d (#6152) by flybird11111
- [checkpointio] fix async io (#6155) by flybird11111
- [checkpointio] support debug log (#6153) by Hongxin Liu
- [checkpointio] fix zero optimizer async save memory (#6151) by Hongxin Liu
- Merge pull request #6149 from ver217/hotfix/ckpt by Wang Binluo
- [checkpointio] disable buffering by ver217
- [checkpointio] fix pinned state dict by ver217
- [checkpointio] fix size compute by ver217
- [checkpointio] fix performance issue (#6139) by Hongxin Liu
- [checkpointio] support async model save (#6131) by Hongxin Liu
News
- [news] release colossalai for sora (#6166) by binmakeswell
Hotfix
- [hotfix] improve compatibility (#6165) by Hongxin Liu
- [Hotfix] hotfix normalization (#6163) by duanjunwen
- [hotfix] fix zero comm buffer init (#6154) by Hongxin Liu
- [hotfix] fix flash attn window_size err (#6132) by duanjunwen
Doc
- [doc] add bonus event (#6164) by binmakeswell
- [doc] update cloud link (#6148) by Sze-qq
- [doc] add hpc cloud intro (#6147) by Sze-qq
Device
- [Device]Support npu (#6159) by flybird11111
Fix
- [fix] fix bug caused by perf version (#6156) by duanjunwen
- [fix] multi-node backward slowdown (#6134) by Hanks
Optim
- [optim] hotfix adam load (#6146) by Hongxin Liu
Zerobubble
- [Zerobubble] merge main. (#6142) by duanjunwen
Async io
- [async io] support async io (#6137) by flybird11111
Ckpt
- [ckpt] Add async ckpt api (#6136) by Wang Binluo
Cli
- [cli] support run as module option (#6135) by Hongxin Liu
Zero
- [zero] support extra dp (#6123) by Hongxin Liu
Plugin
- [plugin] support get_grad_norm (#6115) by Hongxin Liu
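The `get_grad_norm` addition is meant to expose the gradient norm computed during clipping. A sketch of how it might be queried in a training step is below; the method name and its placement on the boosted optimizer are assumptions taken from the PR title (#6115), so confirm against the plugin docs.

```python
# Sketch: reading the gradient norm after backward/step. Assumption: #6115 adds a
# `get_grad_norm()` method to the optimizer wrapper returned by `booster.boost`,
# populated when gradient clipping (max_norm) is enabled.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()
model = torch.nn.Linear(64, 64).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
booster = Booster(plugin=LowLevelZeroPlugin(stage=2, precision="bf16", max_norm=1.0))
model, optimizer, *_ = booster.boost(model, optimizer)

loss = model(torch.randn(8, 64, device="cuda")).mean()
booster.backward(loss, optimizer)
optimizer.step()
print("grad norm:", optimizer.get_grad_norm())  # assumed API from #6115
```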
Full Changelog: v0.4.6...v0.4.7
Version v0.4.6 Release Today!
What's Changed
Release
- [release] update version (#6109) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6078) by pre-commit-ci[bot]
Checkpointio
- [checkpointio] fix hybrid plugin model save (#6106) by Hongxin Liu
Doc
- [doc] sora solution news (#6100) by binmakeswell
Extension
- [extension] hotfix compile check (#6099) by Hongxin Liu
Full Changelog: v0.4.5...v0.4.6
Version v0.4.5 Release Today!
What's Changed
Release
- [release] update version (#6094) by Hongxin Liu
Misc
- [misc] fit torch API upgrade and remove legacy import (#6093) by Hongxin Liu
Fp8
- [fp8] add fallback and make compile option configurable (#6092) by Hongxin Liu
Chore
- [chore] refactor by botbw
Ckpt
- [ckpt] add safetensors util by botbw
Pipeline
- [pipeline] hotfix backward for multiple outputs (#6090) by Hongxin Liu
Ring attention
- [Ring Attention] Improve comments (#6085) by Wenxuan Tan
- Merge pull request #6071 from wangbluo/ring_attention by Wang Binluo
Shardformer
- [shardformer] optimize seq parallelism (#6086) by Hongxin Liu
- [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) by Hongxin Liu
Full Changelog: v0.4.4...v0.4.5
Version v0.4.4 Release Today!
What's Changed
Release
- [release] update version (#6062) by Hongxin Liu
Colossaleval
- [ColossalEval] support for vllm (#6056) by Camille Zhong
Sp
- Merge pull request #6064 from wangbluo/fix_attn by Wang Binluo
- Merge pull request #6061 from wangbluo/sp_fix by Wang Binluo
Doc
- [doc] FP8 training and communication document (#6050) by Guangyao Zhang
- [doc] update sp doc (#6055) by flybird11111
Fp8
- [fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059) by Guangyao Zhang
- [fp8] fix missing fp8_comm flag in mixtral (#6057) by botbw
- [fp8] hotfix backward hook (#6053) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Feature
- [Feature] Split cross-entropy computation in SP (#5959) by Wenxuan Tan
Full Changelog: v0.4.3...v0.4.4
Version v0.4.3 Release Today!
What's Changed
Release
- [release] update version (#6041) by Hongxin Liu
Fp8
- [fp8] disable all_to_all_fp8 in intranode (#6045) by Hanks
- [fp8] fix linear hook (#6046) by Hongxin Liu
- [fp8] optimize all-gather (#6043) by Hongxin Liu
- [FP8] unsqueeze scale to make it compatible with torch.compile (#6040) by Guangyao Zhang
- Merge pull request #6012 from hpcaitech/feature/fp8_comm by Hongxin Liu
- Merge pull request #6033 from wangbluo/fix by Wang Binluo
- Merge pull request #6024 from wangbluo/fix_merge by Wang Binluo
- Merge pull request #6023 from wangbluo/fp8_merge by Wang Binluo
- [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) by Wang Binluo
- [fp8] zero support fp8 linear. (#6006) by flybird11111
- [fp8] add use_fp8 option for MoeHybridParallelPlugin (#6009) by Wang Binluo
- [fp8]update reduce-scatter test (#6002) by flybird11111
- [fp8] linear perf enhancement by botbw
- [fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) by botbw
- [fp8] support asynchronous FP8 communication (#5997) by flybird11111
- [fp8] refactor fp8 linear with compile (#5993) by Hongxin Liu
- [fp8] support hybrid parallel plugin (#5982) by Wang Binluo
- [fp8]Moe support fp8 communication (#5977) by flybird11111
- [fp8] use torch compile (torch >= 2.3.0) (#5979) by botbw
- [fp8] support gemini plugin (#5978) by Hongxin Liu
- [fp8] support fp8 amp for hybrid parallel plugin (#5975) by Hongxin Liu
- [fp8] add fp8 linear (#5967) by Hongxin Liu
- [fp8]support all2all fp8 (#5953) by flybird11111
- [FP8] rebase main (#5963) by flybird11111
- Merge pull request #5961 from ver217/feature/zeor-fp8 by Hanks
- [fp8] add fp8 comm for low level zero by ver217
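The FP8 work above is surfaced through plugin flags rather than new user-facing APIs. A hedged sketch of enabling it is below; `use_fp8` and `fp8_communication` are the flag names suggested by the PR titles (#6009, #6016), but double-check them against the FP8 document released in v0.4.4 (#6050).

```python
# Sketch: FP8 linear layers and FP8 collective communication with the hybrid
# parallel plugin. Flag names (`use_fp8`, `fp8_communication`) are taken from the
# PR titles above and should be verified against your installed version.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    precision="bf16",
    use_fp8=True,            # FP8 GEMMs for eligible linear layers (torch.compile path needs a recent torch)
    fp8_communication=True,  # FP8 all-gather / reduce-scatter / all-to-all
)
booster = Booster(plugin=plugin)
# model, optimizer, *_ = booster.boost(model, optimizer)
```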
Hotfix
- [Hotfix] Remove deprecated install (#6042) by Tong Li
- [Hotfix] Fix llama fwd replacement bug (#6031) by Wenxuan Tan
- [Hotfix] Avoid fused RMSnorm import error without apex (#5985) by Edenzzzz
- [Hotfix] README link (#5966) by Tong Li
- [hotfix] Remove unused plan section (#5957) by Tong Li
Colossalai/checkpoint_io/...
- [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan
Plugin
- [plugin] hotfix zero plugin (#6036) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) (#6022) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) by Hongxin Liu
Ci
- [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5995) by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Misc
- [misc] Use dist logger in plugins (#6011) by Edenzzzz
- [misc] update compatibility (#6008) by Hongxin Liu
- [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
- [misc] remove useless condition by haze188
- [misc] fix ci failure: change default value to false in moe plugin by haze188
- [misc] remove incompatible test config by haze188
- [misc] remove debug/print code by haze188
- [misc] skip redundant test by haze188
- [misc] solve booster hang by rename the variable by haze188
Feature
- [Feature] Zigzag Ring attention (#5905) by Edenzzzz
- [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) by Hanks
- [Feature] llama shardformer fp8 support (#5938) by Guangyao Zhang
- [Feature] MoE Ulysses Support (#5918) by Haze188
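Zigzag ring attention (#5905) is enabled through the sequence-parallelism knobs of the hybrid parallel plugin. The sketch below shows the shape of such a configuration; the `"ring_attn"` mode string and the `sp_size` argument are assumptions based on the feature name and the sequence-parallelism docs, so confirm them locally.

```python
# Sketch: zigzag ring-attention sequence parallelism (#5905). The mode string
# "ring_attn" and the sp_size/enable_sequence_parallelism arguments are assumed
# from the feature name and SP documentation; verify before use.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=1,
    sp_size=4,                              # sequence-parallel group size
    enable_sequence_parallelism=True,
    sequence_parallelism_mode="ring_attn",  # zigzag ring attention
    enable_flash_attention=True,
    precision="bf16",
)
booster = Booster(plugin=plugin)
```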
Chat
- [Chat] fix readme (#5989) by YeAnbang
- Merge pull request #5962 from hpcaitech/colossalchat by YeAnbang
- [Chat] Fix lora (#5946) by YeAnbang
Test ci
- [test ci]Feature/fp8 comm (#5981) by flybird11111
Docs
- [Docs] clarify launch port by Edenzzzz
Test
- [test] add zero fp8 test case by ver217
- [test] add check by hxwang
- [test] fix test: test_zero1_2 by hxwang
- [test] add mixtral modelling test by botbw
- [test] pass mixtral shardformer test by botbw
- [test] mixtral pp shard test by hxwang
- [test] add mixtral transformer test by hxwang
- [test] add mixtral for sequence classification by hxwang
Lora
- [lora] lora support hybrid parallel plugin (#5956) by Wang Binluo
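LoRA support under the hybrid parallel plugin (#5956) follows the usual Booster flow of injecting adapters before boosting. The sketch below assumes `Booster.enable_lora` accepts a peft `LoraConfig` and is called before `boost`; treat both as assumptions and cross-check the LoRA example in the repository.

```python
# Sketch: LoRA fine-tuning with the hybrid parallel plugin (#5956). Assumptions:
# `booster.enable_lora(model, lora_config=...)` exists and is called before `boost`.
import torch
import colossalai
from peft import LoraConfig
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj_in = torch.nn.Linear(128, 128)
        self.proj_out = torch.nn.Linear(128, 128)

    def forward(self, x):
        return self.proj_out(torch.relu(self.proj_in(x)))


colossalai.launch_from_torch()
model = TinyModel().cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

booster = Booster(plugin=HybridParallelPlugin(tp_size=1, pp_size=1, zero_stage=1))
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["proj_in", "proj_out"])
model = booster.enable_lora(model, lora_config=lora_config)  # inject adapters first
model, optimizer, *_ = booster.boost(model, optimizer)       # then boost as usual
```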
Chore
- [chore] remove redundant test case, print string & reduce test tokens by botbw
- [chore] docstring by hxwang
- [chore] change moe_pg_mesh to private by hxwang
- [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
- [chore] minor fix after rebase by hxwang
- [chore] minor fix by hxwang
- [chore] arg pass & remove drop token by hxwang
- [chore] trivial fix by botbw
- [chore] manually revert unintended commit by botbw
- [chore] handle non member group by hxwang
Moe
- [moe] solve dp axis issue by botbw
- [moe] remove force_overlap_comm flag and add warning instead by hxwang
- Revert "[moe] implement submesh initialization" by hxwang
- [moe] refactor mesh assignment by hxwang
- [moe] deepseek moe sp support by haze188
- [moe] remove ops by hxwang
- [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
- [moe] finalize test (no pp) by hxwang
- [moe] init moe plugin comm setting with sp by hxwang
- [moe] clean legacy code by hxwang
- [moe] test deepseek by hxwang
- [moe] implement tp by botbw
- [moe] add mixtral dp grad scaling when not all experts are activated by botbw...
Version v0.4.2 Release Today!
What's Changed
Release
- [release] update version (#5952) by Hongxin Liu
Zero
- [zero] hotfix update master params (#5951) by Hongxin Liu
Shardformer
- [shardformer] hotfix attn mask (#5947) by Hongxin Liu
- [shardformer] hotfix attn mask (#5945) by Hongxin Liu
Feature
- [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua
Fix bug
- [FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
- [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua
Plugin
- [plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu
Full Changelog: v0.4.1...v0.4.2
Version v0.4.1 Release Today!
What's Changed
Release
- [release] update version (#5912) by Hongxin Liu
Misc
- [misc] support torch2.3 (#5893) by Hongxin Liu
Compatibility
- [compatibility] support torch 2.2 (#5875) by Guangyao Zhang
Chat
- Merge pull request #5901 from hpcaitech/colossalchat by YeAnbang
- Merge pull request #5850 from hpcaitech/rlhf_SimPO by YeAnbang
Shardformer
- [ShardFormer] fix qwen2 sp (#5903) by Guangyao Zhang
- [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) by Guangyao Zhang
- [shardformer] DeepseekMoE support (#5871) by Haze188
- [shardformer] fix the moe (#5883) by Wang Binluo
- [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) by Jianghai
- [shardformer]delete xformers (#5859) by flybird11111
Auto parallel
- [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö
Zero
- [zero] support all-gather overlap (#5898) by Hongxin Liu
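All-gather overlap for ZeRO (#5898, with the hybrid-parallel counterpart #5919 in v0.4.2) is exposed as a plugin flag. A minimal sketch follows; the `overlap_allgather` name is inferred from these PRs, so verify it in the plugin signature.

```python
# Sketch: overlapping ZeRO parameter all-gather with compute (#5898).
# Assumption: the flag is named `overlap_allgather` on LowLevelZeroPlugin.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()

plugin = LowLevelZeroPlugin(
    stage=2,
    precision="bf16",
    overlap_allgather=True,  # overlap parameter all-gather with other work (flag name assumed)
)
booster = Booster(plugin=plugin)
```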
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5878) by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5572) by pre-commit-ci[bot]
Hotfix
- [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
- [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
- [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188
- [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz
Quant
- [quant] fix bitsandbytes version check (#5882) by Hongxin Liu
Doc
- [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz
Full Changelog: v0.4.0...v0.4.1
Version v0.4.0 Release Today!
What's Changed
Release
- [release] update version (#5864) by Hongxin Liu
Shardformer
- [shardformer] Support the T5ForTokenClassification model (#5816) by Guangyao Zhang
Zero
- [zero] use bucket during allgather (#5860) by Hongxin Liu
Doc
- [doc] add GPU cloud playground (#5851) by binmakeswell
- [doc] fix open sora model weight link (#5848) by binmakeswell
- [doc] opensora v1.2 news (#5846) by binmakeswell
Full Changelog: v0.3.9...v0.4.0
Version v0.3.9 Release Today!
What's Changed
Release
- [release] update version (#5833) by Hongxin Liu
Fix
- [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao
Shardformer
- [shardformer] Change atol in test command-r weight-check to pass pytest (#5835) by Guangyao Zhang
- Merge pull request #5818 from GuangyaoZhang/command-r by Guangyao Zhang
- [shardformer] upgrade transformers to 4.39.3 (#5815) by flybird11111
- [shardformer] fix modeling of bloom and falcon (#5796) by Hongxin Liu
- [shardformer] fix import (#5788) by Hongxin Liu
Devops
- [devops] Remove building on PR when edited to avoid skip issue (#5836) by Guangyao Zhang
- [devops] fix docker ci (#5780) by Hongxin Liu
Misc
- [misc] Add dist optim to doc sidebar (#5806) by Edenzzzz
- [misc] update requirements (#5787) by Hongxin Liu
- [misc] fix dist logger (#5782) by Hongxin Liu
- [misc] Accelerate CI for zero and dist optim (#5758) by Edenzzzz
- [misc] update dockerfile (#5776) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Gemini
- [gemini] quick fix on possible async operation (#5803) by botbw
- [Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
- [gemini] optimize reduce scatter d2h copy (#5760) by botbw
Inference
- [Inference] Fix flash-attn import and add model test (#5794) by Li Xingjian
- [Inference]refactor baichuan (#5791) by Runyu Lu
- Merge pull request #5771 from char-1ee/refactor/modeling by Li Xingjian
- [Inference]Add Streaming LLM (#5745) by yuehuayingxueluo
Test
- [test] fix qwen2 pytest distLarge (#5797) by Guangyao Zhang
- [test] fix chatglm test kit (#5793) by Hongxin Liu
- [test] Fix/fix testcase (#5770) by duanjunwen
Install
- [install]fix setup (#5786) by flybird11111
Hotfix
- [hotfix] fix testcase in test_fx/test_tracer (#5779) by duanjunwen
- [hotfix] fix llama flash attention forward (#5777) by flybird11111
- [Hotfix] Add missing init file in inference.executor (#5774) by Yuanheng Zhao
Full Changelog: v0.3.8...v0.3.9