v0.1.0b7

@mfuntowicz mfuntowicz released this 24 May 11:24
· 31 commits to main since this release
d19ce46

Highlights

  • Mixtral models are now supported (requires a multi-GPU setup)
  • Tensor parallelism and pipeline parallelism are now supported in from_pretrained and pipeline through the tp=<int> and pp=<int> arguments
  • Models from transformers are now loaded in their checkpoint's data type rather than float32, which avoids most of the memory errors that occurred in 0.1.0b6
  • Intermediate TensorRT-LLM checkpoints and engines are now saved in two separate folders (checkpoints/ and engines/) to avoid issues when building multiple checkpoints that share the same config.json (for example, with different TP / PP setups)
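
As a rough illustration of the tp / pp arguments mentioned above, the sketch below validates a parallel layout before forwarding it to from_pretrained. The helper names parallel_kwargs and load_model are hypothetical and not part of the library, and the check that tp × pp must not exceed the number of visible GPUs is an assumption based on how tensor and pipeline parallelism are usually combined, not something stated in these notes.

```python
from typing import Optional


def parallel_kwargs(tp: int = 1, pp: int = 1,
                    available_gpus: Optional[int] = None) -> dict:
    """Validate a TP/PP layout and return the kwargs to forward (hypothetical helper)."""
    if tp < 1 or pp < 1:
        raise ValueError("tp and pp must be positive integers")
    # Assumption: each (tensor shard, pipeline stage) pair occupies one GPU.
    world_size = tp * pp
    if available_gpus is not None and world_size > available_gpus:
        raise ValueError(
            f"tp={tp} x pp={pp} needs {world_size} GPUs, "
            f"but only {available_gpus} are available"
        )
    return {"tp": tp, "pp": pp}


def load_model(model_id: str, tp: int = 1, pp: int = 1):
    """Load a model with the requested layout (hypothetical wrapper)."""
    # Imported lazily so parallel_kwargs can be used without a GPU stack installed.
    import torch
    from optimum.nvidia import AutoModelForCausalLM

    kwargs = parallel_kwargs(tp, pp, available_gpus=torch.cuda.device_count())
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)


# Example (not executed here): shard a Mixtral checkpoint across 2 GPUs.
# model = load_model("mistralai/Mixtral-8x7B-v0.1", tp=2, pp=1)
```

Failing early on an impossible layout is cheaper than discovering it partway through an engine build; the actual requirements may differ, so treat the check as a convenience rather than a guarantee.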

Full Changelog: v0.1.0b6...v0.1.0b7