15 Oct 06:23

irenedea

18b0a6d

v0.13.0 Latest

Latest

🚀 LLM Foundry v0.13.0

🛠️ Bug Fixes & Cleanup

Pytorch 2.4 Checkpointing (#1569, #1581, #1583)

Resolved issues related to checkpointing for Curriculum Learning (CL) callbacks.

🔧 Dependency Updates

Bumped tiktoken from 0.4.0 to 0.8.0 (#1572)
Updated onnxruntime from 1.19.0 to 1.19.2 (#1590)

What's Changed

Update mcli yamls by @dakinggg in #1552
Use allenai/c4 instead of c4 dataset by @eitanturok in #1554
Tensor Parallelism by @eitanturok in #1521
Insufficient Permissions Error when trying to access table by @KuuCi in #1555
Add NoOp optimizer by @snarayan21 in #1560
Deterministic GCRP Errors by @KuuCi in #1559
Simplify CL API by @b-chu in #1510
Reapply #1389 by @dakinggg in #1561
Add dataset swap callback by @b-chu in #1536
Add error to catch more unknown example types by @milocress in #1562
Add FileExtensionNotFoundError by @b-chu in #1564
Add InvalidConversationError by @b-chu in #1565
Release docker img by @KuuCi in #1547
Revert FT dataloader changes from #1561, keep #1564 by @snarayan21 in #1566
Cleanup TP by @eitanturok in #1556
Changes for dataset swap callback by @gupta-abhay in #1569
Do not consider run_name when auto-detecting autoresume by @irenedea in #1571
Allow parameters with requires_grad=False in meta init by @sashaDoubov in #1567
Bump tiktoken from 0.4.0 to 0.8.0 by @dependabot in #1572
Add extensions to FinetuningFileNotFoundError by @b-chu in #1578
Handle long file names in convert text to mds by @irenedea in #1579
Set streaming log level by @mvpatel2000 in #1582
Fix pytorch checkpointing for CL callback by @b-chu in #1581
Fix pytorch checkpointing for CL callback by @b-chu in #1583
Error if filtered dataset contains 0 examples by @irenedea in #1585
Change cluster errors from NetworkError to UserError by @irenedea in #1586
Do not autoresume if a default name is set, only on user defined ones by @irenedea in #1588
Bump onnxruntime from 1.19.0 to 1.19.2 by @dependabot in #1590
Make FinetuningStreamingDataset parameters more flexible by @XiaohanZhangCMU in #1580
Add build callback tests by @irenedea in #1577
Bump version to 0.14.0.dev0 by @irenedea in #1587
Fix typo in eval code by using 'fsdp' instead of 'fsdp_config' by @irenedea in #1593

Full Changelog: v0.12.0...v0.13.0

Contributors

sashaDoubov, gupta-abhay, and 10 other contributors

Assets 2

26 Sep 03:52

dakinggg

v0.12.0

7897fb7

v0.12.0

🚀 LLM Foundry v0.12.0

New Features

PyTorch 2.4 (#1505)

This release updates LLM Foundry to the PyTorch 2.4 release, bringing with it support for the new features and optimizations in PyTorch 2.4

Extensibility improvements (#1450, #1449, #1468, #1467, #1478, #1493, #1495, #1511, #1512, #1527)

Numerous improvements to the extensibility of the modeling and data loading code, enabling easier reuse for subclassing and extending. Please see the linked PRs for more details on each change.

Improved error messages (#1457, #1459, #1519, #1518, #1522, #1534, #1548, #1551)

Various improved error messages, making debugging user errors more clear.

Sliding window in torch attention (#1455)

We've added support for sliding window attention to the reference attention implementation, allowing easier testing and comparison against more optimized attention variants.

Bug fixes

Extra BOS token for llama 3.1 with completion data (#1476)

A bug resulted in an extra BOS token being added between prompt and response during finetuning. This is fixed so that the prompt and response supplied by the user are concatenated without any extra tokens put between them.

What's Changed

Add test for logged_config transforms by @b-chu in #1441
Bump version to 0.12.0.dev0. by @irenedea in #1447
Update pytest-codeblocks requirement from <0.17,>=0.16.1 to >=0.16.1,<0.18 by @dependabot in #1445
Bump coverage[toml] from 7.4.4 to 7.6.1 by @dependabot in #1442
Enabled generalizing build_inner_model in ComposerHFCausalLM by @gupta-abhay in #1450
Update llm foundry version in mcli yamls by @irenedea in #1451
merge to main by @XiaohanZhangCMU in #865
allow embedding resizing passed through by @jdchang1 in #1449
Update packaging requirement from <23,>=21 to >=21,<25 by @dependabot in #1444
Update pytest requirement from <8,>=7.2.1 to >=7.2.1,<9 by @dependabot in #1443
Implement ruff rules enforcing PEP 585 by @snarayan21 in #1453
Adding sliding window attn to scaled_multihead_dot_product_attention by @ShashankMosaicML in #1455
Add user error for UnicodeDeocdeError in convert text to mds by @irenedea in #1457
Fix log_config by @josejg in #1432
Add EnvironmentLogger Callback by @josejg in #1350
Update mosaicml/ci-testing to 0.1.2 by @irenedea in #1458
Correct error message for inference wrapper by @josejg in #1459
Update CI tests to v0.1.2 by @KuuCi in #1466
Bump onnxruntime from 1.18.1 to 1.19.0 by @dependabot in #1461
Update tenacity requirement from <9,>=8.2.3 to >=8.2.3,<10 by @dependabot in #1460
Simple change to enable mapping functions for ft constructor by @gupta-abhay in #1468
use default eval interval from composer by @milocress in #1369
Consistent Naming EnviromentLoggingCallback by @josejg in #1470
Register NaN Monitor Callback by @josejg in #1471
Add train subset num batches by @mvpatel2000 in #1472
Parent class hf models by @jdchang1 in #1467
Remove extra bos for prompt/response data with llama3.1 by @dakinggg in #1476
Add prepare fsdp back by @dakinggg in #1477
Add date_string when applying tokenizer chat template by @snarayan21 in #1474
Make sample tokenization extensible by @gupta-abhay in #1478
Use Streaming version 0.8.1 by @snarayan21 in #1479
Bump hf-transfer from 0.1.3 to 0.1.8 by @dependabot in #1480
fix hf checkpointer by @milocress in #1489
Fix device mismatch when running hf.generate by @ShashankMosaicML in #1486
Bump composer to 0.24.1 + FSDP config device_mesh deprecation by @snarayan21 in #1487
master_weights_dtype not supported by ComposerHFCausalLM.init() by @eldarkurtic in #1485
Detect loss spikes and high losses during training by @joyce-chen-uni in #1473
Enable passing in external position ids by @gupta-abhay in #1493
Align logged attributes for errors and run metadata in kill_loss_spike_callback.py by @joyce-chen-uni in #1494
tokenizer is never built when converting finetuning dataset by @eldarkurtic in #1496
Removing error message for reusing kv cache with torch attn by @ShashankMosaicML in #1497
Fix formatting of loss spike & high loss error messages by @joyce-chen-uni in #1498
Enable cross attention layers by @gupta-abhay in #1495
Update to ci-testing 0.2.0 by @dakinggg in #1500
[WIP] Torch 2.4 in docker images by @snarayan21 in #1491
[WIP] Only torch 2.4.0 compatible by @snarayan21 in #1505
Update mlflow requirement from <2.16,>=2.14.1 to >=2.14.1,<2.17 by @dependabot in #1506
Update ci-testing to 0.2.2 by @dakinggg in #1503
Allow passing key_value_statest for x-attn through MPT Block by @gupta-abhay in #1511
Fix cross attention for blocks by @gupta-abhay in #1512
Put 2.3 image back in release examples by @dakinggg in #1513
Sort callbacks so that CheckpointSaver goes before HuggingFaceCheckpointer by @irenedea in #1515
Raise MisconfiguredDatasetError from original error by @irenedea in #1519
Peft fsdp by @dakinggg in #1520
Raise DatasetTooSmall exception if canonical nodes is less than num samples by @irenedea in #1518
Add permissions check for delta table reading by @irenedea in #1522
Add HuggingFaceCheckpointer option for only registering final checkpoint by @irenedea in #1516
Replace FSDP args by @KuuCi in #1517
enable correct padding_idx for embedding layers by @gupta-abhay in #1527
Revert "Replace FSDP args" by @KuuCi in #1533
Delete unneeded inner base model in PEFT HF Checkpointer by @snarayan21 in #1532
Add deprecation warning to fsdp_config by @KuuCi in #1530
Fix reuse kv cache for torch attention by @ShashankMosaicML in #1539
Error on text dataset file not found by @milocress in #1534
Make ICL tasks not required for eval by @snarayan21 in #1540
Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. by @ShashankMosaicML in #1374
Register mosaic logger by @dakinggg in #1542
Hfcheckpointer optional generation config by @KuuCi in #1543
Bump composer version to 0.25.0 by @dakinggg in #1546
Bump streaming version to 0.9.0 by @dakinggg in #1550
Bump version to 0.13.0.dev0 by @dakinggg in #1549
Add proper user error for accessing schema by @KuuCi in #1548
Validate Cluster Access Mode by @KuuCi in #1551

New Contributors

@jdchang1 made their first contribution in #1449
@joyce-chen-uni made their first contribution in #1473

Full Changelog: v0.11.0...v0.12.0

Contributors

gupta-abhay, eldarkurtic, and 13 other contributors

Assets 2

13 Aug 17:16

irenedea

v0.11.0

d40d016

v0.11.0

🚀 LLM Foundry v0.11.0

New Features

LLM Foundry CLI Commands (#1337, #1345, #1348, #1354)

We've added CLI commands for our commonly used scripts.

For example, instead of calling composer llm-foundry/scripts/train.py parameters.yaml, you can now do composer -c llm-foundry train parameters.yaml.

Docker Images Contain All Optional Dependencies (#1431)

LLM Foundry Docker images now have all optional dependencies.

Support for Llama3 Rope Scaling (#1391)

To use it, you can add the following to your parameters:

model:
    name: mpt_causal_lm
    attn_config:
      rope: true
      ...
      rope_impl: hf
      rope_theta: 500000
      rope_hf_config:
        type: llama3
        ...

Tokenizer Registry (#1386)

We now have a tokenizer registry so you can easily add custom tokenizers.

LoadPlanner and SavePlanner Registries (#1358)

We now have LoadPlanner and SavePlanner registries so you can easily add custom checkpoint loading and saving logic.

Faster Auto-packing (#1435)

The auto packing startup is now much faster. To use auto packing with finetuning datasets, you can add packing_ratio: auto to your config like so:

  train_loader:
    name: finetuning
    dataset:
      ...
      packing_ratio: auto

What's Changed

Extra serverless by @XiaohanZhangCMU in #1320
Fixing sequence_id =-1 bug, adding tests by @ShashankMosaicML in #1324
Registry docs update by @dakinggg in #1323
Add dependabot by @dakinggg in #1322
HUGGING_FACE_HUB_TOKEN -> HF_TOKEN by @dakinggg in #1321
Bump version by @b-chu in #1326
Relax hf hub pin by @dakinggg in #1314
Error if metadata matches existing keys by @dakinggg in #1313
Update transformers requirement from <4.41,>=4.40 to >=4.42.3,<4.43 by @dependabot in #1327
Bump einops from 0.7.0 to 0.8.0 by @dependabot in #1328
Bump onnxruntime from 1.15.1 to 1.18.1 by @dependabot in #1329
Bump onnx from 1.14.0 to 1.16.1 by @dependabot in #1331
Currently multi-gpu generate does not work with hf.generate for hf checkpoints. This PR fixes that. by @ShashankMosaicML in #1332
Fix registry for callbacks with configs by @mvpatel2000 in #1333
Adding a child class of hf's rotary embedding to make hf generate work on multiple gpus. by @ShashankMosaicML in #1334
Add a config arg to just save an hf checkpoint by @dakinggg in #1335
Deepcopy config in callbacks_with_config by @mvpatel2000 in #1336
Avoid HF race condition by @dakinggg in #1338
Nicer error message for undefined symbol by @dakinggg in #1339
Bump sentencepiece from 0.1.97 to 0.2.0 by @dependabot in #1342
Removing logging exception through update run metadata by @jjanezhang in #1292
[MCLOUD-4910] Escape UC names during data prep by @naren-loganathan in #1343
Add CLI for train.py by @KuuCi in #1337
Add fp32 to the set of valid inputs to attention layer by @j316chuck in #1347
Log all extraneous_keys in one go for ease of development by @josejg in #1344
Fix MLFlow Save Model for TE by @j316chuck in #1353
Add flag for saving only composer checkpoint by @irenedea in #1356
Expose flag for should_save_peft_only by @irenedea in #1357
Command utils + train by @KuuCi in #1361
Readd Clear Resolver by @KuuCi in #1365
Add Eval to Foundry CLI by @KuuCi in #1345
Enhanced Logging for convert_delta_to_json and convert_text_to_mds by @vanshcsingh in #1366
Add convert_dataset_hf to CLI by @KuuCi in #1348
Add missing init by @KuuCi in #1368
Make ICL dataloaders build lazily by @josejg in #1359
Add option to unfuse Wqkv by @snarayan21 in #1367
Add convert_dataset_json to CLI by @KuuCi in #1349
Add convert_text_to_mds to CLI by @KuuCi in #1352
Fix hf dataset hang on small dataset by @dakinggg in #1370
Add LoadPlanner and SavePlanner registries by @irenedea in #1358
Load config on rank 0 first by @dakinggg in #1371
Add convert_finetuning_dataset to CLI by @KuuCi in #1354
Allow for transforms on the model before MLFlow registration by @snarayan21 in #1372
Allow flash attention up to 3 by @dakinggg in #1377
Update accelerate requirement from <0.26,>=0.25 to >=0.32.1,<0.33 by @dependabot in #1341
update runners by @KevDevSha in #1360
Allow for multiple workers when autopacking by @b-chu in #1375
Allow train.py-like config for eval.py by @josejg in #1351
Fix load and save planner config logic by @irenedea in #1385
Do dtype conversion in torch hook to save memory by @irenedea in #1384
Get a shared file system safe signal file name by @dakinggg in #1381
Add transformation method to hf_causal_lm by @irenedea in #1383
[kushalkodnad/tokenizer-registry] Introduce new registry for tokenizers by @kushalkodn-db in #1386
Bump transformers version to 4.43.1 by @dakinggg in #1388
Add convert_delta_to_json to CLI by @KuuCi in #1355
Revert "Use utils to get shared fs safe signal file name (#1381)" by @dakinggg in #1389
Avoid race condition in convert text to mds script by @dakinggg in #1390
Refactor loss function for ComposerMPTCausalLM by @irenedea in #1387
Revert "Allow for multiple workers when autopacking (#1375)" by @dakinggg in #1392
Bump transformers to 4.43.2 by @dakinggg in #1393
Support rope scaling by @milocress in #1391
Removing the extra LlamaRotaryEmbedding import by @ShashankMosaicML in #1394
Dtensor oom by @dakinggg in #1395
Condition the meta initialization for hf_causal_lm on pretrain by @irenedea in #1397
Fix license link in readme by @dakinggg in #1398
Enable passing epsilon when building norm layers by @gupta-abhay in #1399
Add pre register method for mlflow by @dakinggg in #1396
add it by @dakinggg in #1400
Remove orig params default by @dakinggg in #1401
Add spin_dataloaders flag by @dakinggg in #1405
Remove curriculum learning error when duration less than saved timestamp by @b-chu in #1406
Set pretrained model name correctly, if provided, in HF Checkpointer by @snarayan21 in #1407
Enable QuickGelu Function for CLIP models by @gupta-abhay in #1408
Bump streaming version to v0.8.0 by @mvpatel2000 in #1411
Kevin/ghcr build by @KevDevSha in #1413
Update accelerate requirement from <0.33,>=0.25 to >=0.25,<0.34 by @dependabot in #1403
Update huggingface-hub requirement from <0.24,>=0.19.0 to >=0.19.0,<0.25 by @dependabot in #1379
Make Pytest log in color in Github Action by @eitanturok in https://github.com/mosaicml/llm-fo...

Contributors

bfontain, gupta-abhay, and 18 other contributors

Assets 2

02 Jul 13:31

b-chu

v0.10.0

742f340

v0.10.0

🚀 LLM Foundry v0.10.0

New Features

Registry for ICL datasets (#1252)

ICL datasets have now been added as a registry.

Curriculum Learning Callback (#1256)

You can now switch dataloaders while training which enables curriculum learning.

train_loader:
  <dataloader parameters>
callback:
  curriculum_learning:
  - duration: <number>tok
    train_loader:  # matches top level train_loader
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>

[Experimental] Interweave Attention Layers (#1299)

You can now override default block configs for certain layers, allowing for different sliding window sizes, reusing the previous layer's kv cache, etc.

model:
    ...
    (usual model configs)
    ...
    block_overrides:
        order:
        - name: default
        - order:
          - name: sliding_window_layer
          - name: sliding_window_layer_reuse
          - name: sliding_window_layer
          - repeat: 2
            name: sliding_window_layer_reuse
          - name: reuse_kv_layer
          repeat: 2
        overrides:
            sliding_window_layer:
                attn_config:
                    sliding_window_size: 1024
            sliding_window_layer_reuse:
                attn_config:
                    sliding_window_size: 1024
                    reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
            reuse_kv_layer:
                attn_config:
                    reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse

Bug fixes

Fix packing + streaming + resumption by @dakinggg in #1277

What's Changed

Bump Version to 0.10.0.dev0 by @KuuCi in #1255
Fix typo in setup.py by @XiaohanZhangCMU in #1263
Update TE Dockerfile by @j316chuck in #1265
Revert "Update TE Dockerfile (#1265)" by @j316chuck in #1266
Revert to older TE version by @mvpatel2000 in #1267
Bump Composer to version 0.23.2 by @dakinggg in #1269
fix linting by @milocress in #1270
Add torch 2.3.1 docker images by @dakinggg in #1275
Make expandable segments on by default by @b-chu in #1278
Adds CI for torch 2.3.1 by @dakinggg in #1281
Update README.md to use variables by @milocress in #1282
Add registry for ICL datasets by @sanjari-orb in #1252
Fix typo in CI by @dakinggg in #1284
Fix backwards compatibility for ICL arg by @dakinggg in #1286
Fix packing + streaming + resumption by @dakinggg in #1277
Dbfs HF by @KuuCi in #1214
Bump mlflow to 2.13.2 by @KuuCi in #1285
Add missing dependency group by @dakinggg in #1287
Update Dockerfile with TE main by @j316chuck in #1273
Fix TE HF checkpoint saving by @j316chuck in #1280
added systemMetricsMonitor callback by @JackZ-db in #1260
Extendability refactors by @dakinggg in #1290
Small refactor for update batch size by @dakinggg in #1293
Bump min composer version to 0.23.3 by @dakinggg in #1294
Fix grad accum typing by @dakinggg in #1296
Bump composer to 0.23.4 by @mvpatel2000 in #1297
Allow passing in lbl_process_group directly by @dakinggg in #1298
Add all transforms to train script by @dakinggg in #1300
Add Retries to run_query by @KuuCi in #1302
Bumping mlflow version to include buffering by @JackZ-db in #1303
Ignore mosaicml logger for exception if excephook is active by @jjanezhang in #1301
Add curriculum learning callback by @b-chu in #1256
Avoid circular import in hf checkpointer by @dakinggg in #1304
Remove codeql workflow by @dakinggg in #1305
Update CI test to v0.0.8 by @KuuCi in #1306
Upgrade ci testing to 0.0.8 by @dakinggg in #1307
Bump ci-testing to 0.0.9 by @dakinggg in #1310
Fix 4 gpu tests by @dakinggg in #1311
Bump recommended images to 2.3.1 and remove 2.3.0 CI by @dakinggg in #1312
Provide default seed value in TrainConfig, matching EvalConfig by @mvpatel2000 in #1315
Refactor hf checkpointer for config transformations by @irenedea in #1318
Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. by @ShashankMosaicML in #1299
Add optional logging of text output to EvalOutputLogging by @sjawhar in #1283

New Contributors

@sanjari-orb made their first contribution in #1252
@JackZ-db made their first contribution in #1260
@sjawhar made their first contribution in #1283

Full Changelog: v0.9.1...v0.10.0

Contributors

sjawhar, j316chuck, and 11 other contributors

Assets 2

24 Jun 23:00

dakinggg

v0.9.1

1402b7e

v0.9.1

🚀 LLM Foundry v0.9.1

This is a minor patch release to bump the minimum version of mlflow to make sure to buffer writes (mosaicml/composer#3401)

Whats changed

Bumping mlflow version to include buffering by @JackZ-db in #1303

Full Changelog: v0.9.0...v0.9.1

Contributors

JackZ-db

Assets 2

08 Jun 04:58

KuuCi

v0.9.0

e2d68f6

v0.9.0

🚀 LLM Foundry v0.9.0

New Features

More Token Encoding Types (#1254)

We've expanded the different ways to encode token IDs by allowing uint32 and uint16 formats, which saves significant space for datasets with smaller vocab sizes. We also extended ndarray type support for MDS dataset columns to the generic text dataset and updated conversion scripts accordingly.

Enforced Stricter Configs (#1254, #1225, #1202)

We've implemented stricter enforcement on our Train and Eval configs to further protect users from attempting to train with invalid configs. In conjunction with numerous other PRs, we have stronger error handling to help users use LLM Foundry smoothly.

Previously, this was allowed:

parameters:
   train_dataloader:
      ...
      seed: ${global_seed}
      random_other_key_that's_not_in_the_dataloader_constructor # this is not allowed
   ...
   global_seed: 17 # this is also not allowed

But we've added a variables section. Please do this instead:

parameters:
  variables:
    global_seed: 42
  ...
  train_dataloader:
    seed: ${variables.global_seed}

Chunked text to mds conversion (#1240)

We've updated our text to mds to convertion script to convert files to MDS in chunks. This protects us from loading entire large files at once (potentially causing OOMs), and drastically speeds up converting long sequences.

Breaking Changes and Deprecations

What's Changed

Bump version v0.9.0.dev0 by @milocress in #1181
structuredconfig for train.py and eval.py by @milocress in #1051
update version names by @milocress in #1185
Refactoring attention by @ShashankMosaicML in #1182
Checking if attention mask is present for ignoring pad tokens in ffn. by @ShashankMosaicML in #1188
Bump python 3.11 version in setup.py by @j316chuck in #1189
Docstring fix for curriculum learning callback by @snarayan21 in #1186
Set ft dataloader name explicitly by @milocress in #1187
Remove to_container by @dakinggg in #1190
fix eval by @milocress in #1193
Log exception on inactivity callback by @jjanezhang in #1194
Pass FC type along for all FFN types by @dakinggg in #1196
Streaming version bump to 0.7.6 by @snarayan21 in #1195
Clearer error message for unknown example type by @milocress in #1202
Added torch_dmoe defaults, bug fixes for 2D inputs by @snarayan21 in #1210
log eval dataset misconfiguration by @milocress in #1179
Using self.shift_labels instead of self.model.transformer.shift_label in the loss function. by @ShashankMosaicML in #1211
Add fc to HF export by @dakinggg in #1209
TransformerEngine Image Build by @mvpatel2000 in #1204
Removed debugging code in tests by @dakinggg in #1213
Make fc_type a dict to pass fc kwargs through by @snarayan21 in #1201
Fix dmoe tests GPU OOM by @snarayan21 in #1216
Update readme to clarify flash-attn and TE installs by @snarayan21 in #1219
Modularize components of megablocks layer builder by @dakinggg in #1224
Add user error superclass by @milocress in #1225
Make config/class properties on ComposerMPTForCausalLM by @dakinggg in #1227
Quick patch to check that Dataset Keys contain non-None Values by @KuuCi in #1228
Modularize backbone class and block creation by @dakinggg in #1229
Loss v len callback by @ShashankMosaicML in #1226
Fixing the state.timestamp.batch.value issue in loss v len callback by @ShashankMosaicML in #1232
Fix attr error for attention_classes when using act ckpt by @cli99 in #1230
Fix tuple typing by @dakinggg in #1235
Add example eval scripts for dbrx PT sizes by @aspfohl in #1218
Configurable submesh by @dakinggg in #1236
Add retries to downloads in convert_text_to_mds.py by @irenedea in #1238
Move MLFlow dataset outside of log_config by @KuuCi in #1234
add error when chat template fails by @milocress in #1222
Make the exceptions serializable by @dakinggg in #1239
Removing rich install by @jjanezhang in #1198
Chunk file reads and tokenization for text to mds conversion by @irenedea in #1240
Make HF conversion automatically add missing imports by @dakinggg in #1241
Add logging to convert_text_to_mds.py script by @irenedea in #1243
Update CODEOWNERS by @dakinggg in #1248
Replacing icl_task_type question_answering with generation_task_with_answers in long context eval yamls. by @ShashankMosaicML in #1250
Change TE docker image to enable te_shard_weight by @j316chuck in #1251
Fix MPT HF conversion by @dakinggg in #1257
Remove spurious warning by @dakinggg in #1258
Adding more token encoding types by @snarayan21 in #1254
Bump Composer to 0.23.0 by @KuuCi in #1259
Fix typo in setup.py by @XiaohanZhangCMU in #1263
Bump composer to 0.23.2 by @dakinggg in #1269

Full Changelog: v0.8.0...v0.9.0

Contributors

j316chuck, irenedea, and 10 other contributors

Assets 2

08 May 01:36

milocress

v0.8.0

57cdd2a

v0.8.0

🚀 LLM Foundry v0.8.0

New Features

Megablocks support (#1102)

Support for training optimized MoE models at large scale.

Check out the megablocks documentation for more information on building state of the art MoE models.

Expanded Registries (#1080, #1093, #1094, #1095, #1096, #1165)

We've expanded support for registries to include, dataloaders, FFN layers, attention layers, norms, and parameter initialization functions.

Check out the README for detailed instructions and code examples!

Support for ShareGPT chat format (#1098)

We now support the ShareGPT format for finetuning.

Breaking Changes and Deprecations

We have updated the minimum supported PyTorch version to torch 2.3 (#1152).

In Context Learning Code Evaluation (#1181)

We've removed the code_evaluation task from the allowed in context learning task types, and we've deleted the InContextLearningCodeEvaluationDataset and InContextLearningCodeEvalAccuracy classes.

Question-Answering

We've removed the question_answering task type. Please use the generation_task_with_answers task instead.

What's Changed

Update README by @hanlint in #1069
Expose more exception attributes by @jjanezhang in #1071
Output eval logging batch by @maxisawesome in #961
Add expandeable segments flag by @dakinggg in #1075
Check the user provided eos / bos token id against the tokenizer eos / bos token id by @ShashankMosaicML in #1039
Triton RMSNorm by @josejg in #1050
Fix tiktoken vocab size by @dakinggg in #1081
Doing the loss reduction in foundry instead of in the loss functions. by @ShashankMosaicML in #1079
Decrease log verbosity with no bias by @mvpatel2000 in #1082
Upgrade hf chat by @j316chuck in #1061
Fixes for streaming and auto packing by @dakinggg in #1083
Background mlflow model registration by @irenedea in #1078
Update README.md to include DBRX blog under "Latest News" by @lupesko in #1085
Decrease transformers file size for mlflow by @dakinggg in #1087
log packing ratio progress by @milocress in #1070
Bump HF version by @b-chu in #1091
Fix typo in expandable_segments by @mammothb in #1088
Bump transformers to 4.39.3 by @dakinggg in #1086
Fix yaml typo by @dakinggg in #1092
Fix for overriding nested configs by @dakinggg in #1089
cleaned up HF/MPT conversion test by @milocress in #1048
Update yamls for 0.7.0 by @dakinggg in #1097
Norms registry by @dakinggg in #1080
fixing evaluator microbatch size by @ShashankMosaicML in #1100
Updating the streaming version in setup.py by @ShashankMosaicML in #1103
MegaBlocks release by @mvpatel2000 in #1102
Remove torch compile from GLU by @josejg in #1101
Update config_moe_args.py by @vchiley in #1104
Add remote code option to allow execution of DBRX tokenizer by @b-chu in #1106
Fix overwriting FP8 act ckpt flag in the train script by @cli99 in #1107
Support ShareGPT chat format by @samhavens in #1098
FC layer registry by @dakinggg in #1093
Attention layer registry by @dakinggg in #1094
Dbrx finetune yaml requires save folder specified to enable autoresume by @mvpatel2000 in #1108
Revert "Update config_moe_args.py" by @vchiley in #1111
rm new_group todo by @vchiley in #1112
Migrate ICL classes to foundry by @bmosaicml in #936
FFN layer registry by @dakinggg in #1095
Param init registry by @dakinggg in #1096
Add missing init file by @dakinggg in #1113
Update tests to not rely on mistral by @dakinggg in #1117
Bump transformers to 4.40 by @dakinggg in #1118
add .json to SUPPORTED_EXTENSIONS by @eitanturok in #1114
Add option for subclasses to convert model and tokenizer in hf checkpointer by @dakinggg in #1121
Bump Composer to 0.21.3 by @b-chu in #1122
catch misconfigured hf dataset by @milocress in #1123
Pin mlflow by @dakinggg in #1124
Change main to a dev version by @dakinggg in #1126
Fix deprecation versions by @dakinggg in #1129
Clean up the publicly exported API by @dakinggg in #1128
Fix HF checkpointer + mlflow bugs by @dakinggg in #1125
Update JSONL sources in eval README by @emmanuel-ferdman in #1110
Mlflow datasets by @KuuCi in #1119
Strict key checking for dataset by @b-chu in #1131
First initialize dist with gloo by @dakinggg in #1133
Fix saving of generation_config for Llama-3 by @eldarkurtic in #1134
Bump datasets version by @dakinggg in #1138
Revert "First initialize dist with gloo (#1133)" by @dakinggg in #1139
Barrier immediately after initialize dist with logs by @dakinggg in #1140
Add new FT instructions by @b-chu in #1143
Upgrade ci-testing by @mvpatel2000 in #1145
Fix typos in callbacks with configs by @dakinggg in #1146
Remove olmo as a dependency by @snarayan21 in #1148
build inner model by @milocress in #1147
fix DatasetConstants.splints default value to protect dictionary overwriting by @ivan-kud in #1144
Bump flash attention version by @dakinggg in #1150
Torch 2.3 part 1 - build the images by @dakinggg in #1149
Torch 2.3 upgrade Part 2 - CI by @dakinggg in #1151
Comment out 2.3 tests by @dakinggg in #1155
Fix yaml lint by @dakinggg in #1156
Move sentencepiece import by @aspfohl in #1157
Bump composer version to 0.22.0 by @snarayan21 in #1160
Uncomment GPU tests by @milocress in #1162
Depend on coverage by @milocress in #1163
fix dep group in torch 2.3 ci by @dakinggg in #1164
Bump min torch version to 2.3.0 by @dakinggg in #1152
Add line splitting and other linting by @b-chu in #1161
refactoring dataloader into registries. by @ShashankMosaicML in #1165
Migrate eval output logging to foundry by @maxisawesome in #1166
Fix import and mocking by @dakinggg in #1169
minor fix to llmfoundry.data.utils.get_text_collator by @ShashankMosaicML in #1170
Fix config access for DBRX by @dakinggg in #1177

New Contributors

@lupesko made their first contribution in #1085
@mammothb made their first contribution in #1088
@eitanturok made their first contribution in #1114
*...

Contributors

lupesko, vchiley, and 22 other contributors

Assets 2

27 Mar 05:12

irenedea

v0.7.0

f044d6c

v0.7.0

🚀 LLM Foundry v0.7.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.

In addition to the usual bug fixes and performance improvements, we've made foundry more customizable and extensible!

New Features

Registerable Components (#975, #1043, #1052, #1057)

We've made key components of LLM Foundry registrable, such as models, loggers, and callbacks. You can use the registry to easily customize and extend your training workflows.

This means that you can register new options for these components, and then use them in your yaml config.

Check out the README for detailed instructions and code examples!

Breaking Changes and Deprecations

Deprecated Feature Removals (#1063)

We've removed support for deprecated features: triton attention, Prefix LMs, Llama attention patch, z-loss, and text denoising. These features were little used, and we removed them to focus on the core features that are heavily used.

If you were using these features please let us know how you were using them in a GitHub issue. We're happy to add things back that are in heavy usage.

What's Changed

Fix typo in monolithic chkpt callback docs by @sashaDoubov in #1024
Allow code-quality workflow to be callable by @b-chu in #1026
Fix llama attention patch by @dakinggg in #1036
Adds a decorator for experimental features by @dakinggg in #1038
Finish 0.6.0 release by @dakinggg in #1040
Remove reference to attn_impl: triton by @dakinggg in #1041
Registry based config - Part 1 by @dakinggg in #975
Deprecate attention patching for llama by @dakinggg in #1047
Compile GLU by @josejg in #1049
log details to metadata for run analytics by @angel-ruiz7 in #992
Update README.md by @dennyglee in #1056
Add chat schema example for mlflow by @dakinggg in #1054
Metrics registry by @dakinggg in #1052
LLM Foundry CLI (just registry) by @dakinggg in #1043
Bump Composer to 0.21.1 by @jjanezhang in #1053
Dataloaders registry by @dakinggg in #1044
Fix multi model eval by @dakinggg in #1055
Remove unnecessary test workflow by @dakinggg in #1058
Fix peft llama test by @dakinggg in #1059
Models registry by @dakinggg in #1057
Remove under construction from registry by @dakinggg in #1060
Custom Exceptions for Mosaic Logger by @jjanezhang in #1014
Bump version to 0.7.0 by @irenedea in #1063
Fix file filter by @dakinggg in #1067
Fix context printing by @irenedea in #1068

New Contributors

@angel-ruiz7 made their first contribution in #992
@dennyglee made their first contribution in #1056

Full Changelog: v0.6.0...v0.7.0

Contributors

dennyglee, sashaDoubov, and 6 other contributors

Assets 2

12 Mar 20:22

dakinggg

v0.6.0

08359b5

v0.6.0

🚀 LLM Foundry v0.6.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

Configurable loss for chat-formatted data (#985)

For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.

This can be specified in the train_loader.dataset section of your yaml as follows:

...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_reseponses: <FILL IN>

See the docstring for a description of the options.

Olmo support (#1016)

We've added support for the OLMo model from AI2.

To use OLMo, there are a few configuration parameters you need to set. First of all, you will need to install LLM Foundry with the extra package for OLMo (pip install .[gpu,olmo]).

Then you will need to adjust the tokenizer section of your config as follows:

tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true

Token accuracy (#983)

We've added a new, on-by-default metric to compute token accuracy in addition to cross entropy and perplexity.

Configurable activation checkpointing (#951)

More configurable activation checkpointing for MPT allows finer granular control over memory usage when training MPT. See the docstring for more details.

Finetuning with multiple streams, and pretokenized data (#933, #945, #946)

We've brought the finetuning dataloader up to speed with the pretraining dataloader to support mixing multiple streams, and pretokenizing finetuning data. See the yaml for a full example.

Eval Gauntlet v0.3 (#824)

We've release v0.3 of our Evaluation gauntlet. See the README for a full description.

Breaking changes and deprecations

Flash attention v1 removal (#1023)

Support for flash attention v1 has now been removed.

Extra BOS token removed (#1003)

When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.

Deprecation of triton flash attention, prefixLM, and text denoising (#1007)

We've deprecated use of the triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.

What's Changed

Gauntlet v0.3: Fix chain-of-thought tasks by @bmosaicml in #824
Add finetuning streaming dataset conversion by @bigning in #933
Add default signature to mlflow saved model by @dakinggg in #952
allow te to use meta device with deferred init by @cli99 in #958
Update TUTORIAL.md by @sdonoso in #957
Update mcli yamls to use v0.5.0 by @irenedea in #959
add finutuning with streaming dataset example by @bigning in #945
Add fully configurable activation checkpointing by @cli99 in #951
Use create_model_version instead of register_model by @dakinggg in #953
Add streams support by @bigning in #946
Fix typo by @irenedea in #966
Fix eval.py with lora by @dakinggg in #965
add memory snapshot to callbacks by @cli99 in #810
Adding curriculum learning callback (experimental) by @snarayan21 in #954
strengthened chat formatting validation by @milocress in #960
Add new base images and remove fa1 images by @dakinggg in #970
Add new ICL kwargs in eval.py and long_context yamls by @maxisawesome in #925
Make composer pins consistent with each other by @dakinggg in #972
Make turbo an optional dependency by @snarayan21 in #964
Fix fewshot_random_seed default setting by @maxisawesome in #974
Improve error msg when checking target_blocks overlap by @cli99 in #977
Torch 2.2 upgrade - Part 1 by @dakinggg in #976
Torch 2.2 - Part 2 by @dakinggg in #979
PyTorch 2.2 - Part 3 by @dakinggg in #981
Remove torch 2.1 from docker workflow by @dakinggg in #982
Async callback: Don't skip checkpoints, reliably only launch async eval when the checkpoint is ready by @aspfohl in #813
Token accuracy metrics by @dakinggg in #983
Update readme to not mention 1.13_cu117 by @irenedea in #988
Patch test, lock mcli version by @aspfohl in #990
Bump gha timeouts by @aspfohl in #991
Fix readme typo by @dakinggg in #993
if condition in tie weights added by @megha95 in #989
Bump Composer to 0.20 by @dakinggg in #995
Trim examples ahead of time for auto packing by @irenedea in #994
add oom observer callback by @cli99 in #932
Use ci-testing repo for tests by @b-chu in #1000
Make CodeEval respect device_eval_batch_size by @josejg in #956
Remove try except around imports by @dakinggg in #1004
Deprecate triton, prefix lm, llama attention patch, and text denoising; Make ComposerHFT5 experimental by @irenedea in #1007
add magic filename for sharded state dicts by @milocress in #1001
Bump CI/CD to v3 by @mvpatel2000 in #1009
Fix evaluators actually pulling eval metrics by @mvpatel2000 in #1006
Build torch 2.2.1 images by @dakinggg in #1010
Add torch 2.2.1 tests by @dakinggg in #1011
Bump min torch pin to 2.2.1 by @dakinggg in #1013
Fix extra BOS token in front of response for some tokenizers by @dakinggg in #1003
Bump min composer pin by @dakinggg in #1015
Add default for eval interval by @irenedea in #987
Add support for olmo by @dakinggg in #1016
Add deeper support for multi-turn chats and loss-generating tokens in finetuning by @alextrott16 in #985
Add explicit packing ratio of 1 for profiling by @irenedea in #1019
Bump transformers to 4.38.2 by @dakinggg in #1018
Making sure MemoryMonitor takes in kwargs. by @snarayan21 in #1020
Update readme for torch version 2.2.1 by @irenedea in #1021
Add code import to train/eval scripts by @dakinggg in #1002
Bump version in readme by @bmosaicml in #1022
Bump version to 0.6.0 by @dakinggg in #1023

New Contributors

@bigning made their first contribution in #933
@sdonoso made their first contribution in #957
@josejg made their first contribution in #956

Full Changelog: v0.5.0...v0.6.0

Contributors

bigning, alextrott16, and 13 other contributors

Assets 2

08 Feb 00:01

irenedea

v0.5.0

a667ebf

v0.5.0

🚀 LLM Foundry v0.5.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

LoRA Support (with FSDP!) (#886)

LLM Foundry now supports LoRA via an integration with the PEFT library. Within LLM Foundry, run train.py, adding peft_config arguments to the model section of the config .yaml, like so:

model:
  ...
  peft_config:
      r: 16
      peft_type: LORA
      task_type: CAUSAL_LM
      lora_alpha: 32
      lora_dropout: 0.05
      target_modules:
      - q_proj
      - k_proj

ALiBi for Flash Attention (#820)

We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).

model:
     ...
     attn_config:
         attn_impl: flash
         alibi: True

Chat Data for Finetuning (#884)

We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer chat templates.

Each sample requires a single key "messages" that maps to an array of message objects. Each message object in the array represents a single message in the conversation and must contain the following keys:

role : A string indicating the author of the message. Possible values are "system" ,"user" , and "assistant" .
content : A string containing the text of the message.

We require that there must be at least one message with the role "assistant", and the last message in the "messages" array must have the role "assistant" .

Here's an example .jsonl with chat data:


{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [ 
  { "role": "system": "A conversation between a user and a helpful and honest assistant"}
  { "role": "user", "content": "Hi, MPT!" }, 
  { "role": "assistant", "content": "Hi, user!" },
  { "role": "user", "content": "Is multi-turn chat supported?"},
  { "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
]}
...

Safe Load for HuggingFace Datasets (#798)

We now provide a safe_load option when loading HuggingFace datasets for finetuning.

This restricts loaded files to .jsonl, .csv, or .parquet extensions to prevent arbitrary code execution.

To use, set safe_load to true in your dataset configuration:

  train_loader:
    name: finetuning
    dataset:
      safe_load: true
      ...

New PyTorch, Composer, Streaming, and Transformers versions

As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (mixtral in particular).

Deprecations

Support for Flash Attention v1 (#921)

Will be removed in v0.6.0.

Breaking Changes

Removed support for PyTorch versions before 2.1 (#787)

We no longer support PyTorch versions before 2.1.

Removed Deprecated Features (#948)

We've removed features that have been deprecated for at least one release.

What's Changed

Small test fix to have right padding by @sashaDoubov in #757
Release 040 back to main by @dakinggg in #758
Bump composer version to 0.17.1 by @irenedea in #762
Docker release on workflow_dispatch by @bandish-shah in #763
Fix tiktoken wrapper by @dakinggg in #761
enable param group configuration in llm-foundry by @vchiley in #760
Add script for doing bulk generation against an endpoint by @aspfohl in #765
Only strip object names when creating new output path by @irenedea in #766
Add eval loader to eval script by @aspfohl in #742
Support inputs_embeds by @samhavens in #687
Better error message when test does not complete by @aspfohl in #769
Add codeowners by @dakinggg in #770
add single value support to activation_checkpointing_target by @cli99 in #772
Reorganize tests to make them easier to find by @aspfohl in #768
Add "completion" alias for response key by @dakinggg in #771
Shashank/seq id flash attn by @ShashankMosaicML in #738
Fix SIQA gold indices by @bmosaicml in #774
Add missing load_weights_only to example yamls by @dakinggg in #776
Patch flash attn in test to simulate environment without it installed by @dakinggg in #778
Update .gitignore by @aspfohl in #781
Disable mosaicml logger in foundry CI/CD by @mvpatel2000 in #788
Chat fomating template changes by @rajammanabrolu in #784
Remove tests and support for torch <2.1 by @dakinggg in #787
Fix utf-8 decode errors in tiktoken wrapper by @dakinggg in #792
Update gauntlet v0.2 to reflect results of calibration by @bmosaicml in #791
Remove from mcli.sdk imports by @aspfohl in #793
Auto packing fixes by @irenedea in #783
Enable flag to not pass PAD tokens in ffwd by @bcui19 in #775
Adding a fix for Cross Entropy Loss for long sequence lengths. by @ShashankMosaicML in #795
Minor readme updates and bump min python version by @dakinggg in #799
Enable GLU FFN type by @vchiley in #796
clean up resolve_ffn_hidden_and_exp_ratio by @vchiley in #801
Fix token counting to use attention mask instead of ids by @dakinggg in #802
update openai wrapper to work with tiktoken interface and newest openai version by @bmosaicml in #794
Fix openai not conditioned imports by @dakinggg in #806
Make the ffn activation func configurable by @vchiley in #805
Clean up the logs, bump datasets and transformers by @dakinggg in #804
Fix remote path check for UC volumes by @irenedea in #809
Expand options for MMLU. by @mansheej in #811
Async eval callback by @aspfohl in #702
Updating the Flash Attention version to fix cross entropy loss by @ShashankMosaicML in #812
Remove redundant transposes for rope rotation by @ShashankMosaicML in #807
Add generic flatten imports to HF checkpointer by @b-chu in #814
Fix token counting to allow there to be no attention mask by @dakinggg in #818
Default to using tokenizer eos and bos in convert_text_to_mds.py by @irenedea in #823
Revert "Default to using tokenizer eos and bos in convert_text_to_mds.py" by @irenedea in #825
Bump turbo version to 0.0.7 by @mvpatel2000 in #827
Align GLU implementation with LLaMa by @vchiley in #829
Use sync_module_states: True when using HSDP by @abhi-mosaic in #830
Update composer to 0.17.2 and streaming to 0.7.2 by @irenedea in #822
zero bias conversion corrected by @megha95 in #624
Bump einops version, which has improved support for torch compile by @sashaDoubov in #832
Update README with links to ML HW resources by @abhi-mosaic in #833
Add safe_load option to restrict HF dataset downloads to allowed file types by @irenedea in #798
Adding support for alibi when using flash attention by @ShashankMosaicML in #820
Shashank/new benchmarks by @ShashankMosaicML in #838
Fix error when decoding a token in the id gap (or out of range) in a tiktoken tokenizer by @dakinggg in #841
Add use_tokenizer_eos option to convert text to mds script by @irene...

Contributors

Skylion007, sashaDoubov, and 23 other contributors

Assets 2

Releases: mosaicml/llm-foundry

v0.13.0

🚀 LLM Foundry v0.13.0

🛠️ Bug Fixes & Cleanup

Pytorch 2.4 Checkpointing (#1569, #1581, #1583)

🔧 Dependency Updates

What's Changed

Contributors

v0.12.0

🚀 LLM Foundry v0.12.0

New Features

PyTorch 2.4 (#1505)

Extensibility improvements (#1450, #1449, #1468, #1467, #1478, #1493, #1495, #1511, #1512, #1527)

Improved error messages (#1457, #1459, #1519, #1518, #1522, #1534, #1548, #1551)

Sliding window in torch attention (#1455)

Bug fixes

Extra BOS token for llama 3.1 with completion data (#1476)

What's Changed

New Contributors

Contributors

v0.11.0

🚀 LLM Foundry v0.11.0

New Features

LLM Foundry CLI Commands (#1337, #1345, #1348, #1354)

Docker Images Contain All Optional Dependencies (#1431)

Support for Llama3 Rope Scaling (#1391)

Tokenizer Registry (#1386)

LoadPlanner and SavePlanner Registries (#1358)

Faster Auto-packing (#1435)

What's Changed

Contributors

v0.10.0

🚀 LLM Foundry v0.10.0

New Features

Registry for ICL datasets (#1252)

Curriculum Learning Callback (#1256)

[Experimental] Interweave Attention Layers (#1299)

Bug fixes

What's Changed

New Contributors

Contributors

v0.9.1

🚀 LLM Foundry v0.9.1

Whats changed

Contributors

v0.9.0

🚀 LLM Foundry v0.9.0

New Features

More Token Encoding Types (#1254)

Enforced Stricter Configs (#1254, #1225, #1202)

Chunked text to mds conversion (#1240)

Breaking Changes and Deprecations

What's Changed

Contributors

v0.8.0

🚀 LLM Foundry v0.8.0

New Features

Megablocks support (#1102)

Expanded Registries (#1080, #1093, #1094, #1095, #1096, #1165)

Support for ShareGPT chat format (#1098)

Breaking Changes and Deprecations

In Context Learning Code Evaluation (#1181)

Question-Answering

What's Changed

New Contributors

Contributors

v0.7.0

🚀 LLM Foundry v0.7.0

New Features

Registerable Components (#975, #1043, #1052, #1057)

Breaking Changes and Deprecations

Deprecated Feature Removals (#1063)

What's Changed

New Contributors

Contributors

v0.6.0

🚀 LLM Foundry v0.6.0

New Features

Configurable loss for chat-formatted data (#985)

Olmo support (#1016)