add chunk size experiment #341

Merged: 2 commits merged into sigstore:main on Jan 16, 2025

Conversation

@spencerschrock (Contributor) commented on Jan 16, 2025

Summary

Adds a simple experiment to vary chunk size, built on top of #306

So far, the experiment has shown similar results on two different machines (ARM64 macOS and x86 Linux), and on two different models.

Two factors may limit which models can be benchmarked (see the sketch after this list):

  1. The special size of 0, which attempts to read whole files into memory.
  2. `timeit.timeit` disables garbage collection, so even if smaller buffers are used, older ones may stay around in memory.
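
For context, a minimal sketch of the kind of timing loop such an experiment runs. The `read_and_hash` helper, the sizes, and the file path below are illustrative placeholders, not the actual code in benchmarks/exp_chunk.py:

```python
import hashlib
import timeit


def read_and_hash(path: str, chunk_size: int) -> bytes:
    """Hash one file with SHA-256, reading it in chunks of chunk_size bytes.

    A chunk_size of 0 reads the whole file into memory with a single read().
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        if chunk_size == 0:
            digest.update(f.read())
        else:
            while chunk := f.read(chunk_size):
                digest.update(chunk)
    return digest.digest()


# timeit.timeit disables garbage collection during the measurement, so
# buffers from earlier iterations can linger in memory until it finishes.
for size in (0, 1024, 65536, 1048576):
    elapsed = timeit.timeit(
        lambda: read_and_hash("model.bin", size),  # placeholder file path
        number=10,
    )
    print(f"{size}:\t{elapsed:.4f}")
```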

The hashing algorithm was left as SHA256 based on the results from the hashing experiment.

hatch run +py=3.11 bench:chunk /tmp/falcon-7b
0:              8.7508
1024:          76.3885
2048:         127.3363
4096:          75.9053
8192:          46.9426
16384:         20.7537
32768:         11.0133
65536:          8.3529
131072:         7.9244
262144:         7.7939
524288:         7.7006
1048576:        7.6512
2097152:        7.6781
4194304:        7.8679
8388608:        7.8370
16777216:       7.8448
33554432:       8.5892
67108864:       8.5765
134217728:      8.6458
268435456:      8.6698
536870912:      8.6905
1073741824:     8.6779

The benchmark results suggest increasing the chunk size to at least 128 KB (131072), with a 1 MB (1048576) read size producing the best results in this benchmark. Happy to do that in a follow-up PR, or in this one.
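
If the default is bumped in a follow-up, the hashing loop would look roughly like the sketch below. The `DEFAULT_CHUNK_SIZE` constant and `sha256_file` function are hypothetical names for illustration, not the library's actual API:

```python
import hashlib

# Hypothetical constant reflecting the suggestion above: a 1 MB read size,
# which produced the best results in this benchmark.
DEFAULT_CHUNK_SIZE = 1048576


def sha256_file(path: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> str:
    """Return the hex SHA-256 digest of a file, read in chunk_size-byte chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```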

Release Note

NONE

Documentation

NONE

@spencerschrock requested review from a team as code owners on January 16, 2025 17:44
@mihaimaruseac previously approved these changes on Jan 16, 2025
benchmarks/exp_chunk.py: review comment (resolved)
benchmarks/exp_chunk.py: review comment (outdated, resolved)
@mihaimaruseac merged commit f0a6e96 into sigstore:main on Jan 16, 2025
33 checks passed
@spencerschrock deleted the chunk branch on January 16, 2025 19:13