L2-friendly chunking and twiddle persistence for batched NTTs and batched NTT+bitreverse sequences #31

mcarilli · 2024-02-07T19:35:21Z

What ❔

Batched NTT (+bitrev) operations launch a sequence of several kernels. This PR splits each batch into chunks small enough to persist in the L2 cache across kernel launches, i.e., we run the whole NTT (+bitrev) kernel sequence on the first chunk, then on the second chunk, and so on.
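As a rough illustration of the launch ordering described above, here is a minimal host-side sketch. It is not the code in this PR: `BatchShape`, `columns_per_chunk`, `launch_order`, and the `l2_budget_bytes` parameter are hypothetical names, and the model assumes each column occupies `2^log_n * elem_bytes` bytes and that a chunk "fits" when its columns together stay under the L2 budget. It also includes the fallback from one of the commits below (skip chunking when even a single column cannot fit in L2).

```rust
use std::ops::Range;

/// Hypothetical description of one batched NTT workload (illustrative only).
#[derive(Debug, Clone, Copy)]
pub struct BatchShape {
    pub num_columns: usize, // number of independent NTTs in the batch
    pub log_n: u32,         // each column holds 2^log_n elements
    pub elem_bytes: usize,  // bytes per field element (assumed > 0)
}

/// How many columns fit in the L2 budget at once.
/// Returns None when not even one column fits.
pub fn columns_per_chunk(shape: BatchShape, l2_budget_bytes: usize) -> Option<usize> {
    let column_bytes = (1usize << shape.log_n) * shape.elem_bytes;
    let fit = l2_budget_bytes / column_bytes;
    if fit == 0 {
        None
    } else {
        Some(fit.min(shape.num_columns))
    }
}

/// Order in which (stage, column range) launches would be issued.
/// Chunked case: all stages run on chunk 0, then all stages on chunk 1, and so on.
/// Fallback case: every stage runs over the whole batch, as without chunking.
pub fn launch_order(
    shape: BatchShape,
    l2_budget_bytes: usize,
    num_stages: usize,
) -> Vec<(usize, Range<usize>)> {
    let mut order = Vec::new();
    if shape.num_columns == 0 {
        return order; // nothing to launch
    }
    match columns_per_chunk(shape, l2_budget_bytes) {
        Some(chunk) => {
            let mut start = 0;
            while start < shape.num_columns {
                let end = (start + chunk).min(shape.num_columns);
                // Run the full kernel sequence on this chunk before moving on,
                // so its data can stay resident in L2 between launches.
                for stage in 0..num_stages {
                    order.push((stage, start..end));
                }
                start = end;
            }
        }
        None => {
            // A single column exceeds the budget: persistence cannot help.
            for stage in 0..num_stages {
                order.push((stage, 0..shape.num_columns));
            }
        }
    }
    order
}
```

For example, with 5 columns of 2^10 eight-byte elements (8192 bytes per column), a 20,000-byte budget gives chunks of 2 columns, so a two-stage sequence is issued as stage 0 and stage 1 on columns 0..2, then on 2..4, then on 4..5. The real budget would depend on the device's L2 size and on how much of it is reserved for other persistent data such as twiddles.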

Why ❔

Keeping each chunk resident in L2 between kernel launches reduces global memory (gmem) traffic and improves performance.

Checklist

PR title corresponds to the body of PR (we generate changelog entries from PRs).
Tests for the changes have been added / updated.
Documentation comments have been added / updated.
Code has been formatted via zk fmt and zk lint.

Required by matter-labs/era-shivini#31

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `cargo fmt` and `cargo lint`.

implement tentative structure of l2 chunked ntt calls

b4f2f51

mcarilli force-pushed the mc-ntt-persistence branch from a25d5f8 to 60dbec7 Compare February 7, 2024 19:36

mcarilli mentioned this pull request Feb 7, 2024

[WIP] L2-friendly chunking for batched NTTs and batched NTT+bitreverse sequences #24

Closed

4 tasks

tentative structure for all chunked ntts compiles

2bc2b81

mcarilli force-pushed the mc-ntt-persistence branch from 60dbec7 to 2bc2b81 Compare February 7, 2024 23:20

mcarilli added 2 commits February 7, 2024 23:22

Merge remote-tracking branch 'origin/main' into mc-ntt-persistence

407a913

L2 persistence for twiddles. Private in context.rs for now

61eab26

mcarilli mentioned this pull request Feb 8, 2024

Supporting diffs for batched NTT chunking and twiddle persistence matter-labs/era-boojum-cuda#25

Merged

4 tasks

mcarilli added 3 commits February 8, 2024 00:46

add more twiddles

4f45ffc

avoid cuda api calls during dry run

94d855f

don't bother with persistence if a chunk can't fit in L2

f7e7c54

mcarilli requested a review from robik75 February 9, 2024 16:36

mcarilli force-pushed the mc-ntt-persistence branch from a1daa63 to f7e7c54 Compare February 11, 2024 23:12

Merge remote-tracking branch 'origin/main' into mc-ntt-persistence

e63dff4

mcarilli added 3 commits February 13, 2024 16:55

repoint Cargo.toml to boojum-cuda upstream

e765c2b

update Cargo.lock

11cfaf9

add chunked calls that add no-op epilogues

dfcc4cf

mcarilli changed the title ~~[WIP] L2-friendly chunking and twiddle persistence for batched NTTs and batched NTT+bitreverse sequences~~ L2-friendly chunking and twiddle persistence for batched NTTs and batched NTT+bitreverse sequences Feb 13, 2024

robik75 approved these changes Feb 13, 2024


mcarilli merged commit f25a855 into main Feb 13, 2024
4 checks passed

robik75 deleted the mc-ntt-persistence branch August 6, 2024 12:27
