Transformers are RNNs (link: https://arxiv.org/pdf/2006.16236.pdf) describes constant-memory gradient computation, which should be entirely realizable for this project (the math is reasonably similar in structure, at least).

Right now memory usage does seem to scale with sequence length, though that may just be because longer inputs need more memory to store all the embeddings. If so, constant-memory gradients wouldn't help the training path that runs in parallel.

They will matter for infinite context in the RNN formulation, though! (see #14)
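For reference, here is a minimal sketch of the recurrent (constant-memory) formulation of linear attention from the linked paper, not this project's actual code. The feature map `elu(x) + 1` follows the paper; the function and variable names here are just illustrative. The key point is that the running state `(S, z)` has a fixed size regardless of how many tokens have been processed.

```python
# Rough sketch of linear attention in its RNN form (Katharopoulos et al., 2020).
# The state (S, z) is O(d_k * d_v), independent of sequence length, which is
# what makes constant-memory inference over an unbounded context possible.
import torch

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the positive feature map suggested in the paper.
    return torch.nn.functional.elu(x) + 1

def linear_attention_rnn(queries, keys, values, eps=1e-6):
    # queries, keys: (seq_len, d_k); values: (seq_len, d_v)
    seq_len, d_k = queries.shape
    d_v = values.shape[1]
    S = torch.zeros(d_k, d_v)   # running sum of phi(k_i) v_i^T
    z = torch.zeros(d_k)        # running sum of phi(k_i), for normalization
    outputs = []
    for i in range(seq_len):
        phi_q = elu_feature_map(queries[i])
        phi_k = elu_feature_map(keys[i])
        S = S + torch.outer(phi_k, values[i])
        z = z + phi_k
        # Output at step i depends only on the fixed-size state, not the full history.
        outputs.append((phi_q @ S) / (phi_q @ z + eps))
    return torch.stack(outputs)

# Example usage with arbitrary sizes:
q, k, v = torch.randn(16, 32), torch.randn(16, 32), torch.randn(16, 64)
print(linear_attention_rnn(q, k, v).shape)  # torch.Size([16, 64])
```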