Solid State Space Models #234

Open
bonham79 opened this issue Aug 12, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request new architecture

Comments

@bonham79
Collaborator

bonham79 commented Aug 12, 2024

(Lowest of the low priorities)

SSMs have been making the rounds, but people have only cared about them for 'major' tasks (NMT, speech, LLMs). Since they're recurrent models in the same vein as LSTMs, and we see better performance from that type of model on our type of tasks, it may be fun to implement an SSM decoder and try it out.

Beyond theoretical interest, they're supposed to be more memory efficient than transformers, so we can probably run some wicked batch sizes if they're implemented well.

@bonham79 bonham79 self-assigned this Aug 12, 2024
@bonham79 bonham79 added the enhancement New feature or request label Aug 12, 2024
@kylebgorman
Contributor

I don't know anything about how these work yet, but they're the only "new architecture" in a long time, so why not. Any reason to think they're more or less applicable to our class of problems?

@bonham79
Collaborator Author

Their main selling point is that memory scales linearly with sequence length. For our class of problems that's not really a concern. But it would let us further shrink the memory footprint of our architectures, letting us go hog wild with batch sizes and model sizes on lower-end hardware.
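
For a rough idea of where the memory savings come from, here is a minimal sketch of a diagonal linear SSM run as a plain recurrence. It is not code from this repo and not any particular published variant (no S4/Mamba-style parameterization or selectivity); the function name `ssm_scan` and the toy parameters `a`, `b`, `c` are made up for illustration. The point is only that the state `h` has a fixed size no matter how long the input is, so the only thing that grows with sequence length is the output itself.

```python
from typing import List, Sequence


def ssm_scan(
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    xs: Sequence[float],
) -> List[float]:
    """Run a toy diagonal linear SSM over a scalar input sequence.

    Recurrence (elementwise over the state dimensions):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)
    """
    # Fixed-size hidden state: its length depends on the model, not on len(xs).
    h = [0.0] * len(a)
    ys: List[float] = []
    for x in xs:
        # Update each state dimension independently (diagonal transition).
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a scalar output as a weighted sum of the state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# Hypothetical toy parameters: a 2-dimensional state fed a unit impulse.
ys = ssm_scan(a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 0.5], xs=[1.0, 0.0, 0.0])
print(ys)
```

Self-attention, by contrast, scores every pair of positions, which is where the quadratic cost in sequence length comes from. A real implementation would presumably use learned, batched tensors and a parallel scan rather than a Python loop.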

Theoretical justification? We've seen LSTMs generally outperform transformers on a lot of our tasks (qua Adam's paper, anti qua Wu). So having an LSTM-like model that competes with transformers lets us further dig our heels in on the power of modeling assumptions.

But really my only reason is:

image


2 participants