NATS JetStream startup message indicates more memory available than soft memory limit set by GOMEMLIMIT #3436

Open
matrixbot opened this issue Nov 2, 2024 · 1 comment

This issue was originally created by @paigeadelethompson at matrix-org/dendrite#3436.

The Go authors explicitly label GOMEMLIMIT a "soft" limit. That means that the Go runtime does not guarantee that memory usage will stay below the limit. Instead, it uses the limit as a target.

Just to be clear, I'm not missing that point at all, nor should anybody else misunderstand it. It actually does a fairly good job of minimizing the amount of memory it will try to use, but there are some loose ends in Dendrite that will eventually run away, and the system (having 2GB total) will ultimately become unusable. It's also not unimaginable that a glimpse into a profiler (maybe pprof) might reveal a couple of things that could be made to work more efficiently.

I really want to say yes, because if nothing else you can limit the workload, refuse work, and preempt less important work. As near as I can tell it already does this to some extent, but I think the thing that is missing is either a backlog or outright refusal of work that the process doesn't have the capacity for, given a soft memory limit.

Essentially, what I have to do because of this is set a hard limit, a sort of "upper maximum", so that the process is OOM-killed when it finally runs away, before it makes the whole system unusable. I still believe that Dendrite itself should be able to make the best use of what it has available to it and refuse work gracefully. In general this isn't a hobby that a lot of people are willing to pay money for, even a little, so you can reasonably expect most of the people who are going to be using this to want to run on the absolute minimum set of requirements. I gather there are some people working on this project who will tell you that they're running on just 2GB of memory without a soft limit, but I would guess that they're not actually trying to cruise around and be in more than 1-2 channels.

On an unrelated note, and one that is probably out of scope for this issue: I think a big problem with Matrix is that a lot of people don't know what the hell it's doing, and the conclusion they arrive at is that it's more broken than it actually is. More instrumentation could be made available through common frontends like Element: events (failed/in-flight/succeeded), kind, event duration by room, virtually anything measurable. Anything that helps people work out whether the problem is with them or with somebody else would help to eliminate myths about whether or not it works correctly.


This comment was originally posted by @paigeadelethompson at matrix-org/dendrite#3436 (comment).

Also worth considering:

  • is there options/config to at least limit jetstream RAM usage on server nats-io/jetstream#21
  • It's been a while since I looked at the JetStream message queues, but the one SaaS provider that I could find considered Dendrite's number of topics to be too liberal and also wouldn't work at all without a specified capacity. This is probably a good place to start in terms of deciding what's reasonable given a minimal set of requirements
  • Payload size is another dimension that by default can be larger than 8MB (larger than their own recommended maximum?)

Summary:
If I can find some time to set up the profiler and gather some usable metrics I will, but I'm mentioning it just in case anybody feels inclined to try it themselves. Personally, I think for regression testing y'all should be doing your development with a set GOMEMLIMIT, because it says a lot more about how good a job you're doing when your code can stand reliably on its own given a substantial workload. For all I know, people already are.
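For anybody who wants to try, a minimal sketch of capturing a heap profile under a set memory limit with the standard `runtime/pprof` package might look like the following. The `heapProfile` helper, the 512MiB limit, and the throwaway allocation loop are assumptions for illustration only. In Dendrite the profile would be taken from the running server rather than from a toy workload.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/debug"
	"runtime/pprof"
)

// heapProfile forces a GC so the statistics are current, then returns
// the heap profile in the format that `go tool pprof` reads.
func heapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Develop and test under a soft limit, as suggested above.
	debug.SetMemoryLimit(512 << 20)

	// Stand-in workload: hold on to 64 one-megabyte slices.
	work := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		work = append(work, make([]byte, 1<<20))
	}

	prof, err := heapProfile()
	if err != nil {
		fmt.Println("profile failed:", err)
		return
	}
	fmt.Println("profile bytes:", len(prof), "live slices:", len(work))
}
```

Writing the returned bytes to a file and opening it with `go tool pprof` shows which call sites are holding memory.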
