
Implement caching using Souin (take 2) #159

Merged · 4 commits merged into picosh:main on Dec 4, 2024

Conversation

@mac-chaffee (Contributor) commented on Nov 16, 2024

Summary

This PR implements in-memory caching of pgs response bodies inside Caddy using a plugin called Souin.

Fixes #149. Supersedes #154. The rest of this description is copied from #154, with changes bolded.

Background

Currently, serving a single asset from pgs can be slower than it needs to be due to DNS lookups, DB queries, and re-reading _headers and _redirects. Experiments in #151 showed that all of these expensive operations are necessary and cannot simply be cached in-process, since pgs-web and pgs-ssh are separate processes that both need control over the cache. Thus, we have to use a cache like Souin, which runs separately from pgs-web and pgs-ssh. But since it runs inside Caddy, no separate infrastructure is needed.

How does the caching work?

When any user requests an asset like https://example.com/styles.css for the first time, the asset is fetched and returned to the user as normal, but a copy of the response body and headers is now stored by Souin (specifically in a high-performance in-memory store called Otter, which has impressive benchmarks and requires no special parameter tuning, making it a good choice IMO).

The cached response body is assigned a TTL of 5 minutes (which we could increase later once we're more confident in our cache-flushing code). Each cached body is also associated with two keys:

  • There's a main key which looks like GET-https-example.com-/styles.css. This key is used for quickly serving existing cached files when new requests arrive for the same assets.
  • We also set a "surrogate key" which looks like fakeuser-www-proj (this matches the *.pgs.sh subdomain or the value in the TXT entry). This key is used for purging the cache for an entire project when any files are modified.

If any other users request the same file within 5 minutes, the responses will be served directly by Caddy from the in-memory cache. After 5 minutes, the cache for that asset expires.

Caveat: The cache is limited to 10,000 items. Otter uses a special algorithm to evict rarely-accessed items from the cache to make room for new items.
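For concreteness, here's a rough sketch of what the cache configuration could look like in the Caddyfile's global options. The option names follow my reading of the Souin docs and are not copied from this PR's caddy/Caddyfile.pgs changes, so treat it as illustrative only:

```caddyfile
# Illustrative sketch; exact option names/values should be checked against
# the Souin docs and the actual Caddyfile changes in this PR.
{
	cache {
		ttl 300s                          # cached responses expire after 5 minutes
		otter                             # in-memory Otter storage backend
		max_cacheable_body_bytes 1048576  # skip response bodies larger than 1MB
		api {
			souin                         # exposes the /souin-api/souin purge endpoint
		}
	}
}
```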

How does cache purging work?

Souin has an API for examining and purging entries from the cache. Unfortunately, there's no way to expose this API on a different port, so we have to protect it from abuse with Basic Auth.

When any files are written or deleted (like with rsync), we use a Go channel to asynchronously purge the cache for the entire project via its surrogate key. This involves sending an HTTP request to PURGE https://pgs.sh/souin-api/souin/. These purge requests are debounced so that we only purge the cache once per site per 5 seconds.

The API is reachable from the public internet, protected only by basic auth, so in an emergency admins can do things like curl -i -X PURGE -u testuser:password https://pgs.sh/souin-api/souin/flush to purge the entire cache.
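Here's a minimal sketch of the debounce-and-purge flow described above, assuming a worker goroutine reading surrogate keys off a channel. The names purgeCh/purgeWorker and the use of the Surrogate-Key header are my assumptions for illustration, not necessarily how purgeCache is implemented in this PR:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// purgeCh receives a project's surrogate key (e.g. "fakeuser-www-proj")
// whenever files are written or deleted.
var purgeCh = make(chan string, 100)

// purgeWorker sends PURGE requests to the Souin API, at most once per
// surrogate key per 5 seconds.
func purgeWorker(apiURL, user, pass string) {
	lastPurge := map[string]time.Time{}
	for key := range purgeCh {
		if time.Since(lastPurge[key]) < 5*time.Second {
			continue // debounced: this project was purged very recently
		}
		lastPurge[key] = time.Now()

		req, err := http.NewRequest("PURGE", apiURL, nil)
		if err != nil {
			continue
		}
		req.SetBasicAuth(user, pass)
		// Souin purges every cached response tagged with this surrogate key.
		req.Header.Set("Surrogate-Key", key)

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Println("cache purge failed:", err)
			continue
		}
		resp.Body.Close()
	}
}

func main() {
	go purgeWorker("https://pgs.sh/souin-api/souin/", "testuser", "password")
	// e.g. after an rsync upload modifies files in fakeuser's www-proj site:
	purgeCh <- "fakeuser-www-proj"
	time.Sleep(time.Second) // give the worker a moment before exiting
}
```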

FAQ

  • How much RAM will this use?
    • The 50th-percentile asset size seems to be around 500 KB, so a full cache (10,000 items) could use roughly 5 GB. The theoretical maximum is 10 GB due to the 1 MB max_cacheable_body_bytes limit.
  • Are there any infrastructure changes required?
    • Mostly no. You'll need to rebuild/repush the Caddy image and you should generate a new basic_auth username/password.
  • What isn't cached?
    • Errors and the /check endpoint are not cached. Everything else (including the assets for https://pgs.sh) is cached. HTML files ARE now cached since the pipe-based analytics is ready.
  • Are private pages still kept private?
    • Yes, since those requests bypass Caddy and go straight to pgs-ssh.
  • Will this work with multi-region pico?
    • Technically yes, now that the pipe-based analytics is ready. We'd have to modify purgeCache to loop over the hostnames of all regional Caddy instances. Also note that Otter does not support clustering, so each regional instance of Caddy will have its own independent cache.

Still TODO

  • Local end-to-end testing (haven't re-tested this since recreating the PR)
  • Make sure this doesn't break other services like imgs or prose.

TODO items for admins:

  • Generate a new basic auth password that isn't password
  • After merging, rebuild the Caddy image (to include the Souin plugin)

@neurosnap (Member) commented on Nov 28, 2024

We just deployed a change that aggregates site usage analytics directly from Caddy access logs: 4e0839a

This means our pgs web service no longer does anything with analytics, so that is no longer a blocker for this work.

@mac-chaffee (Contributor, Author) commented on Dec 3, 2024

Awesome! Just rebased and removed the code that was disabling caching for html files. Sorry I've been struggling to test this e2e locally, but I think it's ready.

@neurosnap for the basic auth password, want to commit that to this PR? Or update it after merging but before deploying?

You can generate one with docker run -it caddy:latest caddy hash-password, then update caddy/Caddyfile.pgs.
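For reference, the resulting hash would plug into a basic_auth block guarding the Souin API path, along these lines (the matcher name and layout are illustrative, not the actual contents of Caddyfile.pgs):

```caddyfile
# Illustrative only: protect the Souin API with the generated hash.
@souinApi path /souin-api/*
basic_auth @souinApi {
	testuser <output of `caddy hash-password`>
}
```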

@mac-chaffee marked this pull request as ready for review on December 3, 2024
@neurosnap (Member) left a review comment:

Tested locally, everything looks great to me! Thanks so much for really digging in and making a great contribution to pico!

@neurosnap merged commit e01669e into picosh:main on Dec 4, 2024 (2 of 11 checks passed)
@neurosnap (Member) commented:
This has been deployed. Feel free to test / confirm, but everything is working ok from my perspective.

@neurosnap (Member) commented:
I also added ssh pgs.sh cache {project} which will manually purge the cache for the project.

@mac-chaffee (Contributor, Author) commented:
Awesome! Thanks for testing!

@mac-chaffee deleted the caddy-caching branch on December 4, 2024
@mac-chaffee (Contributor, Author) commented:
Hmm, I think my Caddyfile config is not correct. Files served for sites using the pgs.sh domain are cached, but files for sites using custom domains are not.

$ curl -si https://mac-www.pgs.sh | grep -i cache
cache-control: 
cache-status: Souin; fwd=uri-miss; stored; key=GET-https-mac-www.pgs.sh-/

$ curl -si https://www.macchaffee.com | grep -i cache

I'll have to add a cache block to the :443 { site block and make sure that still works with the basic auth setup.

@mac-chaffee (Contributor, Author) commented:
Yeah, we'll have to move stuff around in the Caddyfile. If you specify the cache directive in more than one site block, those become independent caches. So we'd need one site block labelled :443, *.{$APP_DOMAIN}, {$APP_DOMAIN} {, and we'd have to do something about the different TLS config. Rough sketch below.
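Something like this is the shape of the merged block I have in mind; it's only a sketch and the TLS handling for custom domains is the open question:

```caddyfile
# Rough sketch of a single site block so *.pgs.sh subdomains and custom
# domains share one cache; the directives shown are placeholders, not the
# real config.
:443, *.{$APP_DOMAIN}, {$APP_DOMAIN} {
	cache

	# ...existing reverse_proxy, headers, etc....

	# TODO: *.{$APP_DOMAIN} and arbitrary custom domains need different
	# TLS config, which a merged block would have to reconcile.
}
```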

@neurosnap (Member) commented:
Hmm, this might be an issue. I don't see a matcher for the tls option in caddy. What are the downsides to having two separate caches here? It seems like the biggest negative is purging multiple caches? Not ideal, but we are going to need to figure out how to do that soon anyway because we are going to provision more VMs internationally. Maybe this is a good opportunity to figure that out?

Happy to also continue down the path of merging custom/sub domains into a single block.

@neurosnap (Member) commented:
#174

Successfully merging this pull request may close: Improving pgs asset serving performance (brainstorming)