Implement caching using Souin (take 2) #159
Conversation
We just deployed aggregating site usage analytics directly from Caddy access logs: 4e0839a. This means our pgs web service no longer does anything with analytics, so that is no longer a blocker for this work.
Force-pushed from 7bfeaf8 to 7755d63.
Awesome! Just rebased and removed the code that was disabling caching for HTML files. Sorry, I've been struggling to test this e2e locally, but I think it's ready. @neurosnap, for the basic auth password, want to commit that to this PR? Or update it after merging but before deploying? You can generate one with
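The exact command was elided in the comment above; one common option for generating a random password (an assumption, not necessarily what was originally suggested) is:

```shell
# Generate a random 24-byte password, base64-encoded.
# (Assumption: the elided command in the comment may have differed.)
openssl rand -base64 24
```

The resulting secret could then be hashed for Caddy's basic auth, e.g. with `caddy hash-password`.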
Tested locally, everything looks great to me! Thanks so much for really digging in and making a great contribution to pico!
This has been deployed. Feel free to test / confirm, but everything is working OK from my perspective.
I also added
Awesome! Thanks for testing!
Hmm I think my Caddyfile config is not correct. Files served for sites using the pgs.sh domain are cached, but not sites using custom domains.
I'll have to add a
Yeah we'll have to move stuff around in the Caddyfile. If you specify the
Hmm, this might be an issue. I don't see a matcher for the
Happy to also continue down the path of merging custom/sub domains into a single block.
Summary
This PR implements in-memory caching of pgs response bodies inside Caddy using a plugin called Souin.
Fixes #149. Supersedes #154. The rest of this description is copied from #154, with changes bolded.
Background
Currently, serving a single asset from pgs can be a little slower than it could be due to DNS lookups, DB queries, and re-reading `_headers` and `_redirects`. Experiments in #151 showed that all these expensive operations are necessary and cannot simply be cached in-process, since pgs-web and pgs-ssh are separate processes that both need control over the cache. Thus, we have to use a cache like Souin, which runs separately from pgs-web and pgs-ssh. But since it runs inside Caddy, no separate infrastructure is needed.
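As a rough illustration only (directive names below follow my recollection of the Souin Caddy module's Caddyfile syntax and Caddy's `basic_auth` directive; the deployed config surely differs), the setup could look something like:

```caddyfile
{
	cache {
		ttl 300s   # 5-minute TTL, matching the description below
		otter      # in-memory Otter storage (assumed storage key name)
		api {
			souin  # exposes the /souin-api/souin/* purge API
		}
	}
}

pgs.sh {
	@souinAPI path /souin-api/*
	basic_auth @souinAPI {
		# bcrypt hash, e.g. generated with `caddy hash-password`
		testuser {env.SOUIN_API_PASSWORD_HASH}
	}
	cache
	reverse_proxy pgs-web:3000
}
```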
How does the caching work?
When any user requests an asset like `https://example.com/styles.css` for the first time, the asset is fetched and returned to the user like normal, but now a copy of the response body and headers is stored by Souin (specifically using a high-performance in-memory store called Otter, which has impressive benchmarks and requires no special parameter tuning, making it a good choice IMO). The cached response body is assigned a TTL of 5 minutes (which we could increase later if we get more confident with our cache flushing code). Each cached body is also associated with two keys:

- `GET-https-example.com-/styles.css`: used for quickly serving existing cached files when new requests arrive for the same assets.
- `fakeuser-www-proj` (this matches the `*.pgs.sh` subdomain or the value in the TXT entry): used for purging the cache for an entire project when any files are modified.

If any other users request the same file within 5 minutes, the responses will be served directly by Caddy from the in-memory cache. After 5 minutes, the cache for that asset expires.
Caveat: The cache is limited to 10,000 items. Otter uses a special algorithm to evict rarely-accessed items from the cache to make room for new items.
How does cache purging work?
Souin has an API for examining and purging entries from the cache. Unfortunately, there's no way to expose this API on a different port, so we have to protect it from abuse with Basic Auth.
When any files are written or deleted (like with rsync), we use a Go channel to asynchronously purge the cache for the entire project using surrogate keys. This involves sending an HTTP request to `PURGE https://pgs.sh/souin-api/souin/`. These purge requests are debounced so that we only purge the cache once per site per 5 seconds.

This API is reachable from the public internet, just protected with basic auth. So in case of emergencies, admins can do things like `curl -i -X PURGE -u testuser:password https://pgs.sh/souin-api/souin/flush` to purge the entire cache.

FAQ
`max_cacheable_body_bytes`

The `/check` endpoint is the exception; everything else (including the assets for https://pgs.sh) is cached. HTML files ARE now cached since the pipe-based analytics is ready.

`pgs-ssh`

`purgeCache` to loop over all the hostnames of all regional Caddy instances. Also note that Otter does not support clustering, so each regional instance of Caddy will have its own independent cache.

Still TODO

- `imgs` or `prose`.

TODO items for admins:

- `password`