Implement caching using Souin #154
Conversation
Was able to test this locally and found the surrogate key system doesn't behave according to the spec: the keys are never stored, so the purging does nothing. I filed an issue upstream: darkweak/souin#563. Everything else seems correct though.
Some benchmarking on my local laptop (I expect the performance difference to be even greater in production since my laptop has DNS caching and a very fast SSD):

Without caching

With caching
Thanks so much for taking the time to submit this PR! I've been discussing it with my colleague @antoniomika. One thing we are planning right now is multi-region pgs, and I wonder how caching would work when we are serving sites from multiple regions (e.g. cache busting). Further, we would also need to come up with a solution for analytics, since that is a big feature for us. Given these requirements, it makes us wonder if your original in-app cache would be better suited for our needs. It kind of seems like we might want to implement pgs global first, then think about a holistic caching system that can accommodate it. Thoughts? For example, we just released https://pipe.pico.sh and are wondering if incorporating an event system into pgs caching makes sense.
Multi-region pgs sounds like it would require the cache storage and the cache purging to be multi-region too, right? Do you all have a stance on the use of CDNs? IMO, "just swipe the credit card and use a CDN" would probably be easiest, but so far I've enjoyed the challenge of trying to find ways to avoid CDNs and additional infrastructure.
With any of those options, we can mostly reuse the code in this PR, just removing the Caddy changes and changing the format of the API call.

If you want to do multi-region caching without a CDN, one option is to simply replace Otter with one of the other Souin storage backends that support clustering, such as Redis. We'd still have to pull analytics from Caddy/Prometheus.

Another, more ambitious idea could be using/building a Go HTTP caching middleware like https://github.com/victorspringer/http-cache, combined with something like https://github.com/riverqueue/river to coordinate cache purging through Postgres. EDIT: This option has the benefit of allowing us to keep analytics mostly unchanged.

I don't have a strong opinion; I'm happy with any option that makes my pgs site blazing fast, whether it includes my Go code or not :)
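For a rough idea of the in-app option, here's a minimal sketch following http-cache's documented usage; the capacity, TTL, and handler are placeholders, and the river-based purge coordination isn't shown:

```go
package main

import (
	"log"
	"net/http"
	"time"

	cache "github.com/victorspringer/http-cache"
	"github.com/victorspringer/http-cache/adapter/memory"
)

func main() {
	// In-memory LRU adapter; a multi-region setup would need a shared
	// backend plus coordinated purging (e.g. river + Postgres).
	adapter, err := memory.NewAdapter(
		memory.AdapterWithAlgorithm(memory.LRU),
		memory.AdapterWithCapacity(10000), // placeholder capacity
	)
	if err != nil {
		log.Fatal(err)
	}

	client, err := cache.NewClient(
		cache.ClientWithAdapter(adapter),
		cache.ClientWithTTL(5*time.Minute), // placeholder TTL
	)
	if err != nil {
		log.Fatal(err)
	}

	// Stand-in for the pgs asset handler.
	assets := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello from pgs"))
	})

	http.Handle("/", client.Middleware(assets))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```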
Hey, I'm the creator of Souin. First, thank you for considering adding this HTTP cache to this great project. I'm open to discussing your needs and what we could do together.
@mac-chaffee Thanks so much for taking the time to research and write a thoughtful response, much appreciated. @darkweak Greetings! Thanks for finding our little project and providing an awesome library (Souin), also much appreciated.

@antoniomika and I have had a chat, and because of the way we record analytics and consider it a first-class feature at pico, we are leaning towards implementing the caching mechanism in our app code, although we can be convinced otherwise if there's a better way to outsource HTTP caching (which is a solved problem that doesn't need re-inventing).

@mac-chaffee I like your idea of using a Go HTTP caching middleware. Here's a patch that we could use to be able to record analytics for cache hits, which we could try to upstream:

However, this still doesn't solve how we are going to clear the cache when a site gets uploaded, since these are two separate services without a great way to communicate between them.

Another important issue here is that we support proxying requests to external services: https://pico.sh/pgs#proxy-to-another-service. We do not want to cache those responses.

In summary:
I'd be happy to continue facilitating this conversation so lemme know what you need from me!
You can tag your resources in Souin and invalidate by tag when needed; Souin exposes an API for that (you can purge by regexp, tag, or per resource/URI).
Just returning a Cache-Control no-store directive should be enough for responses that are proxied.
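For instance (a minimal sketch, not pgs's actual proxy code), a wrapper that stamps that directive on every proxied response:

```go
package pgscache

import "net/http"

// withNoStore marks proxied responses as uncacheable so the HTTP cache
// (Souin, or anything honoring Cache-Control) never stores them.
func withNoStore(proxy http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "no-store")
		proxy.ServeHTTP(w, r)
	})
}
```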
Let's move the conversation to #149. I'll go ahead and close this but will leave the branch intact.
@darkweak Those are really nice features! Do you have a recommendation on how we could still record cache hits in our site-usage analytics systems?
Fixes #149
This PR implements in-memory caching of pgs response bodies inside Caddy using a plugin called Souin.
Background
Currently, serving a single asset from pgs is slower than it could be due to DNS lookups, DB queries, and re-reading _headers and _redirects. Experiments in #151 showed that all of these expensive operations are necessary and cannot simply be cached in-process, since pgs-web and pgs-ssh are separate processes that both need control over the cache. Thus, we have to use a cache like Souin, which runs separately from pgs-web and pgs-ssh. But since it runs inside Caddy, no separate infrastructure is needed.
How does the caching work?
When any user requests an asset like https://example.com/styles.css for the first time, the asset is fetched and returned to the user like normal, but now a copy of the response body and headers is stored by Souin (specifically, using a high-performance in-memory store called Otter, which has impressive benchmarks and requires no special parameter tuning, making it a good choice IMO). The cached response body is assigned a TTL of 5 minutes (which we could increase later if we get more confident with our cache flushing code). It's also associated with two keys:

1. `GET-https-example.com-/styles.css`. This key is used for quickly serving existing cached files when new requests arrive for the same assets.
2. `fakeuser-www-proj` (this matches the `*.pgs.sh` subdomain or the value in the TXT entry). This key is used for purging the cache for an entire project when any files are modified.

If any other users request the same file within 5 minutes, the responses will be served directly by Caddy from the in-memory cache. After 5 minutes, the cache for that asset expires.
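To illustrate the second key: in Souin's surrogate-key system, the origin tags each response with a `Surrogate-Key` header and the cache indexes entries under that tag. A hypothetical Go sketch (the `projectKey` helper and its naming are assumptions for illustration, not the PR's actual code):

```go
package pgscache

import (
	"net/http"
	"strings"
)

// projectKey derives a per-project surrogate key like "fakeuser-www-proj"
// from the *.pgs.sh subdomain. Purely illustrative.
func projectKey(host string) string {
	sub, _, _ := strings.Cut(host, ".")
	return sub
}

// withSurrogateKey tags every response so the cache can later purge all
// of a project's assets with a single surrogate-key purge.
func withSurrogateKey(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Surrogate-Key", projectKey(r.Host))
		next.ServeHTTP(w, r)
	})
}
```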
Caveat: The cache is limited to 10,000 items. Otter uses a special algorithm to evict rarely-accessed items from the cache to make room for new items.
How does cache purging work?
Souin has an API for examining and purging entries from the cache. Unfortunately, there's no way to expose this API on a different port, so we have to protect it from abuse with Basic Auth.
When any files are written or deleted (like with rsync), we use a Go channel to asynchronously purge the cache for the entire project using surrogate keys. This involves sending a `PURGE` request to `https://pgs.sh/souin-api/souin/`. These purge requests are debounced so that we only purge the cache once per site per 5 seconds.

This API is reachable from the public internet, just protected with basic auth. So in case of emergencies, admins can do things like `curl -i -X PURGE -u testuser:password https://pgs.sh/souin-api/souin/flush` to purge the entire cache.
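For illustration, here's a minimal sketch of what the debounced purge could look like (hedged: the PR coordinates this with a Go channel, while this version uses `time.AfterFunc` for brevity, and it assumes Souin accepts a `Surrogate-Key` header on `PURGE` requests to its API):

```go
package pgscache

import (
	"log"
	"net/http"
	"sync"
	"time"
)

// purger coalesces purge requests so each project's surrogate key is
// purged at most once per 5-second window.
type purger struct {
	mu      sync.Mutex
	pending map[string]bool
	client  *http.Client
	user    string // basic-auth credentials for the Souin API
	pass    string
}

func newPurger(user, pass string) *purger {
	return &purger{
		pending: make(map[string]bool),
		client:  &http.Client{Timeout: 10 * time.Second},
		user:    user,
		pass:    pass,
	}
}

// schedule queues a purge for a project's surrogate key; repeated calls
// within the window collapse into a single PURGE request.
func (p *purger) schedule(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.pending[key] {
		return // a purge is already scheduled for this project
	}
	p.pending[key] = true
	time.AfterFunc(5*time.Second, func() {
		p.mu.Lock()
		delete(p.pending, key)
		p.mu.Unlock()
		if err := p.purge(key); err != nil {
			log.Printf("purge %s failed: %v", key, err)
		}
	})
}

// purge sends an authenticated PURGE to Souin's API, invalidating every
// cached response tagged with the given surrogate key.
func (p *purger) purge(key string) error {
	req, err := http.NewRequest("PURGE", "https://pgs.sh/souin-api/souin/", nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth(p.user, p.pass)
	req.Header.Set("Surrogate-Key", key)
	resp, err := p.client.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}
```

The pending map guarantees at most one in-flight purge per project per window, which matches the once-per-site-per-5-seconds behavior described above.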
FAQ

- Response bodies larger than Souin's `max_cacheable_body_bytes` setting are not cached.
- The `/check` endpoint is not cached. Everything else (including the assets for https://pgs.sh) is cached.
- Cache purges are triggered from `pgs-ssh`.

Still TODO

- Caching for `imgs` or `prose`.

Open to feedback and questions!