Store cache with Redis / Valkey #66
What problem would the cache solve that the current in-memory cache does not address? Is the use-case sharing the cache between several Leng instances? |
Yes, that would be the use case. I imagine most records have decent enough TTLs for it to be beneficial in reducing lookups to the upstream DNS server. |
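(For illustration, a minimal sketch of what such a shared cache could look like, assuming the go-redis client and the miekg/dns library; the key scheme and all names here are hypothetical, not leng's actual API:)

```go
package cache

import (
	"context"
	"time"

	"github.com/miekg/dns"
	"github.com/redis/go-redis/v9"
)

// SharedCache is a hypothetical Redis/Valkey-backed answer cache that
// several leng instances could point at.
type SharedCache struct {
	rdb *redis.Client
}

func New(addr string) *SharedCache {
	return &SharedCache{rdb: redis.NewClient(&redis.Options{Addr: addr})}
}

// key derives a cache key from the question, e.g. "dns:example.com.:A".
func key(q dns.Question) string {
	return "dns:" + q.Name + ":" + dns.TypeToString[q.Qtype]
}

// Get returns a cached response, or nil on a cache miss.
func (c *SharedCache) Get(ctx context.Context, q dns.Question) (*dns.Msg, error) {
	raw, err := c.rdb.Get(ctx, key(q)).Bytes()
	if err == redis.Nil {
		return nil, nil // miss: caller falls through to the upstream
	}
	if err != nil {
		return nil, err
	}
	msg := new(dns.Msg)
	if err := msg.Unpack(raw); err != nil {
		return nil, err
	}
	return msg, nil
}

// Set stores a packed response, expiring it with the record's TTL so
// Redis evicts stale answers on its own.
func (c *SharedCache) Set(ctx context.Context, q dns.Question, msg *dns.Msg, ttl time.Duration) error {
	raw, err := msg.Pack()
	if err != nil {
		return err
	}
	return c.rdb.Set(ctx, key(q), raw, ttl).Err()
}
```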
I am not super convinced of how beneficial this would be, considering the complexity it adds. Assuming the shared cache is on a different node:
1. how much would the hit-rate actually improve over the built-in in-memory cache?
2. would a round-trip to the shared cache actually be faster than just asking the upstream DNS server?
I don't think we can answer (1) without knowing the built-in cache hit-rate. I think I will add a metric for this in the next release, it is something that I am now curious about! As for (2), I guess that very much depends on your setup: even if your Redis cache is in the same datacenter as Leng, you are still paying a network round-trip on every lookup, which may not be much cheaper than querying the upstream directly. So I think I can only see this being very useful if your machines are wired up together in your house, but the internet is far away? At any rate, I will build a new metric for the cache hit-rate and we can see how that does. Let me know what you think! |
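(A back-of-envelope way to frame (2), my framing with illustrative numbers, not from the thread: every query that misses the local cache pays the Redis round-trip $t_{redis}$, while only the extra hits gained by sharing, $h_{extra}$, save an upstream round-trip $t_{up}$. The shared cache is a net win only when

$$h_{extra} \cdot t_{up} > t_{redis}$$

so with, say, $t_{redis} \approx 1\,\mathrm{ms}$ and $t_{up} \approx 10\,\mathrm{ms}$, sharing has to add roughly 10 percentage points of hit-rate just to break even.)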
That would be a good metric to add! Another note that was pointed out would be increased privacy from the upstream DNS server: it wouldn't be able to see as much or track your habits. That being said, I am also not sure whether that advantage is worth it. It wouldn't be complex for me to configure, at least, and I'd be happy to test, but code complexity for saving a couple of ms may not be worth it. Edit: also jealous of your ping time. I get 20+ ms on my coax and 50+ ms on my backup 5G WAN! |
Hi @seang96, did you manage to use the metric to estimate your cache hit-rate? On my end, since your issue I have spotted 0xERR0R/blocky#945, which reflects the added complexity Redis can bring to leng. |
I am using v1.6.1 and I still don't see the metric. It looks like that version was released before the metric was added, judging by the dates of the commits and releases? |
ah sorry about that - it's been in master for a while, just not as part of a release. You can run it with the tag sha-20f09ef (assuming you are using containers here) |
I updated and see it now. GitHub's mobile app sucks and doesn't show packages, or at least I can't find them. I'll let it collect for a little bit and let you know how it goes. Thanks! |
Statistics-wise, so far I get around 12%-14% combined as cache hits. One instance is generally at 14-20% and the others at 8-12%. I did have another idea of trying DoH, which I can cache via the nginx proxy itself on the server side and with a Cache-Control header on the client side. Unfortunately I get timeouts with DoH: nginx occasionally reports a 499 response code with a 5-second timeout on the backend. It looks like you can adjust the timeout on DoH, but at 10s it still times out at 5. It looks like the timeout for the DNS server itself is 5 seconds and non-configurable, which caps the effective DoH timeout at 5 seconds? Can the DNS server timeout be made configurable? I think it's hard-coded here: main.go#L74-L75 |
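(For reference, a hedged sketch of what making that server-side timeout configurable could look like, assuming leng builds its server with miekg/dns; the function and its timeoutSeconds parameter are hypothetical, not leng's actual code:)

```go
package main

import (
	"time"

	"github.com/miekg/dns"
)

// newServer builds the DNS server with a configurable read/write timeout
// instead of a hardcoded 5s, falling back to 5s when unset.
func newServer(addr string, timeoutSeconds int) *dns.Server {
	if timeoutSeconds <= 0 {
		timeoutSeconds = 5 // previous hardcoded default
	}
	t := time.Duration(timeoutSeconds) * time.Second
	return &dns.Server{
		Addr:         addr,
		Net:          "udp",
		ReadTimeout:  t,
		WriteTimeout: t,
	}
}
```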
Thanks for reporting back!
The timeout for making upstream DNS queries (including DoH) is defined in the `timeout` config setting. In the meantime, I also just merged #71, which I think will be useful to measure how long the upstreams are taking. Is your DNS upstream really taking 5s? Or is leng being that slow? If that is the case I would much rather look into fixing that than a shared Redis cache. Hopefully you will not need a shared cache as much if leng is faster and you get more cache hits. Your hit-rate is surprisingly low as well; maybe look into growing the cache size a little? That might improve it |
I don't think the cache size is the issue; it's that the DNS request load is split among 3 instances. I'd say the timeouts are more important as well. Maybe that should be split out as a new issue to look at? Anyway, it's the /dns-query endpoint (so not metrics) that's timing out, and it always times out at exactly 5 seconds, even with the DoH timeout at 10000 and the upstream DoH timeout_s set to 10. I did some additional routing to send the upstream traffic via my coax ISP instead of my 5G ISP, which helped decrease timeouts, but it still occurs through DoH. Is the plain DNS server still being used for the lookups made through the DoH service? I see config.DnsOverHttpServer.Bind being passed into the DoH service, so I assumed it might be the culprit. |
Sorry if I was not straightforward in my message, but did you try setting the `timeout` option in the config?
All lookups are resolved the same way initially: first it tries DoH, then it tries the protocol of the request (defaulting back to TCP if you were initially using DoH)
It's true that each server should only receive one third of the requests in this case, but with a big enough cache and a big enough TTL (about 3x bigger on average for your example) the load-balanced servers should converge to the same hit-rate that a single non-load-balanced one would have. This becomes impractical if the TTL is too big (as some DNS records would become stale), but I was curious nonetheless |
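(A quick worked version of that 3x figure, with illustrative numbers not from the thread: if clients query some name every $\Delta t$ seconds in total, and requests are spread evenly over $N$ instances, each instance sees that name only every $N \cdot \Delta t$ on average. The entry is still warm on arrival only if

$$\mathrm{TTL} \gtrsim N \cdot \Delta t$$

i.e. roughly $3\times$ the single-instance requirement for $N = 3$.)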
I am using ghcr.io/cottand/leng:sha-db020fc and don't see the metric for timeouts yet. Do I need highCardinalityEnabled set to true for this metric? Also, side note: my ISP sucks and has high enough latency that online gaming disconnects very frequently, so I wouldn't say it's too bizarre to see sudden spikes. I had them come over 7 times and got nothing out of it; I think it's an upstream issue they won't fix. Fiber is being actively installed in my town, so hopefully I won't have this issue in the next year or so. Here is my config, I didn't see any other timeouts in the docs so let me know if I missed any.
|
Ah, I hope fiber solves your troubles! Try this in the config:

```toml
timeout = 10

[Metrics]
histogramsEnabled = true
```

Again, sorry for the lack of docs for these, I will make sure to work on that before the next release |
No worries! I hope so too haha. I saw the timeout option in the code afterward and added it; I still got the 5-second timeout with DoH. I have enabled the metric setting now, so I'll see how that goes. |
I tried out blocky and have yet to experience any timeouts with my setup; the only change is literally using the blocky image / config. I am using multiple upstreams with blocky, though, but I imagine Cloudflare by itself as the upstream DNS wouldn't cause timeouts either. |
😢 sad that it was just leng timing out. If blocky is working better for you, you should stick with that, but I would still love to fix this in leng. Could I ask you to try blocky with a single upstream and see? I also saw you forked leng to change the hardcoded 5s upstream lookup to 10s. Did that not help? |
Saw this mentioned in another DNS service and thought it would be a nice addition, though it may also go against the goal of keeping leng simple; keeping the existing in-memory cache support would hopefully eliminate that as a concern.