Integration tests are too slow!!! help please! #3382
Comments
I will look at SAKURACLOUD. I know it also uses zonal diffing, so I expect TransIP to perform roughly similarly. The provider doesn't do much anymore, so it should be easy enough to squeeze some performance out of it. There is always the chance that the TransIP API is simply slow, but that should be easy enough for me to investigate. |
@tlimoncelli I am working on a first change that could yield a ~30% improvement for CNR. I added another idea to our backlog, but that will take a while. |
I will look into disabling the pager tests for HEDNS, which should shave some time off. I'm not sure there is much else I'll be able to do, though, as it's not an official API, and when I tried to contact them in the past about adding one, I received no response. |
Sounds great! Thank you! |
CNR -> Find the first step covered by this PR: #3391 |
I tried to draw some inspiration from SAKURACLOUD, but we do almost exactly the same thing, so I am unable to make it faster based on that. I do not do much in my code, so I checked how much time I spend waiting for TransIP's API. My integration test suite ran in: 519 seconds. At this point I do not think I can make fewer API calls. I already "get the whole zone at once" and "replace the whole zone at once". I can send an email to them, but maybe @cafferata has better internal contacts? Otherwise I will need to scour LinkedIn to find someone or mail a general support address. |
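For anyone who wants to repeat the "how much time do I spend waiting for the API" measurement in their own provider, here is a minimal sketch. It assumes the provider talks to its API through a standard `*http.Client`. The names `timingTransport`, `roundTripFunc`, `fakeAPI`, and `measureFake` are hypothetical and are not part of DNSControl or any provider SDK. The fake transport stands in for a real API so the sketch runs without network access.

```go
package main

import (
	"io"
	"net/http"
	"strings"
	"sync"
	"time"
)

// timingTransport (hypothetical name) wraps another http.RoundTripper and
// accumulates the wall-clock time spent inside RoundTrip. That covers the
// wait until response headers arrive, not the time spent reading the body.
type timingTransport struct {
	next http.RoundTripper

	mu       sync.Mutex
	requests int
	waited   time.Duration
}

func (t *timingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := t.next.RoundTrip(req)
	elapsed := time.Since(start)

	t.mu.Lock()
	t.requests++
	t.waited += elapsed
	t.mu.Unlock()
	return resp, err
}

// totals returns the number of requests seen and the total time waited.
func (t *timingTransport) totals() (int, time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.requests, t.waited
}

// roundTripFunc adapts a plain function to the http.RoundTripper interface.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeAPI is a stand-in for a real provider API (an assumption made so the
// sketch is self-contained): it sleeps for delay and then answers 200 OK.
func fakeAPI(delay time.Duration) http.RoundTripper {
	return roundTripFunc(func(r *http.Request) (*http.Response, error) {
		time.Sleep(delay)
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader("ok")),
			Request:    r,
		}, nil
	})
}

// measureFake sends n GET requests through a timingTransport backed by
// fakeAPI and reports how many requests were made and how long they waited.
func measureFake(n int, delay time.Duration) (int, time.Duration, error) {
	tt := &timingTransport{next: fakeAPI(delay)}
	client := &http.Client{Transport: tt}
	for i := 0; i < n; i++ {
		// The host is a placeholder; the fake transport never resolves it.
		resp, err := client.Get("http://api.invalid/zone")
		if err != nil {
			return 0, 0, err
		}
		resp.Body.Close()
	}
	reqs, waited := tt.totals()
	return reqs, waited, nil
}
```

In a real provider you would presumably set `next` to `http.DefaultTransport` (or whatever transport the SDK already uses) and compare `waited` against the total suite run time. If the two numbers are close, the remaining time is on the API side rather than in the provider code.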
@tlimoncelli If you are looking for more details around telemetry, you could look into using Honeycomb to track and trace GHA buildevents: https://github.com/honeycombio/gha-buildevents cc: @lizthegrey |
@tlimoncelli we checked further possibilities for CentralNic Reseller (CNR) and rolled out some improvements to HTTP communication in our software dependency. We tested this update in our dnscontrol fork with no success, while tests outside of DNSControl looked dramatically better: 100 commands requested sequentially were 4 times faster. I haven't checked how the integration tests work. Maybe you're recreating the provider instance per test case? If so, that might be a reason. Maybe you can point me to the relevant code or share your insights. |
Interesting data point! Here's a bunch of random thoughts and observations:
My first hunch was that the cnr provider is re-authenticating for each call. I don't think that's the problem, since with
My next hunch was that cnr is being reinitialized for each test. (What you suggested above.) However, I reviewed the code, and integration_test.go only calls getProvider() once. (I verified this by adding a Printf to getProvider(). getProvider() initializes the provider's client.)
My next hunch is that GitHub Actions is simply rate-limiting us, or is running on very slow VMs. However, ROUTE53 runs in about 2 minutes with the same number of API calls. Could something else be doing the rate limiting?
When I run CNR's integration tests from my desktop (with all VPNs and other things disabled), they run in 4m43s, which is much faster than GHA's 6m30s. However, that isn't 4 times faster, as your test observed.
Are the 100 commands you sent alternating DNSZoneRRList and ModifyDNSZone? I notice that DNSZoneRRList executes very fast, while there is a full 1-second pause after each ModifyDNSZone. (this is from manually observing
I really appreciate all the effort you are putting into this. My hope is that we fix a performance issue that improves performance for all your customers (I'm ever the optimist!). That said, I'm happy that we're down to 6.5 minutes. Maybe we've done enough for now.
Tom |
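The per-command observation above (fast DNSZoneRRList, roughly one second per ModifyDNSZone) was made by hand. A minimal sketch of how it could be measured instead is below. `opStats` and its methods are hypothetical helpers, not part of DNSControl or the CNR SDK. The command names are only used as map keys, and the call being timed is passed in as a closure, so nothing here depends on a real API.

```go
package main

import (
	"sort"
	"sync"
	"time"
)

// opStats (hypothetical helper) collects call counts and total latency
// per operation name, e.g. "DNSZoneRRList" or "ModifyDNSZone".
type opStats struct {
	mu    sync.Mutex
	count map[string]int
	total map[string]time.Duration
}

func newOpStats() *opStats {
	return &opStats{
		count: make(map[string]int),
		total: make(map[string]time.Duration),
	}
}

// timeCall runs fn, records how long it took under op, and returns fn's error.
func (s *opStats) timeCall(op string, fn func() error) error {
	start := time.Now()
	err := fn()
	elapsed := time.Since(start)

	s.mu.Lock()
	s.count[op]++
	s.total[op] += elapsed
	s.mu.Unlock()
	return err
}

// calls returns how many times op was recorded.
func (s *opStats) calls(op string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.count[op]
}

// mean returns the average latency of op, or 0 if op was never recorded.
func (s *opStats) mean(op string) time.Duration {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := s.count[op]
	if n == 0 {
		return 0
	}
	return s.total[op] / time.Duration(n)
}

// ops returns the recorded operation names in sorted order.
func (s *opStats) ops() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	names := make([]string, 0, len(s.count))
	for name := range s.count {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

Wrapping each API call in `timeCall` and printing `mean` for every name in `ops()` at the end of a run would show whether the time really is concentrated in one command type, which is the kind of evidence that is easy to hand to an API vendor.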
I did some more digging and found an action from Catchpoint: https://github.com/catchpoint/workflow-telemetry-action I did some quick testing on an action, and it looks like it might be what you are looking for. Based on limited testing, the results do not appear to be publicly available. The nice part is that it is pretty easy to add. All you need is:
|
Wow! That is super easy! I gave it a try in #3401. I tried it with and without actions/cache and didn't see much of a difference. (You'll see many graphs posted as comments. The last one disables actions/cache and actions/upload-artifact.) TBH, I'm not sure exactly what is being cached. |
Thanks so much, we'll dig deeper whenever we have some time left. |
Hey folks!
I have a big "ask" for y'all, especially the maintainers of Cloudflare, Mythic Beasts, TransIP, Azure, CNR, and HEDNS. I don't need this solved today, but please consider taking time to look into this in the next month or so.
I'm asking everyone to please please please investigate the source of slowness and try to improve the run time.
Slow integration tests == slow development. A lot of my dev time is spent waiting for the last 3-4 providers to complete their tests. If all the tests ran in under 5 minutes my productivity would go way up! I'd have more time for important improvements.
The time it takes to run all the integration tests is slowly creeping up. There are now 5 providers that take more than 10 minutes to complete the tests.
Here's the current ranking (slowest to fastest): (source)
I've already checked off the ones that run in less than 5 minutes. Obviously the 3-4 slowest should get the most attention.
Here's some things that have worked in the past:
Some changes that would help all providers:
Any help you can provide will be greatly appreciated! Thanks in advance!
Best,
Tom
CC: @tresni @tomfitzhenry @blackshadev @vatsalyagoel @matthewmgamble @KaiSchwarz-cnic @rblenkinsopp @riyadhalnur @TomOnTime @tlimoncelli @hnrgrgr @tresni