CI run is slowed down by failing to download fetches #855

Open
Tracked by #311
gregtatum opened this issue Sep 20, 2024 · 1 comment
Labels
bug Something is broken or not correct cost & perf Speeding up and lowering cost for the pipeline taskcluster Issues related to the Taskcluster implementation of the training pipeline

Comments

@gregtatum
Member

I'm working on some performance optimizations to bring down our CI times, and I found that some tasks take 12 minutes to resolve because of failures downloading fetches and artifacts. This compounds when tasks depend on each other, so a few tasks failing this way can add 20-30 minutes to a CI run. Once the fixes I'm currently working on are merged, this will be the slowest part of the CI pipeline.

  • translate-mono-src-ru-en-1/2 12m 34s (failing downloads)
  • translate-mono-src-ru-en-2/2 2m 22s

And here is a profile of the task:

Here you can see that it spent 12 minutes attempting to download fetches and artifacts.

Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SVZMV_7KS3mGqDMrMgj9_A/artifacts/public/build/marian.tar.zst
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 91.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 90.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 89.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 90.00s (attempt 2/5)
attempt 3/5
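
As a rough check on where the time goes, the sleeps in the log excerpt above can be totaled. This is only a minimal sketch: `total_sleep_seconds` is a hypothetical helper written for illustration, not part of the pipeline or of Taskcluster, and it only sees the lines quoted in this issue.

```python
import re

# Log excerpt copied from the issue above (only the lines quoted there).
LOG = """\
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 91.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 90.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 89.00s (attempt 2/5)
Download failed: <urlopen error [Errno -3] Temporary failure in name resolution>
sleeping for 90.00s (attempt 2/5)
"""

# Matches the "sleeping for <seconds>s" lines in the excerpt.
SLEEP_RE = re.compile(r"sleeping for (\d+(?:\.\d+)?)s")


def total_sleep_seconds(log: str) -> float:
    """Sum every retry sleep duration found in the log text (hypothetical helper)."""
    return sum(float(match.group(1)) for match in SLEEP_RE.finditer(log))


total = total_sleep_seconds(LOG)
print(f"{total:.0f}s asleep ({total / 60:.1f} min)")
```

The four sleeps quoted here add up to 360 seconds, roughly half of the 12m 34s task duration. The rest of the task log is not included in this issue, so the remaining time is not accounted for by this sketch.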
@gregtatum gregtatum added bug Something is broken or not correct taskcluster Issues related to the Taskcluster implementation of the training pipeline cost & perf Speeding up and lowering cost for the pipeline labels Sep 20, 2024
@bhearsum
Collaborator

#549 is related, possibly the same?
