Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub Actions results not appearing on the dashboard #4020

Open
gsnedders opened this issue Sep 25, 2024 · 11 comments
Open

GitHub Actions results not appearing on the dashboard #4020

gsnedders opened this issue Sep 25, 2024 · 11 comments

Comments

@gsnedders
Copy link
Member

From web-platform-tests/wpt#48223 (comment):

So:

This seems surprising that it succeeded to upload, and yet the upload doesn't appear as pending, invalid, or show up as complete.

We seem to have logging going into Google Cloud Logging, but I don't have any access to that.

cc @past @DanielRyanSmith @jcscottiii @KyleJu

@KyleJu
Copy link
Collaborator

KyleJu commented Sep 26, 2024

Checking the log, I can confirm that /api/checks/github-actions/ was called with run_id=11039052681 and returned 200. Also I see the logging

Adding %!s(*string=0xc0002990a0) for web-platform-tests/wpt run 11039052681 to upload...

from this line, but it didn't log properly:

log.Infof("Adding %s for %s/%s run %v to upload...", artifact.Name, owner, repo, runID)

It should be *artifact.Name instead.

If I have to guess, it could be an issue with the uploader. Is there a link that I can inspect the wpt results from the artifacts?

@gsnedders
Copy link
Member Author

@KyleJu https://github.com/web-platform-tests/wpt/actions/runs/11039052681 has downloads links for all the artifacts.

@gsnedders
Copy link
Member Author

I guess I'm more interested in any logging from the request we issue for https://wpt.fyi/api/results/create? Because that's presumably working too and giving 200 back.

if err != nil {
logger.Warningf("Failed to write data in api/results/create handler: %s", err.Error())
}
is one case where we could silently fail?

But it seems like it should presumably be managing to schedule the task, it's just for whatever reason it never completes.

@gsnedders
Copy link
Member Author

I guess also logging from /api/results/process?

Because something somewhere is clearly failing.

@KyleJu
Copy link
Collaborator

KyleJu commented Sep 26, 2024

Confirmed that /api/results/create has no error log around the time this run was created.

Looking at /api/checks/github-actions/ again, I noticed successful logging for 11039052681:
Screenshot 2024-09-26 11 51 44 AM

but 10 minutes before that, it also logged failures:

Screenshot 2024-09-26 11 42 42 AM

But it seems like it should presumably be managing to schedule the task, it's just for whatever reason it never completes.

I would agree with this. A job could be scheduled and returned 200, but the task might be failing in the Task Queue. There seems to be lots failures around the same timestamps:

Screenshot 2024-09-26 11 56 07 AM

@gsnedders
Copy link
Member Author

but 10 minutes before that, it also logged failures:

Can you find what run ID this was? I wonder if that's the point at which I tried to submit all the Safari runs, but I can't find any that gave 400 all the way back to the initial request, which I would've expected?

@gsnedders
Copy link
Member Author

Is there any way to find any logging from /api/results/process (i.e., from the results-processor) when the task runs? Because it's kinda surprising they're failing in the results processor.

@KyleJu
Copy link
Collaborator

KyleJu commented Sep 26, 2024

but 10 minutes before that, it also logged failures:

Can you find what run ID this was? I wonder if that's the point at which I tried to submit all the Safari runs, but I can't find any that gave 400 all the way back to the initial request, which I would've expected?

I cannot find any logging for run_id and the payload is empty as well, but the user agent of that request is curl. Maybe that rings a bell?
Screenshot 2024-09-26 4 12 32 PM

After digging through logs in results-precossor, I found these around the same timestamp (note that the severity level isn't labelled properly in processor):
Screenshot 2024-09-26 4 18 19 PM

e.g.

ERROR:processor:Failed to fetch (403): https://api.github.com/repos/web-platform-tests/wpt/actions/artifacts/1978878338/zip

in the url (https://github.com/web-platform-tests/wpt/actions/runs/11039052681) you showed me above, it should be

https://github.com/web-platform-tests/wpt/actions/runs/11039052681/artifacts/1978878338

Would that be the issue?

@gsnedders
Copy link
Member Author

It'll be the difference between what it gives as the URL via the API versus via the web interface; I guess we're getting 403 (Forbidden) because we're not passing a GitHub API token.

We'd have to notice that we're fetching api.github.com from the results-processor and then fetch the GitHub token from the datastore.

That said, the even more significant failing here is that they aren't showing up as invalid runs.

@gsnedders
Copy link
Member Author

From https://cloud.google.com/appengine/docs/standard/services/taskqueue/push/retrying-tasks?tab=python:

If a handler fails to execute a task (and therefore returns any HTTP status code outside of the range 200–299), App Engine retries the task until it succeeds.

So are we just retrying these indefinitely?

Because I don't think we're catching the error anywhere, so it should just bubble up to the Flask endpoint and then Flask will turn it into a 500 Internal Server Error.

@gsnedders
Copy link
Member Author

Reading the code, I think we're not actually retrying here:

p.download(results, screenshots, archives)
if len(p.results) == 0:
_log.error("No results successfully downloaded")
p.update_status(run_id, 'EMPTY', None, callback_url)
return ''

We should just be setting the status as EMPTY, and manually hitting https://wpt.fyi/api/status/empty confirms this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants