Automated CKAN Job Error Condition ckeck-stuck-jobs #1471

Closed
btylerburton opened this issue Oct 17, 2024 · 3 comments
@btylerburton
Contributor

Workflow with Issue: 4 - Automated CKAN Jobs
Job Failed: ckan-auto-command
CKAN Command (in question): ckan geodatagov check-stuck-jobs
CKAN Command Schedule: 30 6 * * *
Cloud.gov Environment: prod

Last Commit: 3a7d473
Number of times run: 1
Last run by: btylerburton
Github Action Run: https://github.com/GSA/catalog.data.gov/actions/runs/11379675273

btylerburton added the bug label Oct 17, 2024
btylerburton changed the title from "Automated CKAN Job Error Condition" to "Automated CKAN Job Error Condition ckeck-stuck-jobs" Oct 17, 2024
btylerburton self-assigned this Oct 17, 2024
@btylerburton
Contributor Author

@FuhuXia forgot to check in with you on this before EOD.

I ran the query suggested here:
https://github.com/GSA/data.gov/wiki/Stuck-Harvest-Fix#understanding-what-is-stuck

   title   |  state   | count |        last_fetched        |       last_gathered
-----------+----------+-------+----------------------------+----------------------------
 NOS NCCOS | COMPLETE |   715 | 2024-10-15 11:55:58.666356 | 2024-10-15 11:52:36.84156
 NOS NCCOS | IMPORT   |     1 | 2024-10-15 12:30:34.438465 | 2024-10-15 11:52:31.863972
 NOS NCCOS | ERROR    |     1 | 2024-10-15 12:50:20.532944 | 2024-10-15 11:52:27.639643
(3 rows)

Then I ran the second query to get the harvest_job_id:
https://github.com/GSA/data.gov/wiki/Stuck-Harvest-Fix#get-harvest_job_ids-for-running-jobs

   title   |  state   |            harvest_job_id            | count
-----------+----------+--------------------------------------+-------
 NOS NCCOS | COMPLETE | 1fa5f8e2-519b-41b5-9b43-5fd532de1198 |   715
 NOS NCCOS | ERROR    | 1fa5f8e2-519b-41b5-9b43-5fd532de1198 |     1
 NOS NCCOS | IMPORT   | 1fa5f8e2-519b-41b5-9b43-5fd532de1198 |     1
(3 rows)

Not sure what to do to clear out these jobs after that.
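
For reading the second result set, here is a rough Python sketch that counts how many harvest objects per job are still in a non-terminal state. The rows are copied from the output above. The function name `unfinished_by_job` and the choice to treat COMPLETE and ERROR as terminal states are assumptions for illustration, not something taken from the wiki queries or from CKAN code.

```python
from collections import Counter

# Rows copied from the second query result: (title, state, harvest_job_id, count).
rows = [
    ("NOS NCCOS", "COMPLETE", "1fa5f8e2-519b-41b5-9b43-5fd532de1198", 715),
    ("NOS NCCOS", "ERROR", "1fa5f8e2-519b-41b5-9b43-5fd532de1198", 1),
    ("NOS NCCOS", "IMPORT", "1fa5f8e2-519b-41b5-9b43-5fd532de1198", 1),
]


def unfinished_by_job(rows, done_states=("COMPLETE", "ERROR")):
    """Return {harvest_job_id: number of objects not in a terminal state}.

    Assumption: COMPLETE and ERROR are terminal; anything else is still pending.
    """
    pending = Counter()
    for _title, state, job_id, count in rows:
        if state not in done_states:
            pending[job_id] += count
    return dict(pending)


print(unfinished_by_job(rows))
```

Under that assumption, the single object left in IMPORT is what keeps job 1fa5f8e2-519b-41b5-9b43-5fd532de1198 from finishing.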

@FuhuXia
Member

FuhuXia commented Oct 18, 2024

The job was force-finished by the GitHub Actions workflow a few hours ago because it had passed the 72-hour mark.

I would look at the catalog-gather log to see what happened to this source and this job. Usually I manually stop the job in the UI and reharvest the source to see if we can replicate the problem.

You can see this source was going through a major change: all existing XML files were deleted and about the same number of new XML files were added.
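
The 72-hour cutoff mentioned above can be sketched as a simple elapsed-time check. This is only an illustration: the name `has_passed_threshold` is hypothetical, the real check-stuck-jobs logic may work differently, and the earliest `last_gathered` timestamp from the query output is used here as a stand-in for the job start time, which the thread does not give.

```python
from datetime import datetime, timedelta

# Threshold taken from the comment above; everything else is an assumption.
STUCK_AFTER = timedelta(hours=72)


def has_passed_threshold(job_started, now, limit=STUCK_AFTER):
    """Return True when the job has been running for at least `limit`."""
    return now - job_started >= limit


# Stand-in start time: earliest last_gathered value in the first query result.
started = datetime(2024, 10, 15, 11, 52, 27)

print(has_passed_threshold(started, datetime(2024, 10, 17, 6, 30)))
print(has_passed_threshold(started, datetime(2024, 10, 18, 12, 0)))
```

With these inputs the first call falls inside the 72-hour window and the second falls outside it.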

@FuhuXia
Member

FuhuXia commented Oct 18, 2024

Reharvesting is fine.

@FuhuXia FuhuXia closed this as completed Oct 18, 2024