[rgw]fix TFA issue by adding sleep of 20 seconds after rgw restart to avoid sync status failures #4187
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: hmaheswa
… avoid sync status failures Signed-off-by: Hemanth Sai Maheswarla <[email protected]>
de8cf1f to c1723ba
Waiting on the end-to-end run log before merging.
log.info("sleeping for 20 seconds")
time.sleep(20)
A blind sleep may work today, but it could fail later or in the next build.
IMHO, we could instead check whether the rgw service is up and running using
ceph orch ls --service_type rgw --service-name <rgw_service_name> --format json |
jq '.[0].status | select(.size !=0) | .size == .running'
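For illustration, the same check could be done in Python by polling instead of sleeping for a fixed time. This is only a sketch under assumptions: `fetch_orch_ls` is a hypothetical callable that returns the JSON text of the `ceph orch ls ... --format json` command above, and the `status.size` / `status.running` fields are taken from the jq filter in this comment, not verified against a live cluster. The timeout and interval values are placeholders.

```python
import json
import time


def rgw_service_ready(orch_ls_json):
    """Mirror the jq filter: size must be non-zero and equal to running."""
    services = json.loads(orch_ls_json)
    if not services:
        return False
    status = services[0].get("status", {})
    size = status.get("size", 0)
    return size != 0 and size == status.get("running")


def wait_for_rgw(fetch_orch_ls, timeout=120, interval=5, sleep=time.sleep):
    """Poll until the service reports ready, or give up after `timeout` seconds.

    fetch_orch_ls is a hypothetical callable (an assumption of this sketch)
    that returns the JSON text printed by `ceph orch ls ... --format json`.
    """
    deadline = time.monotonic() + timeout
    while True:
        if rgw_service_ready(fetch_orch_ls()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The caller would then fail the test explicitly when `wait_for_rgw` returns False, rather than continuing into the sync status check with a service that is not yet up.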
I might be wrong about the need to check rgw, which I inferred from the heading; however, the code is restarting osd_process_name.
It is also possible to check the running status of that particular daemon.
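A per-daemon check might look roughly like the sketch below. It assumes the JSON comes from `ceph orch ps --format json` and that each entry carries `daemon_type`, `daemon_id`, and `status_desc` fields; those field names are an assumption here and should be confirmed against the actual output on the cluster under test. The function name `daemon_running` is hypothetical.

```python
import json


def daemon_running(orch_ps_json, daemon_type, daemon_id):
    """Return True if the named daemon is listed with status_desc == 'running'.

    Field names (daemon_type, daemon_id, status_desc) are assumptions about
    the `ceph orch ps --format json` output, not confirmed by this PR.
    """
    for daemon in json.loads(orch_ps_json):
        same_type = daemon.get("daemon_type") == daemon_type
        same_id = str(daemon.get("daemon_id")) == str(daemon_id)
        if same_type and same_id:
            return daemon.get("status_desc") == "running"
    # Daemon not listed at all: treat as not running.
    return False
```

This could be polled in a loop with a timeout after the restart, in place of the fixed 20-second sleep.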
This pull request has been automatically marked as STALE due to 15 days of inactivity and will be CLOSED after another 15 days of inactivity.
TFA failures:
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_ssl_rgw_ms_ecpool_test/
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_rgw_ms-archive/
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_rgw_ms-archive_resharding_granular_sync/
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_rgw_ms_async_data_notification/
Failure log before the fix, from my local run:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_create_user_sync_issue/cephci-run-9JPIFU/
Pass log after adding the 20-second sleep:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_create_user_sync_issue/cephci-run-96EMAC/
Description
Please include Automation development guidelines. Source of Test case - New Feature/Regression Test/Close loop of customer BZs