[rgw]fix TFA issue by adding sleep of 20 seconds after rgw restart to avoid sync status failures #4187

Open
hmaheswa wants to merge 1 commit into base: master

hmaheswa commented Oct 25, 2024

TFA failures:
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_ssl_rgw_ms_ecpool_test/
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_rgw_ms-archive/
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_rgw_ms-archive_resharding_granular_sync/
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.2.0-35/rgw/10/tier-2_rgw_ms_async_data_notification/

Failure log before the fix, from my local run:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_create_user_sync_issue/cephci-run-9JPIFU/

Passing log after adding the 20-second sleep:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_create_user_sync_issue/cephci-run-96EMAC/

Description

Please include Automation development guidelines. Source of Test case - New Feature/Regression Test/Close loop of customer BZs

Checklist:
  • Create a test case in Polarion reviewed and approved.
  • Create a design/automation approach doc. Optional for tests with similar tests already automated.
  • Review the automation design
  • Implement the test script and perform test runs
  • Submit PR for code review and approval
  • Update Polarion Test with Automation script details and update automation fields
  • If automation is part of Close loop, update BZ flag qe-test_coverage “+” and link Polarion test

openshift-ci bot (Contributor) commented Oct 25, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hmaheswa

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

… avoid sync status failures

Signed-off-by: Hemanth Sai Maheswarla <[email protected]>
hmaheswa force-pushed the TFA_fix_failed_to_retrieve_sync_info branch from de8cf1f to c1723ba on October 25, 2024 07:58
hmaheswa requested a review from a team on October 25, 2024 09:11
hmaheswa added the labels RGW (Rados Gateway), tfa-issue-fix (TFA automation issue fix), and pr-verified on Oct 25, 2024
ckulal (Contributor) commented Oct 28, 2024

Waiting on the end-to-end run log before merging.

Comment on lines +597 to +598
log.info("sleeping for 20 seconds")
time.sleep(20)

A blind sleep may work today, but it could fail later or in the next build.

IMHO, we could instead check whether the rgw service is up and running using

ceph orch ls --service_type rgw --service_name <rgw_service_name> --format json |
jq '.[0].status | select(.size != 0) | .size == .running'
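The polling idea suggested above could be sketched in Python roughly as follows. This is only an illustration, not the project's code: `rgw_service_is_ready` and `wait_until` are hypothetical helper names, and the JSON shape (a list whose first entry has a `status` object with `size` and `running`) is assumed from the jq filter in the comment.

```python
import json
import time


def rgw_service_is_ready(orch_ls_json):
    """Approximate the jq filter: .[0].status | select(.size != 0) | .size == .running

    Assumes the JSON shape implied by the review comment; returns False
    when the service list is empty or the expected size is zero.
    """
    services = json.loads(orch_ls_json)
    if not services:
        return False
    status = services[0].get("status", {})
    size = status.get("size", 0)
    return size != 0 and size == status.get("running", 0)


def wait_until(predicate, timeout=120, interval=5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate until it returns True or timeout seconds have elapsed.

    Hypothetical helper: returns True on success and False on timeout,
    instead of sleeping for a fixed period.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a test, the predicate would wrap whatever mechanism already runs `ceph orch ls` on the cluster and pass its output to `rgw_service_is_ready`; the caller can then fail with a clear message if `wait_until` returns False.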

I might be wrong about the need to check rgw, which I assumed from the PR title; however, the code is actually restarting osd_process_name.

It is also possible to check the running status of that particular daemon.
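A per-daemon check might look like the sketch below. It assumes the daemon list comes from `ceph orch ps --format json` and that each entry carries `daemon_type`, `daemon_id`, and `status_desc` fields; those field names are an assumption here and should be verified against the actual output on the cluster. `daemon_is_running` is a hypothetical helper name.

```python
import json


def daemon_is_running(orch_ps_json, daemon_type, daemon_id):
    """Return True if the named daemon is listed and reports "running".

    Assumes each entry has daemon_type, daemon_id and status_desc keys
    (an assumption about the orchestrator's JSON output, not confirmed
    by this PR).
    """
    for daemon in json.loads(orch_ps_json):
        same_type = daemon.get("daemon_type") == daemon_type
        same_id = str(daemon.get("daemon_id")) == str(daemon_id)
        if same_type and same_id:
            return daemon.get("status_desc") == "running"
    # Daemon not listed at all: treat as not running.
    return False
```

This predicate can be polled with a timeout after the restart, so the test proceeds as soon as the restarted daemon is back rather than after a fixed 20 seconds.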

This Pull request has been automatically marked as STALE due to inactivity for 15 days and will be CLOSED on further inactivity on the PR for another 15 days.
