Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SLE-Micro Container Host for ppc64le - fixes #20383

Closed
wants to merge 1 commit into from

Conversation

m-dati
Copy link
Contributor

@m-dati m-dati commented Oct 11, 2024

Description:

  • SLE-micro case added in test_opensuse_based_image;
  • utils:ensure_ca_certificates missing CA uri-repos for sle-micro, management fixed;
  • toolbox: when explicit uri needed, in place of CONTAINER_IMAGE_TO_TEST, introduced CONTAINER_IMAGE_TO_TEST_EXTRA, separation from BCI needed for that logic, otherwise causing next get_image_uri calls in SLE-M tests to override CONTAINER_IMAGE_VERSIONS.
  • image::get_image_uri call, distri parameter defined;
  • zypper_call: add retry input variable, to change the fixed 5 loops.

References:

Copy link

Great PR! Please pay attention to the following items before merging:

Files matching lib/**.pm:

  • Consider adding or extending unit tests in t/

This is an automatically generated QA checklist based on modified files.

@m-dati m-dati self-assigned this Oct 11, 2024
@m-dati m-dati marked this pull request as draft October 11, 2024 13:36
@m-dati m-dati added the WIP Work in progress label Oct 11, 2024
@m-dati
Copy link
Contributor Author

m-dati commented Oct 11, 2024

If more commits or PR split needed, I'm open to suggestions.

@m-dati m-dati requested a review from jlausuch October 11, 2024 13:40
@@ -81,7 +81,7 @@ sub run {
# We may test either one specific image VERSION or comma-separated CONTAINER_IMAGE_VERSIONS
my $versions = get_var('CONTAINER_IMAGE_VERSIONS', get_required_var('VERSION'));
for my $version (split(/,/, $versions)) {
my $image = get_image_uri(version => $version);
my $image = get_image_uri(version => $version, distri => ($version =~ /-SP/ ? 'sle' : undef()));

Copy link
Contributor Author

@m-dati m-dati Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update is needed when HDD DISTRI is different from containers distri in CONTAINER_IMAGE_VERSIONS, like in this SLE-M cases. That routine by default set test's DISTRI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sle 16 does not have any SP that might be a future potential problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops, I was not aware of such changes. But, this applies to CONTAINER_IMAGE_VERSIONS items only, like 15-SP2,15-SP3,15-SP4,...: how should we expect that list with 16?

Copy link
Contributor Author

@m-dati m-dati Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But eventually in that case, we could add a more condition, like : ($version =~ /-SP|16/ ? ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see this is the only code change that is actually fixing something, I would make this alone a PR and split the other changes also into different PRs.

Copy link
Contributor

@grisu48 grisu48 Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this fixing something? I don't understand the impact of this change or where the original problem was. 🤔

Also, keep in mind that we also need to run this on openSUSE, so this might break stuff over there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and yes: it is TL;DR to explain, but update is tricky and needed, due to the BCI Container test flows

lib/utils.pm Outdated
@@ -588,6 +588,7 @@ sub zypper_call {
my $timeout = $args{timeout} || 700;
my $log = $args{log};
my $dumb_term = $args{dumb_term} // is_serial_terminal;
my $retry = $args{retry} || 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my $retry = $args{retry} // 5; As 0 is a valid value IMO in case you simply want to have 0 retries.

Copy link
Contributor Author

@m-dati m-dati Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 reties means No-run at all, being the command for 1 .. 0: isn't it?
ok, the symbol // is used when undefined, so when $retry null, here 5 assigned.
And Yes, I should check the input and limit values never 0: from 1 to >1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the loop type is not correct. It should always run at least once. If there is undef the retry will be the default value. In this implementation we cannot disable retry.

Copy link
Contributor Author

@m-dati m-dati Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the previous old existing setting as default:
if $retry is empty/undef, then 5 assigned, as before it worked.

But also retry=0, here should becomes 5, as correction: I could check 0 and set =1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In last update: undef=5; assigned but '' or 0=1; >0=$retry.

Copy link
Contributor

@mloviska mloviska Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this would simplify the code and it would be also cleaner, straightforward and maybe more safe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a matter of definitions: retries after first attempt failed or total runs while failing (we could rename variable).
In this implementation: 1 means 1 run only total (no retry); 2 means 1 retry if failed or 2 total runs whil failing, etc...
To well-use our routines we have the heading routine descipition: the user should clarify all logic and usage there: I will update it too, when behavior clarified.

Copy link
Contributor

@mloviska mloviska Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that this PR is diverging into all possible spaces at once. I would be happy to see that zypper_call is refactored/updated/you name it in a separate PR, slow track that all users can drop their insights and thoughts to make the code better. I do not see any strong relation between ppc and zypper_call that requires attention at the moment. It would be beneficial for everyone if zypper_call improvements would be separated and that you can improve that code (which would be welcomed) in a different PR :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I already mentioned it before in #20383 (comment).
A weak correlation and motivation that led me to this change is the introduction of sure failing check on sle-micro CA check: original loop is 5 retry, I attempting to reduce it to 3 at most.
But if it is a problem, I will revert this change and move it to a different next PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that cert URL seems to be wrongly composed. If that is fixed, all the stuff should run smoothly as it runs well for other archs

lib/utils.pm Outdated Show resolved Hide resolved
@@ -60,7 +60,7 @@ sub run {
select_console 'root-console';
$self->create_user;

my $toolbox_image_to_test = get_var('CONTAINER_IMAGE_TO_TEST');
my $toolbox_image_to_test = get_var('CONTAINER_IMAGE_TO_TEST_EXTRA');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this got renamed?

Copy link
Contributor Author

@m-dati m-dati Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that, in this module for toolbox tests, when needed to update the default uri, we use that same parameter CONTAINER_IMAGE_TO_TEST, also used in other containers modules where logic is different, like image_podman, where failed in a VR.

In fact, in the the loop above, tests/containers/image.pm:L#84, calling get_image_uri, it will soon return that assigned url, instead of continuing on distri/version found in CONTAINER_IMAGE_VERSIONS.

That toolbox url here is not what we expect in containers tests, therfore we should let the original parameter for this logic, and use a new one for single separated images checks.

CONTAINER_IMAGE_TO_TEST also used in bci modules and main containers modules and logic is different than a single image check, like toolbox.

I think that also in other cases similar to toolbox this new parameter could be used, without any impact [i.e.install_dhcp_container? 🤔].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VR 1 above:
Here in toolbox we use ..._EXTRA;
here in image_podman, using the standard parameter, the loop can inspect the containers in CONTAINER_IMAGE_VERSIONS: but still we can set eventual ...TO_TEST for that logic, nothing to do with toolbox.

This happens in particular when HDD_ and elements in CONTAINER_IMAGE_VERSIONS have diffrent distros, like for SLE-M and SLES.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable has been context-sensitive (i.e. test-run sensitive) since years and it worked just fine. The name CONTAINER_IMAGE_TO_TEST is descriptive to what it is, it defines the container under test.

We also don't use different ISO or HDD_ variables depending on the test run, this is a more generic setting that served us well in the past.


When changing something this fundamental it needs to be well motivated. So please @m-dati, either make your case or please let's not rename something that is used all over the place since a long time without the need for doing so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, before read carfully the description provided in #20383 (comment).

I am not renaming anything, but using another parameter to address the local toolbox need. The original parameter CONTAINER_IMAGE_TO_TEST scope and usage reamain the same.

I simply realized that the image explicit definition in this toolbox module also affecting another next module, where the original CONTAINER_IMAGE_TO_TEST used for a totally different meaning and scope.

Therefore, this means that this for toolbox is a totally different parameter.
I named it similar to the original simply adding a variation (_EXTRA), to mantain its origin, but to generalize it for further usage, but we can opt for other naming, like e.g. TOOLBOX_IMAGE_TO_TEST

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point requires a discussion, let's reserve a time slot on Thursday for it.

lib/utils.pm Outdated Show resolved Hide resolved
lib/utils.pm Outdated Show resolved Hide resolved
@m-dati
Copy link
Contributor Author

m-dati commented Oct 11, 2024

Here are other VRs after last update:

Non regression:

lib/utils.pm Outdated
}
}
return $ret;
Copy link
Contributor

@mloviska mloviska Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether returning the caller an int makes it any better. By the name the function should either succeed or die.

Copy link
Contributor Author

@m-dati m-dati Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but let we consider some new situation raised, due to IBS CA, where NO sle-micro present: because of that, I introduced the out : true for SLE(S)/OpenSUSE OR fail (SLE-Micro or else) and result became a 4 way output.
IMHO the decision could be passed from caller by a more option continue_on_fail or die and act at end routine based on out:
wdyt ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

until 6.0, sle-micro versions were mapped to sle, which is the previous code reflecting. The question is how we should continue over here and that is a question on PjM rather than any QA engineer can answer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I agree that all installation commands should stop run on fail: updating code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check last commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have found the cert folder, or at least it might be worth to be tested /SLE-Factory/ in case of slem 6.0+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already noted it, but I had no info on if-and-when it can be used, in place of which case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already noted it, but I had no info on if-and-when it can be used, in place of which case.

First of all I would try to test it, then you can come to the project channel with direct questions. We are not building the assets so we cannot know :)

  utils:ensure_ca_certificate missing CA repos in sle-m fixed

  image::get_image_uri call, distri parameter defined

  toolbox: renamed CONTAINER_IMAGE_TO_TEST to CONTAINER_IMAGE_TO_TEST_EXTRA
    to separate logic from BCI, impacting get_image_uri in SLE-M tests

  zypper_call: add retry input variable, to change the fixed 5 loops.
Copy link
Contributor

@grisu48 grisu48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see several problems with this PR:

  1. For a fixing PR it does too many improvements. A fixing PR should only do the required fixes
  2. The PR touches some problematic places, that need to have some discussions. I would split them into separate PR to keep them focused
  3. It does too many things at once. I would recommend to split every logical piece into an individual PR, which makes the review easier and doesn't block the whole PR but only sub-pieces.

@m-dati I would recommend to split the individual pieces into different PRs and proceed there. This PR will be difficult to merge, because it does too many things at once and some of them require discussions.

@@ -182,6 +182,8 @@ sub test_opensuse_based_image {
} else {
record_info "non-SLE host", "This host ($host_id) does not support zypper service";
}
} elsif ($image_id =~ /^(sl|sle)-micro$/) {
validate_script_output qq{$runtime container run --entrypoint '/bin/bash' --rm $image -c 'cat /etc/os-release'}, sub { /PRETTY_NAME="SUSE Linux Micro .*"/ };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check makes no sense in test_opensuse_based_image because the image is supposed to be openSUSE based and not SLE-Micro based.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why is this check added in a fixing PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used also for fo OSD tests here and it impacted the new SLE-M tests: this update is needed for the actual structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openSUSE based images cannot contain SLE-Micro in their /etc/os-release. If this is the case, the test design is broken and then we need to fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This routine is used ALSO for SLES, as by the code if ($image_id =~ 'sles')...; therefore, is not that strange adding here SLE-Micro too. Its name is misleading

@@ -588,7 +588,8 @@ sub zypper_call {
my $timeout = $args{timeout} || 700;
my $log = $args{log};
my $dumb_term = $args{dumb_term} // is_serial_terminal;

# (retry >=0 or ='': unchanged; if undef: 5) would first term be '' or 0: 1
my $retry = ($args{retry} // 5) or 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to make this configurable now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why do we make this change in a fixing PR?


# exit codes defined will not stop on error.
my $out = zypper_call("ar --refresh http://download.suse.de/ibs/SUSE:/CA/SLE_$distversion/SUSE:CA.repo", retry => 3, exitcode => [0, 1, 2, 3, 4]);
if ($out == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like it that we're trying to workaround infra issues here. This makes the test flow unpredictable.

Can you justify why this is needed? We use this all over the place and I'm not aware on infra issues on such a scale that would justify this addition.

@@ -60,7 +60,7 @@ sub run {
select_console 'root-console';
$self->create_user;

my $toolbox_image_to_test = get_var('CONTAINER_IMAGE_TO_TEST');
my $toolbox_image_to_test = get_var('CONTAINER_IMAGE_TO_TEST_EXTRA');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable has been context-sensitive (i.e. test-run sensitive) since years and it worked just fine. The name CONTAINER_IMAGE_TO_TEST is descriptive to what it is, it defines the container under test.

We also don't use different ISO or HDD_ variables depending on the test run, this is a more generic setting that served us well in the past.


When changing something this fundamental it needs to be well motivated. So please @m-dati, either make your case or please let's not rename something that is used all over the place since a long time without the need for doing so.

@@ -81,7 +81,7 @@ sub run {
# We may test either one specific image VERSION or comma-separated CONTAINER_IMAGE_VERSIONS
my $versions = get_var('CONTAINER_IMAGE_VERSIONS', get_required_var('VERSION'));
for my $version (split(/,/, $versions)) {
my $image = get_image_uri(version => $version);
my $image = get_image_uri(version => $version, distri => ($version =~ /-SP/ ? 'sle' : undef()));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see this is the only code change that is actually fixing something, I would make this alone a PR and split the other changes also into different PRs.

@m-dati
Copy link
Contributor Author

m-dati commented Oct 15, 2024

Third issue I encountered when I added ppc64le for SLE-M 6.1 in this poo's test group, after (1) boot resolved then (2) default toolbox version old sle-m 5.5, was related to (3) missing certificates in the new image, preventing to pull from https registry, in podman tests.
But then (4th issue) also resulted missing on IBS the SLE-micro certificate folder.
Therefore I updated the code, to get it from ca.suse.de, when no zypper repo found.

Now, after some discussion on it, given last concepts and suggestions, I will provide a simpler update using as fall-back the SLE_Factory/.

I go splitting main changes of this PR in different sub-PRs.

@m-dati m-dati removed the WIP Work in progress label Oct 22, 2024
@m-dati
Copy link
Contributor Author

m-dati commented Oct 22, 2024

In other PR-s some more discussions addressed actual test flow status and needed changes, then applied.
For the related p.o.o. ticket the work is done, then this PR can be closed.

@m-dati m-dati closed this Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants