
feat: helm test workflow check #394

Merged
merged 17 commits into main from feat/helm_test on Feb 9, 2024

Conversation

tomaszbarwicki
Contributor

This PR adds a helm chart testing workflow check to the release-automation tool, in order to verify that the workflow required by the TRG is present in the repository.

Updates eclipse-tractusx/sig-infra#142
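
For context, the check in question essentially looks for a GitHub Actions workflow that runs chart-testing against the repository's Helm chart. Below is a minimal sketch of that idea in Go; the function name, the file handling and the matched string `helm/chart-testing-action` are illustrative assumptions, not the actual code merged in this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// hasChartTestingWorkflow reports whether any workflow file under
// .github/workflows references helm/chart-testing-action.
// Hypothetical sketch only; the merged check may differ in detail.
func hasChartTestingWorkflow(repoRoot string) bool {
	workflowDir := filepath.Join(repoRoot, ".github", "workflows")
	entries, err := os.ReadDir(workflowDir)
	if err != nil {
		return false // no workflow directory found at all
	}
	for _, e := range entries {
		name := e.Name()
		if !strings.HasSuffix(name, ".yml") && !strings.HasSuffix(name, ".yaml") {
			continue
		}
		content, err := os.ReadFile(filepath.Join(workflowDir, name))
		if err != nil {
			continue
		}
		if strings.Contains(string(content), "helm/chart-testing-action") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("chart-testing workflow present:", hasChartTestingWorkflow("."))
}
```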

@SebastianBezold
Contributor

Hi @tomaszbarwicki,

while I think this check is actually working, I am not sure if we should really automate this and add it to our dashboard.
I am a bit concerned that we are "blocking" valid implementations of our TRG by limiting it to a very specific detail, like chart-testing-action in this case. We already know of teams that are perfectly compliant with the goal of the TRG, but implemented it with different actions.

The whole automation effort is currently mainly driven by the System Team of the consortia, which we know will end at some point. We need to keep in mind that we are potentially building a legacy for others to take over. That's why I think we should only focus on "waterproof" tests that do not require explanation or background info on which teams have special agreements to have a failing check on our dashboard.

What do you guys think @carslen @hzierer @FaGru3n @almadigabor?

@hzierer

hzierer commented Jan 10, 2024

I'm a bit torn here.
On the one hand I'm a fan of "what can be automated - should be automated", but on the other hand I second your point, too.

Maybe we could put this code into a separate workflow as an example / "reference implementation"?
That way people can either use Tomasz's version or do it themselves. What do you think?

@tomaszbarwicki
Contributor Author

Hi @SebastianBezold, @hzierer,

Appreciate the feedback. I understand the concern, but I'm not sure I agree that we should give up on automation just because a small margin chose to follow a different approach (not saying an incorrect one!). Based on the testing I did across the organisation (almost 40 repos qualified as product repos), there seem to be only 4 that didn't pass the automated helm workflow test.

Analysis of failing repos:

  • demand-capacity-mgmt - no valid workflow(s) or other solution I could find, hence not following the TRG at all
  • puris - no valid workflow(s) or other solution I could find, hence not following the TRG at all
  • knowledge-agents-edc - they have something, but it doesn't fulfil our recommendation; personally I'd decline that implementation since it doesn't actually test the helm installation
  • tractusx-edc - the only one I could find that decided to implement it its own way

In the end we weigh the time/effort saved for each individual during release checks against ~2% of repos with a potentially incorrect test result. And why not: over time the 2% of cases can also be incorporated, following an incremental development strategy. :)

@SebastianBezold
Contributor

I am not against automation per se. But I would still argue that the time I need to spend manually checking the workflow is actually not that much, since there are just a few key places to look at.
On the other hand, having known cases (even if only 1) that fail the test despite having a valid implementation could lead to things like the "broken window theory". The importance of passing all checks gets downplayed more and more, the more often we have failing checks that we then treat as "ah yeah, in this case it is fine to be red".

Also, we do not only save time (potentially) in checks. We also add more things that have to be maintained. As soon as we change details on the workflow, the check has to be adapted, since it is focusing on technical details rather than the overall goal of the TRG (automatic testing of a chart).

I can definitely understand your points and, as I said, I am not against automation at all. But if we automate, it has to be waterproof in my opinion. This means only reporting actual non-compliances while still being open to multiple solutions.
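
One way to make such a check "open to multiple solutions" would be to accept any of several known-good patterns instead of a single action. Purely as a hypothetical sketch in Go: the alternative markers listed here are assumptions for illustration, not an agreed allow-list.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptedMarkers lists strings whose presence in a workflow file could be
// treated as a valid chart-test implementation. Illustrative examples only;
// an agreed allow-list would have to come out of the TRG discussion.
var acceptedMarkers = []string{
	"helm/chart-testing-action", // the action this PR's check looks for
	"ct lint-and-install",       // chart-testing CLI invoked directly
	"helm test",                 // custom workflows running helm test themselves
}

// isAcceptedWorkflow reports whether a workflow file matches any known-good pattern.
func isAcceptedWorkflow(content string) bool {
	for _, marker := range acceptedMarkers {
		if strings.Contains(content, marker) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isAcceptedWorkflow("uses: helm/chart-testing-action@v2")) // true
}
```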

@tomaszbarwicki
Contributor Author

> I am not against automation per se. But I would still argue that the time I need to spend manually checking the workflow is actually not that much, since there are just a few key places to look at. On the other hand, having known cases (even if only 1) that fail the test despite having a valid implementation could lead to things like the "broken window theory". The importance of passing all checks gets downplayed more and more, the more often we have failing checks that we then treat as "ah yeah, in this case it is fine to be red".
>
> Also, we do not only save time (potentially) in checks. We also add more things that have to be maintained. As soon as we change details on the workflow, the check has to be adapted, since it is focusing on technical details rather than the overall goal of the TRG (automatic testing of a chart).
>
> I can definitely understand your points and, as I said, I am not against automation at all. But if we automate, it has to be waterproof in my opinion. This means only reporting actual non-compliances while still being open to multiple solutions.

I'm afraid the "broken window" has already been with us for quite a while. The current base image check reports red for certain repositories even though they actually fulfil the TRG but decided to use aliases. Isn't that a technical detail that makes the check not general enough? Doesn't that dictate a specific format of Dockerfile? Are we really in a position to say all our checks are waterproof? I'm not saying we should temper our aspirations, but maybe we should embrace a step-by-step approach, learn from each stage and ensure sustainable progress.
It's just not really clear to me why we want to be selective in this specific case.
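
To illustrate the alias point: a base-image check that only inspects the literal image name in the final FROM line can report red for a Dockerfile that uses the approved base image through a build-stage alias. The Dockerfile content and matching logic below are illustrative assumptions, sketched in Go, not the actual check:

```go
package main

import (
	"fmt"
	"strings"
)

// A compliant multi-stage Dockerfile that references the approved base image
// only through a build-stage alias (hypothetical example).
const dockerfile = `FROM eclipse-temurin:17-jre-alpine AS runtime
FROM maven:3.9 AS build
COPY . .
RUN mvn package
FROM runtime
COPY --from=build target/app.jar /app.jar`

func main() {
	// Naive check: only accept a final FROM line that literally names the image.
	lines := strings.Split(dockerfile, "\n")
	lastFrom := ""
	for _, l := range lines {
		if strings.HasPrefix(l, "FROM ") {
			lastFrom = l
		}
	}
	// Prints false, although the aliased "runtime" stage is eclipse-temurin.
	fmt.Println("naive base image check passes:", strings.Contains(lastFrom, "eclipse-temurin"))
}
```

The naive check comes out red here even though the final stage is built from the aliased eclipse-temurin image, which is exactly the kind of incorrectly red result described above.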

Contributor

@SebastianBezold left a comment


With regards to our internal discussion about a proper exception process, I'm fine with the current implementation, since it seems to work in general

@tomaszbarwicki tomaszbarwicki merged commit 7979e54 into main Feb 9, 2024
4 checks passed
@tomaszbarwicki tomaszbarwicki deleted the feat/helm_test branch February 9, 2024 09:29