
feat: helm test workflow check #394

Merged
merged 17 commits into main from feat/helm_test on Feb 9, 2024

Conversation

tomaszbarwicki
Contributor

This PR adds a helm chart testing workflow check to the release-automation tool, in order to verify that the workflow required by the TRG is present in the repository.

Updates eclipse-tractusx/sig-infra#142
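
For context, the check in question essentially looks for a GitHub Actions workflow that runs chart-testing against the repository's Helm chart. Below is a minimal sketch of that idea in Go; the function name, the file handling and the matched string `helm/chart-testing-action` are illustrative assumptions, not the actual code merged in this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// hasChartTestingWorkflow reports whether any workflow file under
// .github/workflows references helm/chart-testing-action.
// Hypothetical sketch only; the merged check may differ in detail.
func hasChartTestingWorkflow(repoRoot string) bool {
	workflowDir := filepath.Join(repoRoot, ".github", "workflows")
	entries, err := os.ReadDir(workflowDir)
	if err != nil {
		return false // no workflow directory found at all
	}
	for _, e := range entries {
		name := e.Name()
		if !strings.HasSuffix(name, ".yml") && !strings.HasSuffix(name, ".yaml") {
			continue
		}
		content, err := os.ReadFile(filepath.Join(workflowDir, name))
		if err != nil {
			continue
		}
		if strings.Contains(string(content), "helm/chart-testing-action") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("chart-testing workflow present:", hasChartTestingWorkflow("."))
}
```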

@SebastianBezold
Contributor

Hi @tomaszbarwicki,

while I think this check is actually working, I am not sure if we should really automate this and add it to our dashboard.
I am a bit concerned that we are "blocking" valid implementations of our TRG by limiting it to a very specific detail, like chart-testing-action in this case. We already know of teams that are perfectly compliant with the goal of the TRG, but implemented it with different actions.

The whole automation effort is currently mainly driven by the System Team of the consortia, which we know will end at some point. We need to keep in mind that we are potentially building a legacy for others to take over. That's why I think we should only focus on "waterproof" tests that do not require explanation or background info on which teams have special agreements to have a failing check on our dashboard.

What do you guys think @carslen @hzierer @FaGru3n @almadigabor?

@hzierer

hzierer commented Jan 10, 2024

I'm a bit torn here.
On the one hand I'm a fan of "what can be automated - should be automated", but on the other hand I second your point, too.

Maybe we could put this code into a separate workflow as an example / "reference implementation"?
That way people can either use Tomasz's version or do it themselves. What do you think?

@tomaszbarwicki
Contributor Author

Hi @SebastianBezold, @hzierer,

Appreciate the feedback. I understand the concern, but I'm not sure I agree that we should give up on automation just because a small margin chose to follow a different approach (not saying an incorrect one!). Based on the testing I did across the organisation (almost 40 repos qualified as product repos), there seem to be only 4 that didn't pass the automated helm workflow test.

Analysis of failing repos:

  • demand-capacity-mgmt - no valid workflow(s) or other solution I could find, hence not following the TRG at all
  • puris - no valid workflow(s) or other solution I could find, hence not following the TRG at all
  • knowledge-agents-edc - they have something, but it doesn't fulfil our recommendation; personally I'd decline that implementation since it doesn't actually test the helm installation
  • tractusx-edc - the only one I could find that decided to implement it its own way

In the end we weigh the time/effort saved for each individual during release checks against ~2% of repos with a potentially incorrect test result. And why not: over time the 2% of cases can also be incorporated, following an incremental development strategy. :)

@SebastianBezold
Contributor

I am not against automation per se. But I would still argue that the time I need to spend manually checking the workflow is actually not that much, since there are just a few key places to look at.
On the other hand, having known cases (even if only 1) that fail the test despite having a valid implementation could lead to things like the "broken window theory". The importance of passing all checks gets downplayed more and more, the more often we have failing checks that we then treat as "ah yeah, in this case it is fine to be red".

Also, we do not only save time (potentially) in checks. We also add more things that have to be maintained. As soon as we change details on the workflow, the check has to be adapted, since it is focusing on technical details rather than the overall goal of the TRG (automatic testing of a chart).

I can definitely understand your points and, as I said, I am not against automation at all. But if we automate, it has to be waterproof in my opinion. This means only reporting actual non-compliances while still being open to multiple solutions.
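
One way to make such a check "open to multiple solutions" would be to accept any of several known-good patterns instead of a single action. Purely as a hypothetical sketch in Go: the alternative markers listed here are assumptions for illustration, not an agreed allow-list.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptedMarkers lists strings whose presence in a workflow file could be
// treated as a valid chart-test implementation. Illustrative examples only;
// an agreed allow-list would have to come out of the TRG discussion.
var acceptedMarkers = []string{
	"helm/chart-testing-action", // the action this PR's check looks for
	"ct lint-and-install",       // chart-testing CLI invoked directly
	"helm test",                 // custom workflows running helm test themselves
}

// isAcceptedWorkflow reports whether a workflow file matches any known-good pattern.
func isAcceptedWorkflow(content string) bool {
	for _, marker := range acceptedMarkers {
		if strings.Contains(content, marker) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isAcceptedWorkflow("uses: helm/chart-testing-action@v2")) // true
}
```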

@tomaszbarwicki
Contributor Author

> I am not against automation per se. But I would still argue that the time I need to spend manually checking the workflow is actually not that much, since there are just a few key places to look at. On the other hand, having known cases (even if only 1) that fail the test despite having a valid implementation could lead to things like the "broken window theory". The importance of passing all checks gets downplayed more and more, the more often we have failing checks that we then treat as "ah yeah, in this case it is fine to be red".
>
> Also, we do not only save time (potentially) in checks. We also add more things that have to be maintained. As soon as we change details on the workflow, the check has to be adapted, since it is focusing on technical details rather than the overall goal of the TRG (automatic testing of a chart).
>
> I can definitely understand your points and, as I said, I am not against automation at all. But if we automate, it has to be waterproof in my opinion. This means only reporting actual non-compliances while still being open to multiple solutions.

I'm afraid the "broken window" has already been with us for quite a while. The current base image check reports red for certain repositories even though they actually fulfil the TRG but decided to use aliases. Isn't that a technical detail that makes the check not general enough? Doesn't that dictate a specific format of Dockerfile? Are we really in a position to say all our checks are waterproof? I'm not saying we should temper our aspirations, but maybe we should embrace a step-by-step approach, learn from each stage and ensure sustainable progress.
It's just not really clear to me why we want to be selective in this specific case.
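
To illustrate the alias point: a base-image check that only inspects the literal image name in the final FROM line can report red for a Dockerfile that uses the approved base image through a build-stage alias. The Dockerfile content and matching logic below are illustrative assumptions, sketched in Go, not the actual check:

```go
package main

import (
	"fmt"
	"strings"
)

// A compliant multi-stage Dockerfile that references the approved base image
// only through a build-stage alias (hypothetical example).
const dockerfile = `FROM eclipse-temurin:17-jre-alpine AS runtime
FROM maven:3.9 AS build
COPY . .
RUN mvn package
FROM runtime
COPY --from=build target/app.jar /app.jar`

func main() {
	// Naive check: only accept a final FROM line that literally names the image.
	lines := strings.Split(dockerfile, "\n")
	lastFrom := ""
	for _, l := range lines {
		if strings.HasPrefix(l, "FROM ") {
			lastFrom = l
		}
	}
	// Prints false, although the aliased "runtime" stage is eclipse-temurin.
	fmt.Println("naive base image check passes:", strings.Contains(lastFrom, "eclipse-temurin"))
}
```

The naive check comes out red here even though the final stage is built from the aliased eclipse-temurin image, which is exactly the kind of incorrectly red result described above.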

Contributor

@SebastianBezold left a comment


With regards to our internal discussion about a proper exception process, I'm fine with the current implementation, since it seems to work in general

@tomaszbarwicki tomaszbarwicki merged commit 7979e54 into main Feb 9, 2024
4 checks passed
@tomaszbarwicki tomaszbarwicki deleted the feat/helm_test branch February 9, 2024 09:29