
[Meta] Ensure Github workflow runs on docker image used by Production Distribution Build #3966

Closed
peterzhuamazon opened this issue Sep 6, 2023 · 18 comments
Assignees
Labels
campaign Parent issues of OpenSearch release campaigns. cicd docker enhancement New Enhancement jenkins Jenkins related issue Meta

Comments

@peterzhuamazon (Member) commented Sep 6, 2023:

[Meta] Ensure GitHub workflows pull the docker image used by the Infra Jenkins build.

This is to ensure every plugin repo uses the exact docker images we use in the Jenkins build to check PRs and run tests before merging code, so that issues can be detected earlier and the environment is identical across teams.

This makes sure integration tests run in the same environment as the infra build, which helps catch issues early.

This also requires the user of the run to be changed to root and then switched back to 1000, so both OS and OSD can start without issues on Linux: https://docs.github.com/en/actions/creating-actions/dockerfile-support-for-github-actions#user
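As a rough illustration of that pattern (the image name, tag, and commands below are placeholders, not the finalized workflow), a job can start the container as root, hand the workspace over to UID 1000, and then drop privileges for the actual build steps:

```yaml
# Hypothetical sketch only -- see the actual workflow files linked in this
# thread for the real image names and steps.
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: opensearchstaging/ci-runner:ci-runner-rockylinux8-example  # placeholder tag
      options: --user root        # start as root so setup steps can chown
    steps:
      - uses: actions/checkout@v4
      - name: Hand the workspace to UID 1000
        run: chown -R 1000:1000 "$GITHUB_WORKSPACE"
      - name: Run the build as UID 1000
        run: su "$(id -un 1000)" -c "whoami && ./gradlew assemble"
```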

As of now, GitHub only supports docker containers on Linux runners:

Error: Container operations are only supported on Linux runners

We have a script that helps easily retrieve the latest version of the docker images Infra uses in the Jenkins build.
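The authoritative implementation is get-ci-images.sh in opensearch-build; purely as a hypothetical sketch of the "grab the highest available tag" behavior debated below, the core of such a script could be as small as a version sort:

```shell
#!/bin/bash
# Hypothetical sketch of "pick the highest version tag" logic; the real
# get-ci-images.sh in opensearch-build is the authoritative implementation.
highest_tag() {
  # Reads candidate tags on stdin, one per line, and prints the highest
  # version-ordered tag (GNU sort -V understands dotted version numbers).
  sort -V | tail -n 1
}

printf '%s\n' v2.9.0 v2.11.0 v2.10.0 | highest_tag   # prints v2.11.0
```

Note this is exactly the behavior questioned later in the thread: "highest tag available" is not necessarily "tag used to build this version's distribution".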

20231006 We have sent out the sub-issues to all the existing plugin repos and updated the onboarding doc with this step.

20231011 Behavior has changed, as I am currently working with the Alerting team on improving the sourcing of the script and common workflows:

Additional Issues:

Exceptions:

  • PA / RCA have issues due to their inter-dependencies, trouble checking out on the infra build image, and a requirement for docker/docker-compose to complete the build tests
  • PA Commons / i18n either have no existing CI yamls or are not part of the bundled artifacts
  • Dashboards Notifications has issues starting the cypress tests on the rockylinux8 images; resolved with an alternate setup that only runs yarn
  • The Security repo is too complicated to onboard, as it has wrappers around wrappers, so we need pointers from their team

Thanks.

@github-actions github-actions bot added the untriaged Issues that have not yet been triaged label Sep 6, 2023
@peterzhuamazon peterzhuamazon self-assigned this Sep 6, 2023
@peterzhuamazon peterzhuamazon added enhancement New Enhancement Meta cicd docker jenkins Jenkins related issue campaign Parent issues of OpenSearch release campaigns. and removed untriaged Issues that have not yet been triaged labels Sep 6, 2023
@peterzhuamazon peterzhuamazon moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness Sep 6, 2023
@peterzhuamazon peterzhuamazon changed the title [Meta] Ensure Github workflow pull docker image used by Infra Jenkins build [Meta] Ensure Github workflow runs on docker image used by Production Distribution Build' Oct 6, 2023
@peterzhuamazon peterzhuamazon changed the title [Meta] Ensure Github workflow runs on docker image used by Production Distribution Build' [Meta] Ensure Github workflow runs on docker image used by Production Distribution Build Oct 6, 2023
This was referenced Oct 6, 2023
@kavilla (Member) commented Oct 7, 2023:

One thing I noticed when attempting this in the past was that the user the GitHub runner starts every run with does not have permissions to certain pre-installed dependencies.

Attempting to install its own version then causes errors such as conflicts. Do we now build the user 1001 into the Docker image?

@peterzhuamazon (Member Author) commented:

@kavilla in the example above, if you expand the entire workflow file, I have a method that uses root to start the container and then switches back to the 1000 user, with ownership of all directories changed to 1000 beforehand, to avoid the issue.

Thanks.

@gaiksaya (Member) commented Oct 12, 2023:

@peterzhuamazon @rishabh6788 @prudhvigodithi I think the ci images used in all repos should be the ones we use to build the distribution of that version, right? I wonder whether just grabbing the highest tag available will help here; it might not be correct all the time.
https://github.com/opensearch-project/opensearch-build/blob/main/docker/ci/get-ci-images.sh#L72
For example: if 3.x is the version of the branch, it should go and grab 3.0.0/opensearch-3.0.0.yml and grab the ci image from there. This will reduce the edge cases and ensure the errors they face are exactly the same as those we face using the distribution build. The same applies to integration and bwc tests.

@peterzhuamazon (Member Author) commented:

We have a new approach established here.
Please see this sample workflow file from Alerting as an example:
https://github.com/opensearch-project/alerting/blob/main/.github/workflows/multi-node-test-workflow.yml

Thanks.
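For reference, a caller on the plugin side would look roughly like the sketch below. The input and output names here are assumptions, not the documented interface; the linked Alerting workflow and get-ci-image-tag.yml are authoritative:

```yaml
# Hypothetical caller sketch -- consult the Alerting workflow linked above
# for the real inputs and outputs exposed by get-ci-image-tag.yml.
jobs:
  Get-CI-Image-Tag:
    uses: opensearch-project/opensearch-build/.github/workflows/get-ci-image-tag.yml@main
    with:
      product: opensearch                    # assumed input name
  integ-test:
    needs: Get-CI-Image-Tag
    runs-on: ubuntu-latest
    container:
      # assumed output name carrying the resolved image tag
      image: ${{ needs.Get-CI-Image-Tag.outputs.ci-image-version-linux }}
      options: --user root
```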

@peterzhuamazon (Member Author) commented:

> @peterzhuamazon @rishabh6788 @prudhvigodithi I think the ci images used in all repos should be the one we use to build the distribution of that version right? Wondering if just getting the highest there is is gonna help here and might not be true all the time? https://github.com/opensearch-project/opensearch-build/blob/main/docker/ci/get-ci-images.sh#L72 For example: If 3.x is the version of the branch, it should go and grab 3.0.0/opensearch-3.0.0.yml and grab the ci image from there. This will reduce the edge cases and me precise the error faced by them are exactly the same as we face using the distribution build. Same applies to integration and bwc tests

That is ok, as now all the repos will source from our get-ci-image-tag.yml, which utilizes get-ci-images.sh. We are the single source of truth here and can tweak it based on the situation.

The other repos will just source from us, so it will not diverge over time.

Thanks.

@gaiksaya (Member) commented:

> > @peterzhuamazon @rishabh6788 @prudhvigodithi I think the ci images used in all repos should be the one we use to build the distribution of that version right? Wondering if just getting the highest there is is gonna help here and might not be true all the time? https://github.com/opensearch-project/opensearch-build/blob/main/docker/ci/get-ci-images.sh#L72 For example: If 3.x is the version of the branch, it should go and grab 3.0.0/opensearch-3.0.0.yml and grab the ci image from there. This will reduce the edge cases and me precise the error faced by them are exactly the same as we face using the distribution build. Same applies to integration and bwc tests
>
> That is ok, as now all the repos will source from our get-ci-image-tag.yml, in which it utilizes the get-ci-images.sh. We are the single source of truth here and we can tweak it based on situation.
>
> The other repos will just source from us so it will not diverse over time.
>
> Thanks.

The source of truth is what will cause issues, I believe. For example, if 3.x has a completely revamped image while 2.x uses the old image, or if the image changes between releases, the manifest is the only source of truth. I think parsing a yaml (the manifest in this case) is more legitimate. We have different manifests per product too. The same applies to the integration tests yaml file.

@peterzhuamazon (Member Author) commented Oct 13, 2023:

> > > @peterzhuamazon @rishabh6788 @prudhvigodithi I think the ci images used in all repos should be the one we use to build the distribution of that version right? Wondering if just getting the highest there is is gonna help here and might not be true all the time? https://github.com/opensearch-project/opensearch-build/blob/main/docker/ci/get-ci-images.sh#L72 For example: If 3.x is the version of the branch, it should go and grab 3.0.0/opensearch-3.0.0.yml and grab the ci image from there. This will reduce the edge cases and me precise the error faced by them are exactly the same as we face using the distribution build. Same applies to integration and bwc tests
> >
> > That is ok, as now all the repos will source from our get-ci-image-tag.yml, in which it utilizes the get-ci-images.sh. We are the single source of truth here and we can tweak it based on situation.
> > The other repos will just source from us so it will not diverse over time.
> > Thanks.
>
> The source of truth is what will cause issues I believe. For example, for 3.x if we have completed revamped image and 2.x uses old image or even between releases if change image, manifest is the only source of truth. I think parsing a yaml (manifest in this case) is more legit. Based on products we have different manifests too. Same thing applies to integration tests yaml file.

Like I said, it is a workflow yml file; we can code it however we think is suitable, whether for the later branching strategy or otherwise. We can wrap this as an abstraction while others source it accordingly.

The other repos can use @branch to decide which one to pull from, which would fit us well if we branch ourselves.

Again, the point is to have everyone source from us; then we can make changes accordingly, as we are the owners of the images after all.

@gaiksaya (Member) commented:

Why not do it right from the start rather than waiting until later when we enable branching? Also, branching has nothing to do with what I am trying to say. Particular manifest file(s) need to be parsed; the branch does not matter, just the manifest files. Branching is more for our code base than for any external repos. Let me know if I need to elaborate more.

@peterzhuamazon (Member Author) commented Oct 16, 2023:

> Why not do it right from the start than waiting on later when we enable branching. Also branching has nothing to do with what I trying to say. A particular manifest file(s) need to be parsed. Branch does not matter just the manifest files. Branching is more for our code base than any external repos. Let me know if I need to elaborate more.

I am not sure what that means, as I am already leaving room for the upcoming branching, so it will not be hardcoded from now on.

@gaiksaya (Member) commented Oct 16, 2023:

Let me give an example:

For main, which targets 3.0.0 for all or most plugins, the CI image for building can be grabbed from 3.0.0/opensearch-3.0.0.yml and 3.0.0/opensearch-dashboards-3.0.0.yml. This has nothing to do with branching. Manifests are the source of truth for building and testing, so instead of grabbing the topmost ci image from ecr/dockerhub, grab it from the manifest, as that is what will actually be used.

Similarly for the 2.x branch, all the components can grab from the related 2.x.x/*-2.x.x.yml manifests.
Adding @bbarani @prudhvigodithi @dblock to see if this makes sense.
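To make the manifest-based suggestion concrete, here is a hypothetical sketch of reading ci.image.name from an input manifest. The sample manifest layout mirrors the opensearch-build manifests, but the tag is a placeholder; real manifests are YAML, so a proper implementation would use a YAML parser rather than this naive line scan:

```python
# Hypothetical sketch of reading the CI image from a version's input manifest
# instead of querying the registry for the newest tag. Naive parser for
# illustration only; use a real YAML library in practice.
import re

SAMPLE_MANIFEST = """\
schema-version: '1.1'
build:
  name: OpenSearch
  version: 3.0.0
ci:
  image:
    name: opensearchstaging/ci-runner:ci-runner-example-tag
"""

def ci_image_from_manifest(manifest_text: str) -> str:
    """Extract ci.image.name with a naive line scan (sketch only)."""
    in_ci = False
    for line in manifest_text.splitlines():
        if re.match(r"^ci:\s*$", line):
            in_ci = True
        elif in_ci:
            m = re.match(r"^\s+name:\s*(\S+)", line)
            if m:
                return m.group(1)
            if re.match(r"^\S", line):  # left the ci block
                in_ci = False
    raise ValueError("no ci.image.name found")

print(ci_image_from_manifest(SAMPLE_MANIFEST))
# prints opensearchstaging/ci-runner:ci-runner-example-tag
```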

@peterzhuamazon (Member Author) commented:

> Let me give an example:
>
> For main focusing on 3.0.0 for all or most plugins, the CI image for building can be grabbed from 3.0.0/opensearch-3.0.0.yml and 3.0.0/opensearch-dashboards-3.0.0.yml This has nothing to do with branching. Manifests are the source of truth for building and testing. So instead of grabbing the top most ci images from ecr/dockerhub grab it from the manifest as that is what will be used.
>
> Similar for 2.x branch, all the components can grab from related 2.x.x/*-2.x.x.yml manifests. Adding @bbarani @prudhvigodithi to see if this is making sense.

Like I said before, if you want to make such changes, you can, as we are the single source of truth.
The current implementation sources from ECR, and we can definitely change it later.
But that does not block the process of changing the other plugin repos to source from us,
as that has nothing to do with how we retrieve the image information.

@gaiksaya (Member) commented Oct 16, 2023:

Grabbing the right version using their build tools (gradle or yarn) in order to get the right manifest is something that would need changes to all repos' workflows again. Hence, I suggest finalizing an approach before moving forward.

@peterzhuamazon (Member Author) commented:

> Grabbing the right version using their build tools (gradle or yarn) in order to get the right manifest is something that would need changes to all repos workflows again. Hence, suggesting to finalize a approach before moving forward.

I didn't use their tools to grab the manifest.
I am not sure what you mean by that.

Their GitHub workflow will run a workflow file provided by our repo to retrieve the docker image, and this has nothing to do with their gradle or yarn build tools.

@gaiksaya (Member) commented:

[image attachment]

Hopefully this image represents what I am trying to say.

@peterzhuamazon (Member Author) commented:

Plugin github actions -> (can send version) -> build repo workflow -> build repo script -> return image -> run build.

I think yours is fine, but like I said, those are internal details of how our script would be implemented. At the high level you don't need to expose such detailed changes to the plugin repos; you can just hide them inside the build repo.

My approach requires minimal changes from the plugin repos, and that is the point I am making here.

I would argue your approach is good, but over time I think it will let the code diverge across repos in how they retrieve the image.

Remember that the detailed implementation of how to retrieve the image, whether from ECR or from the manifest, can easily be changed later within the build repo. My implementation now provides a foundation so the repositories can make a call as simple as a few lines, and they don't need to worry about the details.

Thanks.

@prudhvigodithi (Member) commented:

Hey @peterzhuamazon, thanks for the solution. This should remove last-minute release build/test issues, as testing is done on the same CI image used in the distribution build.

I tried to implement this for job-scheduler; the following are the concerns I have.

  • The get-ci-images.sh always retrieves the latest (version) tag. What if we didn't update the version in the input manifest to that version, or what if we want to build and test an older version in the input manifest? Since the plugin workflow always gets the latest version tag, would it be a problem if we want to build and test with an older CI image?

  • The get-ci-image-tag.yml is hardcoded to fetch from main. What if the build repo is branched? Should we then re-update the plugin workflows to change the branch?

@gaiksaya's idea is more generic. Just as the version increment workflows get the version from the plugin repo and increment it, we could get the version from the plugin repo, use the respective version manifest from the build repo, and use the CI image from that input manifest.
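The flow described above could be sketched as follows. The file layout and version-mapping rule here are assumptions for illustration, not the finalized design:

```shell
# Hypothetical sketch: map a plugin's version to that version's input
# manifest path, so the CI image can then be read from the manifest.
# Path layout and qualifier handling are assumptions, not the real design.
manifest_for_version() {
  # Plugin versions carry a trailing build qualifier (e.g. 3.0.0.0);
  # drop the last dotted component to get the core version.
  core_version="${1%.*}"   # 3.0.0.0 -> 3.0.0
  echo "manifests/${core_version}/opensearch-${core_version}.yml"
}

manifest_for_version "3.0.0.0"   # prints manifests/3.0.0/opensearch-3.0.0.yml
```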

@peterzhuamazon (Member Author) commented:

Hi @prudhvigodithi, as I mentioned before, that script can be changed internally while maintaining the external parameters; we can always tweak the script at any time.

As for the yaml, I already have an input field for people to change the branch:
https://github.com/opensearch-project/opensearch-build/blob/main/.github/workflows/get-ci-image-tag.yml#L13-L16

We can talk more on this later.
Thanks.

@peterzhuamazon (Member Author) commented:

We will close this issue once all the open PRs are closed, with a few exceptions whose reasoning is listed in their separate tickets.

Thanks.
