Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[main] sync from opendatahub-io:main #506

Open
wants to merge 60 commits into
base: main
Choose a base branch
from

Conversation

jiridanek
Copy link
Member

Description

Obsoletes

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

jiridanek and others added 30 commits January 8, 2025 17:42
…the test, which can be run with ci testing
…ter-12810696975

[Digest Updater Action] Update Notebook Images
…notebooks

Our testing framework had a fundamental issue in that it would, for certain images, naively run the test suite of parent images for a child image.  In the case of `tensorflow` images, this caused problems because the (child) `tensorflow` image would **purposefully** _downgraded_ a package version given compatability issues. Furthermore, when attempting to verify the changes to address this core issue, numerous other bugs were encountered that required to be fixed.

Significant changes introduced in this commit:
- logic of the `test-%` Makefile target was moved into a shell script named `test_jupyter_with_papermill.sh`
- test resources required to be copied to pod are now retrieved via the locally checked out repo
    - previously there were pulled from a remote branch via `wget` (defaulting to `main` branch)
    - this ensure our PR checks are now always leveraging any updated files
- test_notebook.ipynb files now expect an `expected_versions.json` file to exist within the same directory.
    - `expected_versions.json` file is derived from the relevant `...-notebook-imagestream.yaml` manifest and leveraged when asserting on dependency versions
		- admittedly the duplication of various helper functions across all the notebook files is annoying - but helps to keep the `.ipynb` files self-contained
- `...-notebook-imagestream.yaml` manifest had annotations updated to include any package that had a unit test in `test_notebook.ipynb` asserting versions
- CUDA tensorflow unit test that converts to ONNX was updated to be actually functional

Minor changes introduced in this commit:
- use `ifdef OS` in Makefile to avoid warnings about undefined variable
- all `test_notebook.ipynb` files:
    - have an `id` attribute defined in metadata
		- specify the `nbformat` as `4.5`
- the more compute-intensive notebooks had `readinessProbe` and `livenessProbe` settings updated to be less aggressive
    - was observing liveness checks sporadically failing while the notebook was being tested - and this update seemed to help
- `trustyai` notebook now runs the minimal and datascience papermill tests (similar to `tensorflow` and `pytorch`) instead of including the test code within its own `test_noteook.ipynb` file
- various "quality of life" improvements where introduced into `test_jupyter_with_papermill.sh`
    - properly invoke tests for any valid/supported target
        - previously certain test targets required manual intervention in order to run end to end
    - improved logging (particularly when running with the `-x` flag)
		- more modular structure to hopefully improve readability
		- script now honors proper shell return code semantics (i.e. returns `0` on success)

It should also be noted that while most `intel` notebooks are now passing the papermill tests - there are issues with the `intel` `tensorflow` unit tests still.  More detail is captured in the JIRA ticket related to this commit.  However, we are imminently removing `intel` logic from the `notebooks` repo entirely... so I didn't wanna burn any more time trying to get that last notebook to pass as it will be removed shortly!

Further details on `expected_versions.json`:
- `yq` (which is installed via the `Makefile` is used to:
    - query the relevant imagestream manifest to parse out the `["opendatahub.io/notebook-software"]` and `["opendatahub.io/notebook-python-dependencies"]` annotations from the first element of the `spec.tags` attribute
		- inject name/version elements for `nbdime` and `nbgitpuller` (which are asserted against in `minimal` notebook tests)
		- convert this `yaml` list to a JSON object of the form: `{<package>: <version>}`
- this JSON object is then copied into the running notebook workload in the same directly that `test_notebook.ipynb` resides
- each `test_notebook.ipynb` has a couple helper functions defined to then interact with this file:
    - `def load_expected_versions() -> dict:`
		- `def get_expected_version(dependency_name: str) -> str:
- The argument provided to the `get_expected_version` function should match the `name` attribute of the JSON structure defined in the imagestream manifest

Related-to: https://issues.redhat.com/browse/RHOAIENG-16587
RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks
…d_cleanup

RHOAIENG-17695: chore(ci): create a test for calling `oc version` in the test, which can be run with ci testing
…n't have too many debug logs there from passing tests

This way full logs are only printed for failing tests, see https://docs.pytest.org/en/stable/how-to/logging.html
…capture

RHOAIENG-17695: chore(ci): capture logs in pytest tests so that we don't have too many debug logs there from passing tests
…ted on OCP-CI so broken intel notebooks should be deleted (opendatahub-io#843)

* RHOAIENG-8388: rm(intel): Intel tensorflow notebook failed to get tested on OCP-CI

The Intel workbench images are broken in multiple ways and since we don't ship them, let's not even carry them on the books.

* fixup, remove intel references from test_jupyter_with_papermill.sh
RHOAIENG-17305: chore(tests): IPv4 compatibility test(s), checking that we did not break the single-stack IPv4 case
…at's now causing the script to fail

```
./scripts/test_jupyter_with_papermill.sh: line 345: jupyter_ml_notebook_id: unbound variable
```
RHOAIENG-8388: rm(intel): fixup to remove one forgotten intel case that's now causing the script to fail
…r_pdf

RHOAIENG-17257: chore(tests): add allure dependency so that we can have fancy html report
This is useful in some tests when it's not necessary to actually start the IDE to test something.
…r_pdf

RHOAIENG-17257: chore(tests): make the wait for readiness optional
…r_pdf

RHOAIENG-17257: chore(tests): externalize the default workbench-starting arguments
…-updater-12886596842

[Codeflare Action] Update notebook's pipfile to sync with Codeflare-SDK release 0.26.0
…eflare/odh-sync-updater-12886596842

Update manifests after odh-sync-updater-12886596842 CodeFlare GitHub action ran
jiridanek and others added 23 commits January 24, 2025 15:36
…ally fails to start (most likely due to rootless podman) (opendatahub-io#849)

* RHOAIENG-17695: chore(ci): disable ryuk for gha runs and when it actually fails to start (most likely due to rootless podman)

* fixup, change the logic to only suggest disabling Ryuk, not disabling automatically
…-stack ipv6

This is an initial version of the test.
It introduces the TestFrame utility class into the test suite.
Implementation for macOS will come later, for now test only run on Linux.
…no_macos

RHOAIENG-18459: chore(tests/containers/workbenches): listen on single-stack IPv6
…dling: detect the right one to use if user has multiple machines around
This is to follow the proper filename convention so that the pytest will
be able to find the tests in this file accordingly.
…nes and improve debugging output when podman machine is not found (opendatahub-io#869)

* RHOAIENG-18459: chore(tests/containers): improve debugging output when podman machine is not found

* RHOAIENG-18459: chore(tests/containers): reduce the number of podman machine inspect calls, by giving it all machine names at once
Includes a simple test to check that the variables are propagated
properly so that the proxy configuration can work in RStudio IDE as
expected.

* https://issues.redhat.com/browse/RHOAIENG-17251
Let's not use trailing slash for the directory. When this variable is
used and some value should be append to it, it should be done always as:
`APP_ROOT_HOME/some-path`. So the forward slash is added there and we
avoid having double slash in our outputs that may be misleading (e.g.
empty variable value in the path).
RHOAIENG-17251: add a test for the proxy env configuration in RStudio + some other minor changes
This sanity check will report any ELF files with dynamic linking that have unsatisfied dependencies.

* apply the `"LD_LIBRARY_PATH": os.path.dirname(dlib)` fix/workaround

cuda image already has some LD_LIBRARY_PATH, this is to be preserved

```
(app-root) sh-5.1$ echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64
```

* use subtests to handle multiple libs not found

* skip known issue and false positives
RHOAIENG-9707: chore(tests/containers): update README.md with testcontainers motivation and lima section
…hes): tolerate IPv6 environments in codeserver, jupyterlab and rstudio"
…clean-fix-ipv6-issue

Revert "RHOAIENG-17306, RHOAIENG-17307, RHOAIENG-17308: feat(workbenches): tolerate IPv6 environments in codeserver, jupyterlab and rstudio"
RHOAIENG-9707: chore(tests/containers): check shared objects with ldd
RHOAIENG-9707: chore(tests/containers): try to install the cowsay package to check the python venv is writable
# Conflicts:
#	.github/workflows/build-notebooks.yaml
#	manifests/base/commit.env
#	manifests/base/params.env
#	poetry.lock
Copy link

openshift-ci bot commented Jan 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jiridanek. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek jiridanek changed the title Jd update rhds main [main] sync from opendatahub-io:main Jan 31, 2025
Copy link

openshift-ci bot commented Jan 31, 2025

@jiridanek: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/images 882d492 link true /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants