Merge branch 'ray-project:master' into master
dimakis authored Oct 20, 2023
2 parents 926fd4f + 6f9389a commit 33566e7
Showing 237 changed files with 58,336 additions and 43,557 deletions.
26 changes: 12 additions & 14 deletions .buildkite/test-sample-yamls.yml
@@ -10,15 +10,15 @@
- IMG=kuberay/operator:nightly make docker-image
- popd
# Use nightly KubeRay operator image
- - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.6.3 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_raycluster_yamls.py
+ - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_raycluster_yamls.py

- label: 'Test RayCluster Sample YAMLs (latest release)'
instance_size: large
image: golang:1.19
commands:
- ./.buildkite/setup-env.sh
# Use KubeRay operator image from the latest release
- - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.6.3 OPERATOR_IMAGE=kuberay/operator:v0.6.0 python3 tests/test_sample_raycluster_yamls.py
+ - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:v1.0.0-rc.1 python3 tests/test_sample_raycluster_yamls.py

- label: 'Test RayJob Sample YAMLs (nightly operator)'
instance_size: large
@@ -30,17 +30,15 @@
- IMG=kuberay/operator:nightly make docker-image
- popd
# Use nightly KubeRay operator image
- - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.6.3 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_rayjob_yamls.py
+ - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_rayjob_yamls.py

- # Temporarily skip due to adding new `RuntimeEnvYAML` field in sample YAMLs.
- # TODO(architkulkarni): Reenable after 1.0 release
- # - label: 'Test RayJob Sample YAMLs (latest release)'
- # instance_size: large
- # image: golang:1.19
- # commands:
- # - ./.buildkite/setup-env.sh
- # # Use KubeRay operator image from the latest release
- # - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.6.3 OPERATOR_IMAGE=kuberay/operator:v0.6.0 python3 tests/test_sample_rayjob_yamls.py
+ - label: 'Test RayJob Sample YAMLs (latest release)'
+ instance_size: large
+ image: golang:1.19
+ commands:
+ - ./.buildkite/setup-env.sh
+ # Use KubeRay operator image from the latest release
+ - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:v1.0.0-rc.1 python3 tests/test_sample_rayjob_yamls.py

- label: 'Test RayService Sample YAMLs (nightly operator)'
instance_size: large
@@ -52,12 +50,12 @@
- IMG=kuberay/operator:nightly make docker-image
- popd
# Use nightly KubeRay operator image
- - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.6.3 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_rayservice_yamls.py
+ - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_rayservice_yamls.py

- label: 'Test RayService Sample YAMLs (latest release)'
instance_size: large
image: golang:1.19
commands:
- ./.buildkite/setup-env.sh
# Use KubeRay operator image from the latest release
- - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.6.3 OPERATOR_IMAGE=kuberay/operator:v0.6.0 python3 tests/test_sample_rayservice_yamls.py
+ - source .venv/bin/activate && BUILDKITE_ENV=true RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:v1.0.0-rc.1 python3 tests/test_sample_rayservice_yamls.py
2 changes: 1 addition & 1 deletion .github/workflows/image-release.yaml
@@ -7,7 +7,7 @@ on:
description: 'Commit reference (branch or SHA) from which to build the images.'
required: true
tag:
- description: 'Desired release version tag (e.g. v0.6.0-rc.0).'
+ description: 'Desired release version tag (e.g. v1.0.0-rc.1).'
required: true

jobs:
22 changes: 21 additions & 1 deletion .github/workflows/test-job.yaml
@@ -346,7 +346,7 @@ jobs:
- uses: ./.github/workflows/actions/compatibility
with:
ray_version: 2.5.0

test-compatibility-2_6_3:
needs:
- build_operator
@@ -367,6 +367,26 @@ jobs:
with:
ray_version: 2.6.3

+ test-compatibility-2_7_0:
+ needs:
+ - build_operator
+ - build_apiserver
+ - lint
+ runs-on: ubuntu-latest
+ name: Compatibility Test - 2.7.0
+ steps:
+ - name: Check out code into the Go module directory
+ uses: actions/checkout@v2
+ with:
+ # When checking out the repository that
+ # triggered a workflow, this defaults to the reference or SHA for that event.
+ # Default value should work for both pull_request and merge(push) event.
+ ref: ${{github.event.pull_request.head.sha}}

+ - uses: ./.github/workflows/actions/compatibility
+ with:
+ ray_version: 2.7.0

test-compatibility-nightly:
needs:
- build_operator
95 changes: 38 additions & 57 deletions README.md
@@ -22,78 +22,56 @@ by some organizations to back user interfaces for KubeRay resource management.

* **KubeRay CLI**: KubeRay CLI provides the ability to manage KubeRay resources through command-line interface.

- ## KubeRay ecosystem

- * [AWS Application Load Balancer](docs/guidance/ingress.md)
- * [Nginx](docs/guidance/ingress.md)
- * [Prometheus and Grafana](docs/guidance/prometheus-grafana.md)
- * [Volcano](docs/guidance/volcano-integration.md)
- * [MCAD](docs/guidance/kuberay-with-MCAD.md)
- * [Kubeflow](docs/guidance/kubeflow-integration.md)

- ## Blogs

- * [A cloud-native, open-source stack for accelerating foundation model innovation](https://research.ibm.com/blog/openshift-foundation-model-stack) IBM (May 9, 2023).
- * [AI/ML Models Batch Training at Scale with Open Data Hub](https://cloud.redhat.com/blog/ai/ml-models-batch-training-at-scale-with-open-data-hub) Red Hat (May 15, 2023).

## Documentation

- You can view detailed documentation and guides at [https://ray-project.github.io/kuberay/](https://ray-project.github.io/kuberay/).

- We also recommend checking out the official Ray guides for deploying on Kubernetes at [https://docs.ray.io/en/latest/cluster/kubernetes/index.html](https://docs.ray.io/en/latest/cluster/kubernetes/index.html).
+ From September 2023, all user-facing KubeRay documentation will be hosted on the [Ray documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html).
+ The KubeRay repository only contains documentation related to the development and maintenance of KubeRay.

## Quick Start

- * Try this [end-to-end example](helm-chart/ray-cluster/README.md)!
- * Please choose the version you would like to install. The examples below use the latest stable version `v0.6.0`.
+ * [RayCluster Quickstart](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/raycluster-quick-start.html)
+ * [RayJob Quickstart](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/rayjob-quick-start.html)
+ * [RayService Quickstart](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/rayservice-quick-start.html)

- | Version | Stable | Suggested Kubernetes Version |
- |----------|:-------:|------------------------------:|
- | master | N | v1.19 - v1.25 |
- | v0.6.0 | Y | v1.19 - v1.25 |
+ ## Examples

- ### Use YAML
+ * [Ray Train XGBoostTrainer on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/ml-example.html#kuberay-ml-example) (CPU-only)
+ * [Train PyTorch ResNet model with GPUs on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/gpu-training-example.html#kuberay-gpu-training-example)
+ * [Serve a MobileNet image classifier on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/mobilenet-rayservice.html#kuberay-mobilenet-rayservice-example) (CPU-only)
+ * [Serve a StableDiffusion text-to-image model on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/stable-diffusion-rayservice.html#kuberay-stable-diffusion-rayservice-example)
+ * [Serve a text summarizer on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/text-summarizer-rayservice.html#kuberay-text-summarizer-rayservice-example)
+ * [RayJob Batch Inference Example](https://docs.ray.io/en/master/cluster/kubernetes/examples/rayjob-batch-inference-example.html#kuberay-batch-inference-example)

- Make sure your Kubernetes and Kubectl versions are both within the suggested range.
- Once you have connected to a Kubernetes cluster, run the following commands to deploy the KubeRay Operator.
+ ## Kubernetes Ecosystem

- ```sh
- # case 1: kubectl >= v1.22.0
- export KUBERAY_VERSION=v0.6.0
- kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=${KUBERAY_VERSION}&timeout=90s"
+ * [Ingress: AWS Application Load Balancer, GKE Ingress, Nginx](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/ingress.html#kuberay-ingress)
+ * [Using Prometheus and Grafana](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.html#kuberay-prometheus-grafana)
+ * [Profiling with py-spy](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/pyspy.html#kuberay-pyspy-integration)
+ * [KubeRay integration with Volcano](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-volcano)
+ * [Kubeflow: an interactive development solution](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/kubeflow.html#kuberay-kubeflow-integration)

- # case 2: kubectl < v1.22.0
- # Clone KubeRay repository and checkout to the desired branch e.g. `release-0.6`.
- kubectl create -k ray-operator/config/default
- ```

- To deploy both the KubeRay Operator and the optional KubeRay API Server run the following commands.
+ ## Blogs

- ```sh
- # case 1: kubectl >= v1.22.0
- export KUBERAY_VERSION=v0.6.0
- kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=${KUBERAY_VERSION}&timeout=90s"
- kubectl apply -k "github.com/ray-project/kuberay/manifests/base?ref=${KUBERAY_VERSION}&timeout=90s"

- # case 2: kubectl < v1.22.0
- # Clone KubeRay repository and checkout to the desired branch e.g. `release-0.4`.
- kubectl create -k manifests/cluster-scope-resources
- kubectl apply -k manifests/base
- ```
+ * [A cloud-native, open-source stack for accelerating foundation model innovation](https://research.ibm.com/blog/openshift-foundation-model-stack) IBM (May 9, 2023).
+ * [AI/ML Models Batch Training at Scale with Open Data Hub](https://cloud.redhat.com/blog/ai/ml-models-batch-training-at-scale-with-open-data-hub) Red Hat (May 15, 2023).

- > Observe that we must use `kubectl create` to install cluster-scoped resources. The corresponding `kubectl apply` command will not work. See [KubeRay issue #271](https://github.com/ray-project/kuberay/issues/271).
+ ## Helm Charts

- ### Use Helm (Helm v3+)
+ KubeRay Helm charts are hosted on the [ray-project/kuberay-helm](https://github.com/ray-project/kuberay-helm) repository.
+ Please read [kuberay-operator](helm-chart/kuberay-operator/README.md) to deploy the operator and [ray-cluster](helm-chart/ray-cluster/README.md) to deploy a configurable Ray cluster.
+ To deploy the optional KubeRay API Server, see [kuberay-apiserver](helm-chart/kuberay-apiserver/README.md).

- A Helm chart is a collection of files that describe a related set of Kubernetes resources.
- It can help users to deploy the KubeRay Operator and Ray clusters conveniently.
- Please read [kuberay-operator](helm-chart/kuberay-operator/README.md) to deploy the operator and [ray-cluster](helm-chart/ray-cluster/README.md) to deploy a configurable Ray cluster. To deploy the optional KubeRay API Server, see [kuberay-apiserver](helm-chart/kuberay-apiserver/README.md).

```sh
# Add the Helm repo
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Confirm the repo exists
helm search repo kuberay --devel

- # Install both CRDs and KubeRay operator v0.6.0.
- helm install kuberay-operator kuberay/kuberay-operator --version 0.6.0
+ # Install both CRDs and KubeRay operator v1.0.0.
+ helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0-rc.1

# Check the KubeRay operator Pod in `default` namespace
kubectl get pods
@@ -105,10 +83,13 @@ kubectl get pods

Please read our [CONTRIBUTING](CONTRIBUTING.md) guide before making a pull request. Refer to our [DEVELOPMENT](./ray-operator/DEVELOPMENT.md) to build and run tests locally.

- ### Getting involved
- Kuberay has an active community of developers. Here’s how to get involved with the Kuberay community:
+ ## Getting Involved

+ Join [Ray's Slack workspace](https://docs.google.com/forms/d/e/1FAIpQLSfAcoiLCHOguOm8e7Jnn-JJdZaCxPGjgVCvFijHB5PLaQLeig/viewform), and search the following public channels:

+ * `#kuberay-questions` (KubeRay users): This channel aims to help KubeRay users with their questions. The messages will be closely monitored by the Ray and KubeRay maintainers.

- Join our community: Join [Ray community slack](https://forms.gle/9TSdDYUgxYs8SA9e8) (search for Kuberay channel) or use our [discussion board](https://discuss.ray.io/c/ray-clusters/ray-kubernetes) to ask questions and get answers.
+ * `#kuberay-discuss` (KubeRay contributors): This channel is for contributors to discuss what to do next with KubeRay (e.g. issues, pull requests, feature requests, design docs, KubeRay ecosystem integrations). All KubeRay maintainers and core contributors are in the channel.

## Security

2 changes: 1 addition & 1 deletion apiserver/Dockerfile
@@ -1,5 +1,5 @@
# Build the backend service
- FROM registry.access.redhat.com/ubi8/go-toolset:1.19.10-10 as builder
+ FROM registry.access.redhat.com/ubi8/go-toolset:1.19.13 as builder

WORKDIR /workspace
# Copy the Go Modules manifests