Releases: ray-project/kuberay
v1.2.2
Highlights
- (alpha) Ray kubectl plugin: get, session, log, job submit
- (alpha) Kubernetes events: create Kubernetes events for important information about the interactions between KubeRay and the Kubernetes API server
- (alpha) Apache YuniKorn integration
Changelog
- [release] Update Ray image to 2.34.0 (#2303, @kevin85421)
- Revert "[release] Update Ray image to 2.34.0 (#2303)" (#2413, @kevin85421)
- Revert "[release] Update Ray image to 2.34.0 (#2303)" (#2413) (#2415, @kevin85421)
- [Build][kubectl-plugin] Add release script for kubectl plugin (#2407, @MortalHappiness)
- [Feat][kubectl-plugin] Add Long, Example, shell completion for kubectl ray log (#2405, @MortalHappiness)
- Support gang scheduling with Apache YuniKorn (#2396, @yangwwei)
- [Feat][Kubectl-Plugin]Implement kubectl ray job submit (#2394, @chiayi)
- Add 1K, 5K and 10K RayCluster/RayJob scalability test results (#2218, @andrewsykim)
- [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray session (#2390, @MortalHappiness)
- [Feature][RayJob]: Generate submitter and RayCluster creation/deletion events (#2389, @rueian)
- [RayJob] Add Failure Feedback (log and event) for Failed k8s Creation Task (#2306, @tinaxfwu)
- [Feat][Kubectl-Plugin] Implement kubectl session for RayJob and RayService (#2379, @MortalHappiness)
- [Feat][kubectl-plugin] Add instructions for static shell completion (#2384, @MortalHappiness)
- [Feat][RayJob] UserMode SubmissionMode (#2364, @MortalHappiness)
- [Feature] Add Kubernetes manifest validation in pre-commit. (#2380, @LeoLiao123)
- [Feature][RayCluster]: Generate GCS FT Redis Cleanup Job creation events (#2382, @rueian)
- [Chore][Minor] Add .gitignore to kubectl-plugin (#2383, @MortalHappiness)
- Remove default option for batch scheduler name (#2371, @yangwwei)
- RayCluster Headless Worker Service Should PublishNotReadyAddresses (#2375, @ryanaoleary)
- [CI][GitHub-Actions] Upgrade actions/upload-artifact to v4 (#2373, @MortalHappiness)
- add support for pipeline-parallel-size in vLLM example (#2370, @andrewsykim)
- Add kubectl ray cluster log command (#2296, @chiayi)
- [Chore] Fix lint errors caused by casting int to int32 (#2368, @kevin85421)
- [Feature][kubectl-plugin] Implement kubectl ray session (#2298, @MortalHappiness)
- Use longer exec probe timeouts for Head pods (#2353, @andrewsykim)
- Remove redundant log line that is failing golangci-lint (#2366, @andrewsykim)
- [Chore][Linter] Upgrade golangci-lint to 1.60.3 (#2362, @MortalHappiness)
- Add batch-scheduler option, deprecate enable-batch-scheduler option (#2300, @yangwwei)
- [Feature] Display reconcile failures as events (ServiceAccount) (#2290, @cchen777)
- [Feature][RayCluster]: Deprecate the RayCluster .Status.State field (#2288, @rueian)
- Don't print redundant time unit in the log message (#2335, @tczekajlo)
- [Refactor][sample-yaml-test] Create sampleyaml package and run tests in CI (#2312, @MortalHappiness)
- [Refactor] Fix CreatedWorkerPod for worker Pod deletion event and refactor logs (#2346, @kevin85421)
- raycluster_controller: generate events for failed pod creation (#2286, @MadhavJivrajani)
- [Refactor][kubectl-plugin] Rename filenames and variables based on kubectl repo (#2295, @MortalHappiness)
v1.2.1 release
Compared to KubeRay v1.2.0, KubeRay v1.2.1 includes an additional commit (#2243). This commit fixes the issue where a RayService created by a KubeRay version older than v1.2.0 does not support zero-downtime upgrades after upgrading to KubeRay v1.2.0.
- [RayService] Use original ClusterIP for new head service (#2343, @kevin85421)
v1.2.0 release
Highlights
- RayCluster CRD status observability improvement: design doc
- Support retry in RayJob: #2192
- Coding style improvement
RayCluster
- [RayCluster][Fix] evicted head-pod can be recreated or restarted (#2217, @JasonChen86899)
- [Test][RayCluster] Add tests for RestartPolicyOnFailure for eviction (#2302, @MortalHappiness)
- kuberay autoscaler pod use same command and args as ray head container (#2268, @cswangzheng)
- Updated default timeout seconds for probes (#2265, @HarshAgarwal11)
- Buildkite autoscaler e2e (#2199, @rueian)
- [Test][Autoscaler][2/n] Add Ray Autoscaler e2e tests for GPU workers (#2181, @rueian)
- [Test][Autoscaler][1/n] Add Ray Autoscaler e2e tests (#2168, @kevin85421)
- [Bug] Fix RayCluster with an overridden app.kubernetes.io/name (#2147) (#2166, @rueian)
- [Feat][RayCluster] Make the Head service headless (#2117, @rueian)
- [Refactor][RayCluster] Make ray.io/group=headgroup be constant (#1970, @rueian)
- [Feature][autoscaler v2] Set RAY_NODE_TYPE_NAME when starting ray node (#1973, @kevin85421)
- feat: add RayCluster.status.readyWorkerReplicas (#1930, @davidxia)
- [Chore][Samples] Rename ray-cluster.mini.yaml and add workerGroupSpecs (#2100, @MortalHappiness)
- [Chore] Delete redundant pod existence checking (#2113, @MortalHappiness)
- [Autoscaler V2] Polish Autoscaler V2 YAML (#2064, @kevin85421)
- [Refactor] Use RayClusterHeadPodsAssociationOptions to replace MatchingLabels (#2056, @evalaiyc98)
- [Sample][autoscaler v2] Add sample yaml for autoscaler v2 (#1974, @rickyyx)
- Allow configuration of restartPolicy (#2197, @c0dearm)
- [Chore][Log] Delete error loggings right before returned errors (#2103, @MortalHappiness)
- [Refactor] Follow-up for PR 1930 (#2124, @MortalHappiness)
- [Test] Move StateTransitionTimes envtest to a better place (#2111, @kevin85421)
- support using proxy subresources when connecting to Ray head node (#1980, @andrewsykim)
- [Bug] All worker Pods are deleted if using KubeRay v1.0.0 CRD with KubeRay operator v1.1.0 image (#2087, @kevin85421)
- [Bug] KubeRay operator failed to watch endpoint (#2080, @kevin85421)
- [Refactor] Remove cleanupInvalidVolumeMounts (#2104, @kevin85421)
- support using proxy subresources when connecting to Ray head node (#1980, @andrewsykim)
- [Chore] Run operator outside the cluster (#2090, @MortalHappiness)
- [Feat] Deprecate ForcedClusterUpgrade (#2075, @MortalHappiness)
- [Bug] Ray operator crashes when specifying RayCluster with resources.limits but no resources.requests (#2077, @kevin85421)
RayCluster CRD status improvement
- RayClusterProvisioned status should be set while cluster is being provisioned for the first time (#2304, @andrewsykim)
- Add RayClusterProvisioned Condition Type (#2301, @Yicheng-Lu-llll)
- [Test][RayCluster] Add envtests for RayCluster conditions (#2283, @MortalHappiness)
- [Fix][RayCluster] Make the RayClusterReplicaFailureReason to capture the correct reason (#2282, @rueian)
- Add RayClusterReady Condition Type (#2271, @Yicheng-Lu-llll)
- [Feature][RayCluster]: Implement the HeadReady condition (#2261, @cchen777)
- [Feature] REP 54: Add PodName to the HeadInfo (#2266, @rueian)
- [Feat][RayCluster] Use a new RayClusterReplicaFailure condition to reflect the result of reconcilePods (#2259, @rueian)
- Don’t assign the rayv1.Failed to the State field (#2258, @Yicheng-Lu-llll)
- [Refactor][RayCluster] Unify status update to single place (#2249, @MortalHappiness)
- [Feat][RayCluster] Introduce the RayClusterStatus.Conditions field (#2214, @rueian)
- [Test][Autoscaling] Add custom resource test (#2193, @MortalHappiness)
- feat: record last state transition times (#2053, @davidxia)
- [RayCluster] Add serviceName to status.headInfo (#2089, @andrewsykim)
- [RayCluster][Status][1/n] Remove ClusterState Unhealthy (#2068, @kevin85421)
Coding style improvement
- [Style] Fix golangci-lint rule: govet (#2144, @MortalHappiness)
- [Chore] Fix golangci-lint rule: gosec (#2163, @MortalHappiness)
- [Style] Fix golangci-lint rule: nolintlint (#2196, @MortalHappiness)
- [Style] Fix golangci-lint rule: unparam (#2195, @MortalHappiness)
- [Fix][CI] Fix revive error (#2183, @MortalHappiness)
- [Style] Fix golangci-lint rule: revive (#2167, @MortalHappiness)
- [Style] Fix golangci-lint rule: ginkgolinter (#2164, @MortalHappiness)
- [Style] Fix golangci-lint rule: errorlint (#2141, @MortalHappiness)
- [Chore] Use new golangci-lint rules only for ray-operator (#2152, @MortalHappiness)
- [Docs][Development] Delete linting docs (#2145, @MortalHappiness)
- [Style] Fix golangci-lint rule: unconvert (#2143, @MortalHappiness)
- [Style] Fix golangci-lint rule: noctx (#2142, @MortalHappiness)
- [Fix][precommit] Fix pre-commit golangci-lint always succeed (#2140, @MortalHappiness)
- [N/N][Chore] Add golangci-lint rules (#2128, @MortalHappiness)
- [Chore] Turn off no-commit-to-branch rule (#2139, @MortalHappiness)
- [5/N][Refactor] Run golangci-lint for all files (only autofix rules) (#2133, @MortalHappiness)
- [4/N][Chore] Turn off golangci-lint rules except ray-operator (#2138, @MortalHappiness)
- [3/N][CI] Replace lint CI with pre-commit (#2129, @MortalHappiness)
- [2/N][Refactor] Run pre-commit for all files (without golangci-lint) (#2130, @MortalHappiness)
- [1/N][Chore] Add pre-commit hooks (#2127, @MortalHappiness)
RayJob
- [RayJob] allow create verb for services/proxy, which is required for HTTPMode (#2321, @andrewsykim)
- [Fix][Sample-Yaml] Increase ray head CPU resource for pytorch minst (#2330, @MortalHappiness)
- Support Apache YuniKorn as one batch scheduler option (#2184, @yangwwei)
- [RayJob] add RayJob pass Deadline e2e-test with retry (#2241, @karta1502545)
- add feature gate mechanism to ray-operator (#2219, @andrewsykim)
- [RayJob] add Failing RayJob in HTTPMode e2e test for rayjob with retry (#2242, @tinaxfwu)
- [Feat][RayJob] Delete RayJob CR after job termination (#2225, @MortalHappiness)
- reconcile concurrency flag should apply for RayJob and RayService controllers (#2228, @andrewsykim)
- [RayJob] add Failing submitter K8s Job e2e ...
v1.1.1 release
Compared to KubeRay v1.1.0, KubeRay v1.1.1 includes four cherry-picked commits.
- [Bug] Ray operator crashes when specifying RayCluster with resources.limits but no resources.requests (#2077, @kevin85421)
- [CI] Pin kustomize to v5.3.0 (#2067, @kevin85421)
- [Bug] All worker Pods are deleted if using KubeRay v1.0.0 CRD with KubeRay operator v1.1.0 image (#2087, @kevin85421)
- [Hotfix][CI] Pin setup-envtest dep (#2038, @kevin85421)
v1.1.0 release
Highlights
- RayJob improvements
  - Gang / Priority scheduling with Kueue:
  - ActiveDeadlineSeconds (new field): A feature to control the lifecycle of a RayJob. See this doc and #1933 for more details.
  - submissionMode (new field): Users can specify “K8sJobMode” or “HTTPMode”. The default value is “K8sJobMode”. In HTTPMode, the submitter K8s Job is not created. Instead, KubeRay sends an HTTP request to the Ray head Pod to create a Ray job. See this doc and #1893 for more details.
  - Fix many stability issues.
- Structured logging
  - In KubeRay v1.1.0, the KubeRay logs changed to JSON format, and each log message includes context information such as the custom resource’s name and reconcileID. Hence, users can select the logs associated with a RayCluster, RayJob, or RayService CR by its name.
- RayService improvements
  - Refactor the health check mechanism to improve stability.
  - Deprecate deploymentUnhealthySecondThreshold and serviceUnhealthySecondThreshold to avoid unintentional preparation of a new RayCluster custom resource.
- TPU multi-host PodSlice support
  - The KubeRay team is actively working with the Google GKE and TPU teams on integration. The required changes in KubeRay have already been completed. The GKE team will complete some tasks on their side this week or next. Then, users should be able to use multi-host TPU PodSlice with a static RayCluster (without autoscaling).
- Stop publishing images on DockerHub; instead, we will only publish on Quay.
  - https://quay.io/repository/kuberay/operator?tab=tags
  - Users should use docker pull quay.io/kuberay/operator:v1.1.0 instead of docker pull kuberay/operator:v1.1.0.
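Since each structured log message is a JSON object that carries the custom resource's name, picking out the messages for one CR can be scripted in a few lines. The following is a minimal sketch, not the operator's documented schema: the sample lines, the top-level keys (RayCluster, RayJob), and the assumption that the CR name sits under a nested "name" key are all hypothetical, so check real operator output and adjust the lookup before relying on it.

```python
import json
from typing import Any, Iterable, Iterator


def mentions_name(value: Any, cr_name: str) -> bool:
    """Return True if any nested dict in `value` has a "name" key equal to cr_name.

    Assumption: the CR name is stored under a key literally called "name".
    """
    if isinstance(value, dict):
        if value.get("name") == cr_name:
            return True
        return any(mentions_name(v, cr_name) for v in value.values())
    if isinstance(value, list):
        return any(mentions_name(v, cr_name) for v in value)
    return False


def filter_logs(lines: Iterable[str], cr_name: str) -> Iterator[dict]:
    """Yield parsed JSON log records that reference the given custom resource name."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(record, dict) and mentions_name(record, cr_name):
            yield record


# Hypothetical sample lines; the real field layout may differ.
SAMPLE_LOGS = [
    '{"level":"info","msg":"reconciling","RayCluster":{"name":"raycluster-a","namespace":"default"},"reconcileID":"r-1"}',
    '{"level":"info","msg":"reconciling","RayJob":{"name":"rayjob-b","namespace":"default"},"reconcileID":"r-2"}',
    "plain text line that is not JSON",
]

if __name__ == "__main__":
    for rec in filter_logs(SAMPLE_LOGS, "raycluster-a"):
        print(rec["msg"], rec["reconcileID"])
```

In practice the lines would come from the operator Pod's log stream rather than a hard-coded list; passing sys.stdin to filter_logs works the same way because it only needs an iterable of strings.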
RayJob
RayJob state machine refactor
- [RayJob][Status][1/n] Redefine the definition of JobDeploymentStatusComplete (#1719, @kevin85421)
- [RayJob][Status][2/n] Redefine ready for RayCluster to avoid using HTTP requests to check dashboard status (#1733, @kevin85421)
- [RayJob][Status][3/n] Define JobDeploymentStatusInitializing (#1737, @kevin85421)
- [RayJob][Status][4/n] Remove some JobDeploymentStatus and updateState function calls (#1743, @kevin85421)
- [RayJob][Status][5/n] Refactor getOrCreateK8sJob (#1750, @kevin85421)
- [RayJob][Status][6/n] Redefine JobDeploymentStatusComplete and clean up K8s Job after TTL (#1762, @kevin85421)
- [RayJob][Status][7/n] Define JobDeploymentStatusNew explicitly (#1772, @kevin85421)
- [RayJob][Status][8/n] Only a RayJob with the status Running can transition to Complete at this moment (#1774, @kevin85421)
- [RayJob][Status][9/n] RayJob should not pass any changes to RayCluster (#1776, @kevin85421)
- [RayJob][10/n] Add finalizer to the RayJob when the RayJob status is JobDeploymentStatusNew (#1780, @kevin85421)
- [RayJob][Status][11/n] Refactor the suspend operation (#1782, @kevin85421)
- [RayJob][Status][12/n] Resume suspended RayJob (#1783, @kevin85421)
- [RayJob][Status][13/n] Make suspend operation atomic by introducing the new status Suspending (#1798, @kevin85421)
- [RayJob][Status][14/n] Decouple the Initializing status and Running status (#1801, @kevin85421)
- [RayJob][Status][15/n] Unify the codepath for the status transition to Suspended (#1805, @kevin85421)
- [RayJob][Status][16/n] Refactor Running status (#1807, @kevin85421)
- [RayJob][Status][17/n] Unify the codepath for status updates (#1814, @kevin85421)
- [RayJob][Status][18/n] Control the entire lifecycle of the Kubernetes submitter Job using KubeRay (#1831, @kevin85421)
- [RayJob][Status][19/n] Transition to Complete if the K8s Job fails (#1833, @kevin85421)
Others
- [Refactor] Remove global utils.GetRayXXXClientFuncs (#1727, @rueian)
- [Feature] Warn Users When Updating the RayClusterSpec in RayJob CR (#1778, @Yicheng-Lu-llll)
- Add apply configurations to generated client (#1818, @astefanutti)
- RayJob: inject RAY_DASHBOARD_ADDRESS environment variable for user provided submitter templates (#1852, @andrewsykim)
- [Bug] Submitter K8s Job fails even though the RayJob has a JobDeploymentStatus Complete and a JobStatus SUCCEEDED (#1919, @kevin85421)
- add toleration for GPUs in sample pytorch RayJob (#1914, @andrewsykim)
- Add a sample RayJob to fine-tune a PyTorch lightning text classifier with Ray Data (#1891, @andrewsykim)
- rayjob controller: refactor environment variable check in unit tests (#1870, @andrewsykim)
- RayJob: don't delete submitter job when ShutdownAfterJobFinishes=true (#1881, @andrewsykim)
- rayjob controller: update EndTime to always be the time when the job deployment transitions to Complete status (#1872, @andrewsykim)
- chore: remove ConfigMap from ray-job.kueue-toy-sample.yaml (#1976, @kevin85421)
- [Kueue] Add a sample YAML for Kueue toy sample (#1956, @kevin85421)
- [RayJob] Support ActiveDeadlineSeconds (#1933, @kevin85421)
- [Feature][RayJob] Support light-weight job submission (#1893, @kevin85421)
- [RayJob] Add JobDeploymentStatusFailed Status and Reason Field to Enhance Observability for Flyte/RayJob Integration (#1942, @Yicheng-Lu-llll)
- [RayJob] Refactor Rayjob E2E Tests to Use Server-Side Apply (#1927, @Yicheng-Lu-llll)
- [RayJob] Rewrite RayJob envtest (#1916, @kevin85421)
- [Chore][RayJob] Remove the TODO of verifying the schema of RayJobInfo because it is already correct (#1911, @rueian)
- [RayJob] Set missing CPU limit (#1899, @kevin85421)
- [RayJob] Set the timeout of the HTTP client from 2 mins to 2 seconds (#1910, @kevin85421)
- [Feature][RayJob] Support light-weight job submission with entrypoint_num_cpus, entrypoint_num_gpus and entrypoint_resources (#1904, @rueian)
- [RayJob] Improve dashboard client log (#1903, @kevin85421)
- [RayJob] Validate whether runtimeEnvYAML is a valid YAML string (#1898, @kevin85421)
- [RayJob] Add additional print columns for RayJob (#1895, @andrewsykim)
- [Test][RayJob] Transition to Complete if the JobStatus is STOPPED (#1871, @kevin85421)
- [RayJob] Inject RAY_SUBMISSION_ID env variable for user provided submitter template (#1868, @kevin85421)
- [RayJob] Transition to Complete if the JobStatus is STOPPED (#1855, @kevin85421)
- [RayJob][Kueue] Move limitation check to validateRayJobSpec (#1854, @kevin85421)
- [RayJob] Validate RayJob spec (#1813, @kevin85421)
- [Test][RayJob] Kueue happy-path scenario (#1809, @kevin85421)
- [RayJob] Delete the Kubernetes Job and its Pods immediately when suspending (#1791, @rueian)
- [Feature][RayJob] Remove the deprecated RuntimeEnv from CRD. Use RuntimeEnvYAML instead. (#1792, @rueian)
- [Bug][RayJob] Avoid nil pointer dereference ([#1756](https://github.c...
v1.0.0 release
KubeRay is officially in General Availability!
- Bump the CRD version from v1alpha1 to v1.
- Relocate almost all documentation to the Ray website.
- Improve RayJob UX.
- Improve GCS fault tolerance.
GCS fault tolerance
- [GCS FT] Improve GCS FT cleanup UX (#1592, @kevin85421)
- [Bug][RayCluster] Fix RAY_REDIS_ADDRESS parsing with redis scheme and… (#1556, @rueian)
- [Bug] RayService with GCS FT HA issue (#1551, @kevin85421)
- [Test][GCS FT] End-to-end test for cleanup_redis_storage (#1422)(#1459) (#1466, @rueian)
- [Feature][GCS FT] Clean up Redis once a GCS FT-Enabled RayCluster is deleted (#1412, @kevin85421)
- Update GCS fault tolerance YAML (#1404, @kevin85421)
- [GCS FT] Consider the case of sidecar containers (#1386, @kevin85421)
- [GCS FT] Give readiness / liveness probes good default values (#1364, @kevin85421)
- [GCS FT][Refactor] Redefine the behavior for deleting Pods and stop listening to Kubernetes events (#1341, @kevin85421)
CRD versioning
- [CRD] Inject CRD version to the Autoscaler sidecar container (#1496, @kevin85421)
- [CRD][2/n] Update from CRD v1alpha1 to v1 (#1482, @kevin85421)
- [CRD][1/n] Create v1 CRDs (#1481, @kevin85421)
- [CRD] Set maxDescLen to 0 (#1449, @kevin85421)
RayService
- [Hotfix][Bug] Avoid unnecessary zero-downtime upgrade (#1581, @kevin85421)
- [Feature] Add an example for RayService high availability (#1566, @kevin85421)
- [Feature] Add a flag to make zero downtime upgrades optional (#1564, @kevin85421)
- [Bug][RayService] KubeRay does not recreate Serve applications if a head Pod without GCS FT recovers from a failure. (#1420, @kevin85421)
- [Bug] Fix the filename of text summarizer YAML (#1415, @kevin85421)
- [serve] Change text ml yaml to use french in user config (#1403, @zcin)
- [services] Add text ml rayservice yaml (#1402, @zcin)
- [Bug] Fix flakiness of RayService e2e tests (#1385, @kevin85421)
- Add RayService sample test (#1377, @Darren221)
- [RayService] Revisit the conditions under which a RayService is considered unhealthy and the default threshold (#1293, @kevin85421)
- [RayService][Observability] Add more loggings about networking issues (#1282, @kevin85421)
RayJob
- [Feature] Improve observability for flaky RayJob test (#1587, @kevin85421)
- [Bug][RayJob] Fix FailedToGetJobStatus by allowing transition to Running (#1583, @architkulkarni)
- [RayJob] Fix RayJob status reconciliation (#1539, @astefanutti)
- [RayJob]: Always use target RayCluster image as default RayJob submitter image (#1548, @astefanutti)
- [RayJob] Add default CPU and memory for job submitter pod (#1319, @architkulkarni)
- [Bug][RayJob] Check dashboard readiness before creating job pod (#1381) (#1429, @rueian)
- [Feature][RayJob] Use RayContainerIndex instead of 0 (#1397) (#1427, @rueian)
- [RayJob] Enable job log streaming by setting PYTHONUNBUFFERED in job container (#1375, @architkulkarni)
- Add field to expose entrypoint num cpus in rayjob (#1359, @shubhscoder)
- [RayJob] Add runtime env YAML field (#1338, @architkulkarni)
- [Bug][RayJob] RayJob with custom head service name (#1332, @kevin85421)
- [RayJob] Add e2e sample yaml test for shutdownAfterJobFinishes (#1269, @architkulkarni)
RayCluster
- [Enhancement] Remove unused variables in constant.go (#1474, @evalaiyc98)
- [Enhancement] GPU RayCluster doesn't work on GKE Autopilot (#1470, @kevin85421)
- [Refactor] Parameterize TestGetAndCheckServeStatus (#1450, @evalaiyc98)
- [Feature] Make replicas optional for WorkerGroupSpec (#1443, @kevin85421)
- use raycluster app's name as podgroup name key word (#1446, @lowang-bh)
- [Refactor] Make port name variables consistent and meaningful (#1389, @evalaiyc98)
- [Feature] Use image of Ray head container as the default Ray Autoscaler container (#1401, @kevin85421)
- Update Autoscaler YAML for the Autoscaler tutorial (#1400, @kevin85421)
- [Feature] Ray container must be the first application container (#1379, @kevin85421)
- [release blocker][Feature] Only Autoscaler can make decisions to delete Pods (#1253, @kevin85421)
- [release blocker][Autoscaler] Randomly delete Pods when scaling down the cluster (#1251, @kevin85421)
Helm charts
- Remove miniReplicas in raycluster-cluster.yaml (#1473, @evalaiyc98)
- Helm chart ray-cluster template reference fix (#1469, @chrisxstyles)
- fix: Issue #1391 - Custom labels not being pulled in (#1398, @rxraghu)
- Remove unnecessary kustomize in make helm (#1370, @shubhscoder)
- [Feature] Allow RayCluster Helm chart to specify different images for different worker groups (#1352, @Darren221)
- Allow manually creating init containers in Kuberay helm charts (#1287, @richardsliu)
KubeRay API Server
- Added Python API server client (#1561, @blublinsky)
- updating url use v1 (#1577, @blublinsky)
- Fixed processing of job submitter (#1562, @blublinsky)
- extended job APIs (#1537, @blublinsky)
- fixed volumes test in cluster test (#1498, @blublinsky)
- Add documentation for API Server monitoring (#1479, @blublinsky)
- created HA example for API server (#1461, @blublinsky)
- Numerous fixes to the API server to make RayJob APIs working (#1447, @blublinsky)
- Updated API server documentation (#1435, @z103cb)
- servev2 support for API server (#1419, @blublinsky)
- replacement for #1312 (#1409, @blublinsky)
- Updates to the apiserver swagger-ui (#1410, @z103cb)
- implemented liveness/readiness probe for the API server (#1369, @blublinsky)
- Operator support for openShift (#1371, @blublinsky)
- Removed use of BUILD_FLAGS in apiserver makefile (#1336, @z103cb)
- Api server makefile (#1301, @z103cb)
Documentation
- [Doc] Update release docs (#1621, @kevin85421)
- [Doc] Fix release doc format (#1578, @kevin85421)
- Update kuberay mcad integration doc (#1373, @tedhtchang)
- [Release][Doc] Add instructions to release Go modules. (#1546, @kevin85421)
- [Post v1.0.0-rc.1] Reenable sample YAML tests for latest release and update some docs (#1544, @kevin85421)
- Update operator development instruction ([#1458](https://g...
v0.6.0 release
Highlights
- RayService
  - RayService starts to support Ray Serve multi-app API (#1136, #1156)
  - RayService stability improvements (#1231, #1207, #1173)
  - RayService observability (#1230)
  - RayService examples
    - [RayService] Stable Diffusion example (#1181, @kevin85421)
    - MobileNet example (#1175, @kevin85421)
  - RayService troubleshooting handbook (#1221)
- RayJob refactoring (#1177)
RayService
- [RayService][Observability] Add more logging for RayService troubleshooting (#1230, @kevin85421)
- [Bug] Long image pull time will trigger blue-green upgrade after the head is ready (#1231, @kevin85421)
- [RayService] Stable Diffusion example (#1181, @kevin85421)
- [RayService] Update docs to use multi-app (#1179, @zcin)
- [RayService] Change runtime env for e2e autoscaling test (#1178, @zcin)
- [RayService] Add e2e tests (#1167, @zcin)
- [RayService][docs] Improve explanation for config file and in-place updates (#1229, @zcin)
- [RayService][Doc] RayService troubleshooting handbook (#1221, @kevin85421)
- [Doc] Improve RayService doc (#1235, @kevin85421)
- [Doc] Improve FAQ page and RayService troubleshooting guide (#1225, @kevin85421)
- [RayService] Add RayService alb ingress CR (#1169, @sihanwang41)
- [RayService] Add support for multi-app config in yaml-string format (#1156, @zcin)
- [rayservice] Add support for getting multi-app status (#1136, @zcin)
- [Refactor] Remove Dashboard Agent service (#1207, @kevin85421)
- [Bug] KubeRay operator fails to get serve deployment status due to 500 Internal Server Error (#1173, @kevin85421)
- MobileNet example (#1175, @kevin85421)
- [Bug] fix RayActorOptionSpec.items.spec.serveConfig.deployments.rayActorOptions.memory int32 data type (#1220, @kevin85421)
RayJob
- [RayJob] Submit job using K8s job instead of checking Status and using DashboardHTTPClient (#1177, @architkulkarni)
- [Doc] [RayJob] Add documentation for submitterPodTemplate (#1228, @architkulkarni)
Autoscaler
- [release blocker][Feature] Only Autoscaler can make decisions to delete Pods (#1253, @kevin85421)
- [release blocker][Autoscaler] Randomly delete Pods when scaling down the cluster (#1251, @kevin85421)
Helm
- [Helm][RBAC] Introduce the option crNamespacedRbacEnable to enable or disable the creation of Role/RoleBinding for RayCluster preparation (#1162, @kevin85421)
- [Bug] Allow zero replica for workers for Helm (#968, @ducviet00)
- [Bug] KubeRay tries to create ClusterRoleBinding when singleNamespaceInstall and rbacEnable are set to true (#1190, @kevin85421)
KubeRay API Server
- Add support for openshift routes (#1183, @blublinsky)
- Adding API server support for service account (#1148, @blublinsky)
Documentation
- [release v0.6.0] Update tags and versions (#1270, @kevin85421)
- [release v0.6.0-rc.1] Update tags and versions (#1264, @kevin85421)
- [release v0.6.0-rc.0] Update tags and versions (#1237, @kevin85421)
- [Doc] Develop Ray Serve Python script on KubeRay (#1250, @kevin85421)
- [Doc] Fix the order of comments in sample Job YAML file (#1242, @architkulkarni)
- [Doc] Upload a screenshot for the Serve page in Ray dashboard (#1236, @kevin85421)
- [Doc] GKE GPU cluster setup (#1223, @kevin85421)
- [Doc][Website] Add complete document link (#1224, @yuxiaoba)
- Add FAQ page (#1150, @Yicheng-Lu-llll)
- [Doc] Add gofumpt lint instructions (#1180, @architkulkarni)
- [Doc] Add helm update command to chart validation step in release process (#1165, @architkulkarni)
- [Doc] Add git fetch --tags command to release instructions (#1164, @architkulkarni)
- Add KubeRay related blogs (#1147, @tedhtchang)
- [2.5.0 Release] Change version numbers 2.4.0 -> 2.5.0 (#1151, @ArturNiederfahrenhorst)
- [Sample YAML] Bump ray version in pod security YAML to 2.4.0 (#1160, @architkulkarni)
- Add instruction to skip unit tests in DEVELOPMENT.md (#1171, @architkulkarni)
- Fix typo (#1241, @mmourafiq)
- Fix typo (#1232, @mmourafiq)
CI
- [CI] Add kind-in-Docker test to Buildkite CI (#1243, @architkulkarni)
- [CI] Remove unnecessary release.yaml workflow (#1168, @architkulkarni)
Others
- Pin operator version in single namespace installation(#1193) (#1210, @wjzhou)
- RayCluster updates status frequently (#1211, @kevin85421)
- Improve the observability of the init container (#1149, @Yicheng-Lu-llll)
- [Ray Observability] Disk usage in Dashboard (#1152, @kevin85421)
v0.5.2 release
Changelog for v0.5.2
Highlights
The KubeRay 0.5.2 patch release includes the following improvements.
- Allow specifying the entire headService and serveService YAML spec. Previously, only certain special fields such as labels and annotations were exposed to the user.
  - Expose entire head pod Service to the user (#1040, @architkulkarni)
  - Exposing Serve Service (#1117, @kodwanis)
- RayService stability improvements
  - RayService object’s Status is being updated due to frequent reconciliation (#1065, @kevin85421)
  - [RayService] Submit requests to the Dashboard after the head Pod is running and ready (#1074, @kevin85421)
  - Fix in HeadPod Service Generation logic which was causing frequent reconciliation (#1056, @msumitjain)
- Allow watching multiple namespaces
  - [Feature] Watch CR in multiple namespaces with namespaced RBAC resources (#1106, @kevin85421)
- Autoscaler stability improvements
  - [Bug] RayService restarts repeatedly with Autoscaler (#1037, @kevin85421)
  - [Bug] autoscaler not working properly in rayjob (#1064, @Yicheng-Lu-llll)
  - [Bug][Autoscaler] Operator does not remove workers (#1139, @kevin85421)
Contributors
We'd like to thank the following contributors for their contributions to this release:
@ByronHsu, @Yicheng-Lu-llll, @anishasthana, @architkulkarni, @blublinsky, @chrisxstyles, @dirtyValera, @ecurtin, @jasoonn, @jjyao, @kevin85421, @kodwanis, @msumitjain, @oginskis, @psschwei, @scarlet25151, @sihanwang41, @tedhtchang, @varungup90, @xubo245
Features
- Add a flag to enable/disable worker init container injection (#1069, @ByronHsu)
- Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. (#1102, @kevin85421)
- Add consistency check for deepcopy generated files (#1127, @varungup90)
- Add kubernetes dependency in python client library (#998, @jasoonn)
- Add support for pvcs to apiserver (#1118, @psschwei)
- Add support for tolerations, env, annotations and labels (#1070, @blublinsky)
- Align Init Container's ImagePullPolicy with Ray Container's ImagePullPolicy (#1080, @Yicheng-Lu-llll)
- Connect Ray client with TLS using Nginx Ingress on Kind cluster (#729) (#1051, @tedhtchang)
- Expose entire head pod Service to the user (#1040, @architkulkarni)
- Exposing Serve Service (#1117, @kodwanis)
- [Test] Add e2e test for sample RayJob yaml on kind (#935, @architkulkarni)
- Parametrize ray-operator makefile (#1121, @anishasthana)
- RayService object's Status is being updated due to frequent reconciliation (#1065, @kevin85421)
- [Feature] Support suspend in RayJob (#926, @oginskis)
- [Feature] Watch CR in multiple namespaces with namespaced RBAC resources (#1106, @kevin85421)
- [RayService] Submit requests to the Dashboard after the head Pod is running and ready (#1074, @kevin85421)
- feat: Rename instances of rayiov1alpha1 to rayv1alpha1 (#1112, @anishasthana)
- ray-operator: Reuse contexts across ray operator reconcilers (#1126, @anishasthana)
Fixes
- Fix CI (#1145, @kevin85421)
- Fix config frequent update (#1014, @sihanwang41)
- Fix for Sample YAML Config Test - 2.4.0 Failure due to 'suspend' Field (#1096, @Yicheng-Lu-llll)
- Fix in HeadPod Service Generation logic which was causing frequent reconciliation (#1056, @msumitjain)
- [Bug] Autoscaler doesn't support TLS (#1119, @chrisxstyles)
- [Bug] Enable ResourceQuota by adding Resources for the health-check init container (#1043, @kevin85421)
- [Bug] Fix null map handling in BuildServiceForHeadPod function (#1095, @architkulkarni)
- [Bug] RayService restarts repeatedly with Autoscaler (#1037, @kevin85421)
- [Bug] Service (Serve) changing port from 8000 to 9000 doesn't work (#1081, @kevin85421)
- [Bug] autoscaler not working properly in rayjob (#1064, @Yicheng-Lu-llll)
- [Bug] compatibility test for the nightly Ray image fails (#1055, @kevin85421)
- [Bug] rayStartParams is required at this moment. (#1031, @kevin85421)
- [Bug][Autoscaler] Operator does not remove workers (#1139, @kevin85421)
- [Bug][Doc] fix the link error of operator document (#1046, @xubo245)
- [Bug][GCS FT] Worker pods crash unexpectedly when gcs_server on head pod is killed (#1036, @kevin85421)
- [Bug][breaking change] Unauthorized 401 error on fetching Ray Custom Resources from K8s API server (#1128, @kevin85421)
- [Bug][k8s compatibility] k8s v1.20.7 ClusterIP svc do not updated under RayService (#1110, @kevin85421)
- [Helm][ray-cluster] Fix parsing envFrom field in additionalWorkerGroups (#1039, @dirtyValera)
Documentation
- [Doc] Copyedit dev guide (#1012, @architkulkarni)
- [Doc] Update nav to include missing files and reorganize nav (#1011, @architkulkarni)
- [Doc] Update version from 0.4.0 to 0.5.0 on remaining kuberay docs files (#1018, @architkulkarni)
- [Doc][Website] Update KubeRay introduction and fix layout issues (#1042, @kevin85421)
- [Docs][Website] One word typo fix in docs and README (#1068, @ecurtin)
- Add a document to outline the default settings for rayStartParams in Kuberay (#1057, @Yicheng-Lu-llll)
- Example Pod to connect Ray client to remote a Ray cluster with TLS enabled (#994, @tedhtchang)
- [Post release v0.5.0] Update CHANGELOG.md (#1026, @kevin85421)
- [Post release v0.5.0] Update release doc (#1028, @kevin85421)
- [Post Ray 2.4 Release] Update Ray versions to Ray 2.4.0 (#1049, @jjyao)
- [Post release v0.5.0] Remove block from rayStartParams (#1015, @kevin85421)
- [Post release v0.5.0] Remove block from rayStartParams for python client and KubeRay operator tests (#1050, @Yicheng-Lu-llll)
- [Post release v0.5.0] Remove serviceType (#1013, @kevin85421)
- [Post v0.5.0] Remove init containers from YAML files (#1010, @kevin85421)
- [Sample YAML] Bump ray version in pod security YAML to 2.4.0 (#1160) (#1161, @architkulkarni)
- Kuberay 0.5.0 docs validation update docs for GCS FT (#1004, @scarlet25151)
- Release v0.5.0 doc validation (#997, @kevin85421)
- Release v0.5.0 doc validation part 2 (#999, @architkulkarni)
- Release v0.5.0 python client library validation (#1006, @jasoonn)
- [release v0.5.2] Update tags and versions to 0.5.2 (#1159, @architkulkarni)
v0.5.0 release
Highlights
The KubeRay 0.5.0 release includes the following improvements.
- Interact with KubeRay via a Python client
- Integrate KubeRay with Kubeflow to provide an interactive development environment
- Integrate KubeRay with Ray TLS authentication
- Improve the user experience for KubeRay on AWS EKS
- Fix some Kubernetes networking issues
- Fix some stability bugs in RayJob and RayService
Contributors
The following individuals contributed to KubeRay 0.5.0. This list is alphabetical and incomplete.
@akanso @alex-treebeard @architkulkarni @cadedaniel @cskornel-doordash @davidxia @DmitriGekhtman @ducviet00 @gvspraveen @harryge00 @jasoonn @Jeffwan @kevin85421 @psschwei @scarlet25151 @sihanwang41 @wilsonwang371 @Yicheng-Lu-llll
Python client (alpha)(New!)
Kubeflow (New!)
- [Feature][Doc] Kubeflow integration (#937, @kevin85421)
- [Feature] Ray restricted podsecuritystandards for enterprise security and Kubeflow integration (#750, @kevin85421)
TLS authentication (New!)
- [Feature] TLS authentication (#989, @kevin85421)
AWS EKS (New!)
- [Feature][Doc] Access S3 bucket from Pods in EKS (#958, @kevin85421)
Kubernetes networking (New!)
- Read cluster domain from resolv.conf or env (#951, @harryge00)
- [Feature] Replace service name with Fully Qualified Domain Name (#938, @kevin85421)
- [Feature] Add default init container in workers to wait for GCS to be ready (#973, @kevin85421)
Observability
- Fix issue with head pod not monitored by Prometheus under certain condition (#963, @Yicheng-Lu-llll)
- [Feature] Improve and fix Prometheus & Grafana integrations (#895, @kevin85421)
- Add example and tutorial to explain how to create custom metrics for Prometheus (#914, @Yicheng-Lu-llll)
- feat: enrich kubectl get output (#878, @davidxia)
RayCluster
- Fix issue with operator OOM restart (#946, @wilsonwang371)
- [Feature][Hotfix] Add observedGeneration to the status of CRDs (#979, @kevin85421)
- Customize the Prometheus export port (#954, @Yicheng-Lu-llll)
- [Feature] The default ImagePullPolicy should be IfNotPresent (#947, @kevin85421)
- Inject the --block option to ray start command automatically (#932, @Yicheng-Lu-llll)
- Inject cluster name as an environment variable into head and worker pods (#934, @Yicheng-Lu-llll)
- Ensure container ports without names are also included in the head node service (#891, @Yicheng-Lu-llll)
- fix: .status.availableWorkerReplicas (#887, @davidxia)
- fix: only filter RayCluster events for reconciliation (#882, @davidxia)
- refactor: remove redundant import in raycluster_controller.go (#884, @davidxia)
- refactor: use equivalent, shorter Builder.Owns() method (#881, @davidxia)
- [RayCluster controller] [Bug] Unconditionally reconcile RayCluster every 60s instead of only upon change (#850, @architkulkarni)
- [Feature] Make head serviceType optional (#851, @kevin85421)
- [RayCluster controller] Add headServiceAnnotations field to RayCluster CR (#841, @cskornel-doordash)
RayJob (alpha)
- [Hotfix][release blocker][RayJob] HTTP client from submitting jobs before dashboard initialization completes (#1000, @kevin85421)
- [RayJob] Propagate error traceback string when GetJobInfo doesn't return valid JSON (#943, @architkulkarni)
- [RayJob][Doc] Fix RayJob sample config. (#807, @DmitriGekhtman)
RayService (alpha)
- [RayService] Skip update events without change (#811, @sihanwang41)
Helm
- Add rayVersion in the RayCluster chart (#975, @Yicheng-Lu-llll)
- [Feature] Support environment variables for KubeRay operator chart (#978, @kevin85421)
- [Feature] Add service account section in helm chart (#969, @ducviet00)
- Update apiserver chart location in readme (#896, @psschwei)
- add sidecar container option (#920, @akihikokuroda)
- match selector of service to pod labels (#918, @akihikokuroda)
- [Feature] Nodeselector/Affinity/Tolerations value to kuberay-apiserver chart (#879, @alex-treebeard)
- [Feature] Enable namespaced installs via helm chart (#860, @alex-treebeard)
- Remove unused fields from KubeRay operator and RayCluster charts (#839, @kevin85421)
- [Bug] Remove an unused field (ingress.enabled) from KubeRay operator chart (#812, @kevin85421)
- [helm] Add memory limits and resource documentation. (#789, @DmitriGekhtman)
CI
- [Feature] Add python client test to action (#993, @jasoonn)
- [CI][Buildkite] Fix the PATH issue (#952, @kevin85421)
- [CI][Buildkite] An example test for Buildkite (#919, @kevin85421)
- refactor: Fix flaky tests by using RetryOnConflict (#904, @Yicheng-Lu-llll)
- Use k8sClient from client.New in controller test (#898, @Yicheng-Lu-llll)
- [Bug] Fix flaky test: should be able to update all Pods to Running (#893, @kevin85421)
- Enable test framework to install operator with custom config and put operator in a namespace with enforced PSS in security testing (#876, @Yicheng-Lu-llll)
- Ensure all temp files are deleted after the compatibility test (#886, @Yicheng-Lu-llll)
- Adding a test for the document for the Pod security standard (#866, @Yicheng-Lu-llll)
- [Feature] Run config tests with the latest release of KubeRay operator (#858, @kevin85421)
- [Feature] Define a general-purpose cleanup method for CREvent (#849, @kevin85421)
- [Feature] Remove Docker container and NodePort from compatibility test (#844, @kevin85421)
- Remove Docker from BasicRayTestCase (#840, @kevin85421)
- [Feature] Move some functions from prototype test framework to a new utils file (#837, @kevin85421)
- [CI] Add workflow to manually trigger release image push (#801, @DmitriGekhtman)
- [CI] Pin go version in CRD consistency check (#794, @DmitriGekhtman)
- [Feature] Improve the observability of integration tests (#775, @jasoonn)
Sample YAML files
- Improve ray-cluster.external-redis.yaml (#986, @Yicheng-Lu-llll)
- remove ray-cluster.getting-started.yaml (#987, @Yicheng-Lu-llll)
- [Feature] Read Redis password from Kubernetes Secret (#950, @kevin85421)
- [Ray 2.3.0] Update --redis-password for RayCluster (#929, @kevin85421)
- [Bug] KubeRay does not work on M1 macs. (#869, @kevin85421)
- [Post Ray 2.3 Release] Update Ray versions to Ray 2.3.0 (#925, @cadedaniel)
- [Post Ray...
v0.4.0 release
Highlights
The KubeRay 0.4.0 release includes the following improvements.
- Integrations for the MCAD and Volcano batch scheduling systems.
- Stable Helm support for the KubeRay Operator, KubeRay API Server, and Ray clusters. These charts are now hosted at a Helm repo.
- Critical stability improvements to the Ray Autoscaler integration. (To benefit from these improvements, use KubeRay >=0.4.0 and Ray >=2.2.0.)
- Numerous improvements to CI, tests, and developer workflows; a new configuration test framework.
- Numerous improvements to documentation.
- Bug fixes for alpha features, such as RayJobs and RayServices.
- Various improvements and bug fixes for the core RayCluster controller.
Contributors
The following individuals contributed to KubeRay 0.4.0. This list is alphabetical and incomplete.
@AlessandroPomponio @architkulkarni @Basasuya @DmitriGekhtman @IceKhan13 @asm582 @davidxia @dhaval0108 @haoxins @iycheng @jasoonn @Jeffwan @jianyuan @kaushik143 @kevin85421 @lizzzcai @orcahmlee @pcmoritz @peterghaddad @rafvasq @scarlet25151 @shrekris-anyscale @sigmundv @sihanwang41 @simon-mo @tbabej @tgaddair @ulfox @wilsonwang371 @wuisawesome
New features and integrations
- [Feature] Support Volcano for batch scheduling (#755, @tgaddair)
- kuberay int with MCAD (#598, @asm582)
Helm
These changes pertain to KubeRay's Helm charts.
- [Bug] Remove an unused field (ingress.enabled) from KubeRay operator chart (#812, @kevin85421)
- [helm] Add memory limits and resource documentation. (#789, @DmitriGekhtman)
- [Helm] Expose security context in helm chart. (#773, @DmitriGekhtman)
- [Helm] Clean up RayCluster Helm chart ahead of KubeRay 0.4.0 release (#751, @DmitriGekhtman)
- [Feature] Expose initContainer image in RayCluster chart (#674, @kevin85421)
- [Feature][Helm] Expose the autoscalerOptions (#666, @orcahmlee)
- [Feature][Helm] Align the key of minReplicas and maxReplicas (#663, @orcahmlee)
- Helm: add service type configuration to head group for ray-cluster (#614, @IceKhan13)
- Allow annotations in ray cluster helm chart (#574, @sigmundv)
- [Feature][Helm] Enable sidecar configuration in Helm chart (#604, @kevin85421)
- [bugfix][apiserver helm]: Adding missing rbacenable value (#594, @dhaval0108)
- [Bug] Modification of nameOverride will cause label selector mismatch for head node (#572, @kevin85421)
- [Helm][minor] Make "disabled" flag for worker groups optional (#548, @kevin85421)
- helm: Uncomment the disabled key for the default workergroup (#543, @tbabej)
- Fix Helm chart default configuration (#530, @kevin85421)
- helm-chart/ray-cluster: Allow setting pod lifecycle (#494, @ulfox)
CI
The changes in this section pertain to KubeRay CI, testing, and developer workflows.
- [Feature] Improve the observability of integration tests (#775, @jasoonn)
- [CI] Pin go version in CRD consistency check (#794, @DmitriGekhtman)
- [Feature] Test sample RayService YAML to catch invalid or out of date one (#731, @jasoonn)
- Replace kubectl wait command with RayClusterAddCREvent (#705, @kevin85421)
- [Feature] Test sample RayCluster YAMLs to catch invalid or out of date ones (#678, @kevin85421)
- [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_ray_serve flaky (#650, @jasoonn)
- Configuration Test Framework Prototype (#605, @kevin85421)
- Update tests for better Mac M1 compatibility (#654, @shrekris-anyscale)
- [Bug] Update wait function in test_detached_actor (#635, @kevin85421)
- [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_detached_actor flaky (#619, @kevin85421)
- [Feature] Docker support for chart-testing (#623, @jasoonn)
- [Feature] Optimize the wait functions in E2E tests (#609, @kevin85421)
- [Feature] Running end-to-end tests on local machine (#589, @kevin85421)
- [CI]use fixed version of gofumpt (#596, @wilsonwang371)
- update test files before separating them (#591, @wilsonwang371)
- Add reminders to avoid RBAC synchronization bug (#576, @kevin85421)
- [Feature] Consistency check for RBAC (#577, @kevin85421)
- [Feature] Sync for manifests and helm chart (#564, @kevin85421)
- [Feature] Add a chart-test script to enable chart lint error reproduction on laptop (#563, @kevin85421)
- [Feature] Add helm lint check in Github Actions (#554, @kevin85421)
- [Feature] Add consistency check for types.go, CRDs, and generated API in GitHub Actions (#546, @kevin85421)
- support ray 2.0.0 in compatibility test (#508, @wilsonwang371)
KubeRay Operator deployment
The changes in this section pertain to deployment of the KubeRay Operator.
- Fix finalizer typo and re-create manifests (#631, @AlessandroPomponio)
- Change Kuberay operator Deployment strategy type to Recreate (#566, @haoxins)
- [Bug][Doc] Increase default operator resource requirements, improve docs (#727, @kevin85421)
- [Feature] Sync logs to local file (#632, @Basasuya)
- [Bug] label rayNodeType is useless (#698, @kevin85421)
- Revise sample configs, increase memory requests, update Ray versions (#761, @DmitriGekhtman)
RayCluster controller
The changes in this section pertain to the RayCluster controller sub-component of the KubeRay Operator.
- [autoscaler] Expose autoscaler container security context. (#752, @DmitriGekhtman)
- refactor: log more descriptive info from initContainer (#526, @davidxia)
- [Bug] Fail to create ingress due to the deprecation of the ingress.class annotation (#646, @kevin85421)
- [kuberay] Fix inconsistent RBAC truncation for autoscaling clusters. (#689, @DmitriGekhtman)
- [raycluster controller] Always honor maxReplicas (#662, @DmitriGekhtman)
- [Autoscaler] Pass pod name to autoscaler, add pod patch permission (#740, @DmitriGekhtman)
- [Bug] Shallow copy causes different worker configurations (#714, @kevin85421)
- Fix duplicated volume issue (#690, @wilsonwang371)
- [fix][raycluster controller] No error if head ip cannot be determined. (#701, @DmitriGekhtman)
- [Feature] Set default appProtocol for Ray head service to tcp (#668, @kevin85421)
- [Telemetry] Inject env identifying KubeRay. (#562, @DmitriGekhtman)
- fix: correctly set GPUs in rayStartParams (#497, @davidxia)
- [operator] enable bashrc before container start (#427, @Basasuya)
- ...