All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
The KubeRay 0.5.0 release includes the following improvements.
- Interact with KubeRay via a Python client
- Integrate KubeRay with Kubeflow to provide an interactive development environment
- Integrate KubeRay with Ray TLS authentication
- Improve the user experience for KubeRay on AWS EKS
- Fix some Kubernetes networking issues
- Fix some stability bugs in RayJob and RayService
The following individuals contributed to KubeRay 0.5.0. This list is alphabetical and incomplete.
@akanso @alex-treebeard @architkulkarni @cadedaniel @cskornel-doordash @davidxia @DmitriGekhtman @ducviet00 @gvspraveen @harryge00 @jasoonn @Jeffwan @kevin85421 @psschwei @scarlet25151 @sihanwang41 @wilsonwang371 @Yicheng-Lu-llll
- [Feature][Doc] Kubeflow integration (#937, @kevin85421)
- [Feature] Ray restricted podsecuritystandards for enterprise security and Kubeflow integration (#750, @kevin85421)
- [Feature] TLS authentication (#989, @kevin85421)
- [Feature][Doc] Access S3 bucket from Pods in EKS (#958, @kevin85421)
- Read cluster domain from resolv.conf or env (#951, @harryge00)
- [Feature] Replace service name with Fully Qualified Domain Name (#938, @kevin85421)
- [Feature] Add default init container in workers to wait for GCS to be ready (#973, @kevin85421)
- Fix issue with head pod not monitored by Prometheus under certain conditions (#963, @Yicheng-Lu-llll)
- [Feature] Improve and fix Prometheus & Grafana integrations (#895, @kevin85421)
- Add example and tutorial to explain how to create custom metrics for Prometheus (#914, @Yicheng-Lu-llll)
- feat: enrich `kubectl get` output (#878, @davidxia)
- Fix issue with operator OOM restart (#946, @wilsonwang371)
- [Feature][Hotfix] Add observedGeneration to the status of CRDs (#979, @kevin85421)
- Customize the Prometheus export port (#954, @Yicheng-Lu-llll)
- [Feature] The default ImagePullPolicy should be IfNotPresent (#947, @kevin85421)
- Inject the `--block` option into the `ray start` command automatically (#932, @Yicheng-Lu-llll)
- Inject cluster name as an environment variable into head and worker pods (#934, @Yicheng-Lu-llll)
- Ensure container ports without names are also included in the head node service (#891, @Yicheng-Lu-llll)
- fix: `.status.availableWorkerReplicas` (#887, @davidxia)
- fix: only filter RayCluster events for reconciliation (#882, @davidxia)
- refactor: remove redundant import in `raycluster_controller.go` (#884, @davidxia)
- refactor: use equivalent, shorter `Builder.Owns()` method (#881, @davidxia)
- [RayCluster controller] [Bug] Unconditionally reconcile RayCluster every 60s instead of only upon change (#850, @architkulkarni)
- [Feature] Make head serviceType optional (#851, @kevin85421)
- [RayCluster controller] Add headServiceAnnotations field to RayCluster CR (#841, @cskornel-doordash)
- [Hotfix][release blocker][RayJob] Prevent HTTP client from submitting jobs before dashboard initialization completes (#1000, @kevin85421)
- [RayJob] Propagate error traceback string when GetJobInfo doesn't return valid JSON (#943, @architkulkarni)
- [RayJob][Doc] Fix RayJob sample config. (#807, @DmitriGekhtman)
- [RayService] Skip update events without change (#811, @sihanwang41)
- Add rayVersion in the RayCluster chart (#975, @Yicheng-Lu-llll)
- [Feature] Support environment variables for KubeRay operator chart (#978, @kevin85421)
- [Feature] Add service account section in helm chart (#969, @ducviet00)
- Update apiserver chart location in readme (#896, @psschwei)
- add sidecar container option (#920, @akihikokuroda)
- match selector of service to pod labels (#918, @akihikokuroda)
- [Feature] Add Nodeselector/Affinity/Tolerations values to kuberay-apiserver chart (#879, @alex-treebeard)
- [Feature] Enable namespaced installs via helm chart (#860, @alex-treebeard)
- Remove unused fields from KubeRay operator and RayCluster charts (#839, @kevin85421)
- [Bug] Remove an unused field (ingress.enabled) from KubeRay operator chart (#812, @kevin85421)
- [helm] Add memory limits and resource documentation. (#789, @DmitriGekhtman)
- [Feature] Add python client test to action (#993, @jasoonn)
- [CI][Buildkite] Fix the PATH issue (#952, @kevin85421)
- [CI][Buildkite] An example test for Buildkite (#919, @kevin85421)
- refactor: Fix flaky tests by using RetryOnConflict (#904, @Yicheng-Lu-llll)
- Use k8sClient from client.New in controller test (#898, @Yicheng-Lu-llll)
- [Bug] Fix flaky test: should be able to update all Pods to Running (#893, @kevin85421)
- Enable test framework to install operator with custom config and put operator in a namespace with enforced PSS in security testing (#876, @Yicheng-Lu-llll)
- Ensure all temp files are deleted after the compatibility test (#886, @Yicheng-Lu-llll)
- Adding a test for the document for the Pod security standard (#866, @Yicheng-Lu-llll)
- [Feature] Run config tests with the latest release of KubeRay operator (#858, @kevin85421)
- [Feature] Define a general-purpose cleanup method for CREvent (#849, @kevin85421)
- [Feature] Remove Docker container and NodePort from compatibility test (#844, @kevin85421)
- Remove Docker from BasicRayTestCase (#840, @kevin85421)
- [Feature] Move some functions from prototype test framework to a new utils file (#837, @kevin85421)
- [CI] Add workflow to manually trigger release image push (#801, @DmitriGekhtman)
- [CI] Pin go version in CRD consistency check (#794, @DmitriGekhtman)
- [Feature] Improve the observability of integration tests (#775, @jasoonn)
- Improve ray-cluster.external-redis.yaml (#986, @Yicheng-Lu-llll)
- remove ray-cluster.getting-started.yaml (#987, @Yicheng-Lu-llll)
- [Feature] Read Redis password from Kubernetes Secret (#950, @kevin85421)
- [Ray 2.3.0] Update --redis-password for RayCluster (#929, @kevin85421)
- [Bug] KubeRay does not work on M1 macs. (#869, @kevin85421)
- [Post Ray 2.3 Release] Update Ray versions to Ray 2.3.0 (#925, @cadedaniel)
- [Post Ray 2.2.0 Release] Update Ray versions to Ray 2.2.0 (#822, @DmitriGekhtman)
- Update contribution doc to show users how to reach out via slack (#936, @gvspraveen)
- [Feature][Docs] Explain how to specify container command for head pod (#912, @kevin85421)
- [post-0.4.0 KubeRay release] update proto version to 0.4.0 (#830, @scarlet25151)
- [0.4.0 release] Update changelog for KubeRay 0.4.0 (#836, @DmitriGekhtman)
- [Docs] Revise release note docs (#835, @DmitriGekhtman)
- [release] Add release command and guidance for KubeRay cli (#834, @Jeffwan)
- [Release] Add tools and docs for changelog generator (#833, @Jeffwan)
- [Bug] error: git cmd when following docs (#831, @kevin85421)
- [post-0.4.0 KubeRay release] Update KubeRay versions (#821, @DmitriGekhtman)
- [Feature][Doc] End-to-end KubeRay operator development process on Kind (#826, @kevin85421)
- [Release][Docs] Update release instructions (#819, @DmitriGekhtman)
- [docs] Tweaks to main README, add basic API Server README. (#809, @DmitriGekhtman)
- update docs for release v0.4.0 (#778, @scarlet25151)
- [docs] Update KubeRay operator README. (#808, @DmitriGekhtman)
- [Release] Update docs for release v0.4.0 (#779, @kevin85421)
The KubeRay 0.4.0 release includes the following improvements.
- Integrations for the MCAD and Volcano batch scheduling systems.
- Stable Helm support for the KubeRay Operator, KubeRay API Server, and Ray clusters. These charts are now hosted at a Helm repo.
- Critical stability improvements to the Ray Autoscaler integration. (To benefit from these improvements, use KubeRay >=0.4.0 and Ray >=2.2.0.)
- Numerous improvements to CI, tests, and developer workflows; a new configuration test framework.
- Numerous improvements to documentation.
- Bug fixes for alpha features, such as RayJobs and RayServices.
- Various improvements and bug fixes for the core RayCluster controller.
The following individuals contributed to KubeRay 0.4.0. This list is alphabetical and incomplete.
@AlessandroPomponio @architkulkarni @asm582 @Basasuya @davidxia @dhaval0108 @DmitriGekhtman @haoxins @IceKhan13 @iycheng @jasoonn @Jeffwan @jianyuan @kaushik143 @kevin85421 @lizzzcai @orcahmlee @pcmoritz @peterghaddad @rafvasq @scarlet25151 @shrekris-anyscale @sigmundv @sihanwang41 @simon-mo @tbabej @tgaddair @ulfox @wilsonwang371 @wuisawesome
- [Feature] Support Volcano for batch scheduling (#755, @tgaddair)
- KubeRay integration with MCAD (#598, @asm582)
These changes pertain to KubeRay's Helm charts.
- [Bug] Remove an unused field (ingress.enabled) from KubeRay operator chart (#812, @kevin85421)
- [helm] Add memory limits and resource documentation. (#789, @DmitriGekhtman)
- [Helm] Expose security context in helm chart. (#773, @DmitriGekhtman)
- [Helm] Clean up RayCluster Helm chart ahead of KubeRay 0.4.0 release (#751, @DmitriGekhtman)
- [Feature] Expose initContainer image in RayCluster chart (#674, @kevin85421)
- [Feature][Helm] Expose the autoscalerOptions (#666, @orcahmlee)
- [Feature][Helm] Align the key of minReplicas and maxReplicas (#663, @orcahmlee)
- Helm: add service type configuration to head group for ray-cluster (#614, @IceKhan13)
- Allow annotations in ray cluster helm chart (#574, @sigmundv)
- [Feature][Helm] Enable sidecar configuration in Helm chart (#604, @kevin85421)
- [bugfix][apiserver helm]: Adding missing rbacenable value (#594, @dhaval0108)
- [Bug] Modification of nameOverride will cause label selector mismatch for head node (#572, @kevin85421)
- [Helm][minor] Make "disabled" flag for worker groups optional (#548, @kevin85421)
- helm: Uncomment the disabled key for the default workergroup (#543, @tbabej)
- Fix Helm chart default configuration (#530, @kevin85421)
- helm-chart/ray-cluster: Allow setting pod lifecycle (#494, @ulfox)
The changes in this section pertain to KubeRay CI, testing, and developer workflows.
- [Feature] Improve the observability of integration tests (#775, @jasoonn)
- [CI] Pin go version in CRD consistency check (#794, @DmitriGekhtman)
- [Feature] Test sample RayService YAML to catch invalid or out of date one (#731, @jasoonn)
- Replace kubectl wait command with RayClusterAddCREvent (#705, @kevin85421)
- [Feature] Test sample RayCluster YAMLs to catch invalid or out of date ones (#678, @kevin85421)
- [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_ray_serve flaky (#650, @jasoonn)
- Configuration Test Framework Prototype (#605, @kevin85421)
- Update tests for better Mac M1 compatibility (#654, @shrekris-anyscale)
- [Bug] Update wait function in test_detached_actor (#635, @kevin85421)
- [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_detached_actor flaky (#619, @kevin85421)
- [Feature] Docker support for chart-testing (#623, @jasoonn)
- [Feature] Optimize the wait functions in E2E tests (#609, @kevin85421)
- [Feature] Running end-to-end tests on local machine (#589, @kevin85421)
- [CI]use fixed version of gofumpt (#596, @wilsonwang371)
- update test files before separating them (#591, @wilsonwang371)
- Add reminders to avoid RBAC synchronization bug (#576, @kevin85421)
- [Feature] Consistency check for RBAC (#577, @kevin85421)
- [Feature] Sync for manifests and helm chart (#564, @kevin85421)
- [Feature] Add a chart-test script to enable chart lint error reproduction on laptop (#563, @kevin85421)
- [Feature] Add helm lint check in Github Actions (#554, @kevin85421)
- [Feature] Add consistency check for types.go, CRDs, and generated API in GitHub Actions (#546, @kevin85421)
- support ray 2.0.0 in compatibility test (#508, @wilsonwang371)
The changes in this section pertain to deployment of the KubeRay Operator.
- Fix finalizer typo and re-create manifests (#631, @AlessandroPomponio)
- Change Kuberay operator Deployment strategy type to Recreate (#566, @haoxins)
- [Bug][Doc] Increase default operator resource requirements, improve docs (#727, @kevin85421)
- [Feature] Sync logs to local file (#632, @Basasuya)
- [Bug] label rayNodeType is useless (#698, @kevin85421)
- Revise sample configs, increase memory requests, update Ray versions (#761, @DmitriGekhtman)
The changes in this section pertain to the RayCluster controller sub-component of the KubeRay Operator.
- [autoscaler] Expose autoscaler container security context. (#752, @DmitriGekhtman)
- refactor: log more descriptive info from initContainer (#526, @davidxia)
- [Bug] Fail to create ingress due to the deprecation of the ingress.class annotation (#646, @kevin85421)
- [kuberay] Fix inconsistent RBAC truncation for autoscaling clusters. (#689, @DmitriGekhtman)
- [raycluster controller] Always honor maxReplicas (#662, @DmitriGekhtman)
- [Autoscaler] Pass pod name to autoscaler, add pod patch permission (#740, @DmitriGekhtman)
- [Bug] Shallow copy causes different worker configurations (#714, @kevin85421)
- Fix duplicated volume issue (#690, @wilsonwang371)
- [fix][raycluster controller] No error if head ip cannot be determined. (#701, @DmitriGekhtman)
- [Feature] Set default appProtocol for Ray head service to tcp (#668, @kevin85421)
- [Telemetry] Inject env identifying KubeRay. (#562, @DmitriGekhtman)
- fix: correctly set GPUs in rayStartParams (#497, @davidxia)
- [operator] enable bashrc before container start (#427, @Basasuya)
- [Bug] Pod reconciliation fails if worker pod name is supplied (#587, @kevin85421)
The changes in this section pertain to the RayJob controller sub-component of the KubeRay Operator.
- [Feature] [RayJobs] Use finalizers to implement stopping a job upon cluster deletion (#735, @kevin85421)
- [ray job] support stop job after job cr is deleted in cluster selector mode (#629, @Basasuya)
- [RayJob] Fix example misconfiguration. (#602, @DmitriGekhtman)
- [operator] support clusterselector in job crd (#470, @Basasuya)
The changes in this section pertain to the RayService controller sub-component of the KubeRay Operator.
- [RayService] Skip update events without change (#811, @sihanwang41)
- [RayService] Track whether Serve app is ready before switching clusters (#730, @shrekris-anyscale)
- [RayService] Compare cached hashed config before triggering update (#655, @shrekris-anyscale)
- Disable async serve handler in Ray Service cluster. (#447, @iycheng)
- [RayService] Revert "Disable async serve handler in Ray Service cluster (#447)" (#606, @shrekris-anyscale)
- add support for rayserve in apiserver (#456, @scarlet25151)
- Fix initial health check not obeying deploymentUnhealthySecondThreshold (#540, @jianyuan)
- [Bug][apiserver] fix apiserver create rayservice missing serve port (#734, @scarlet25151)
- Support updating RayServices using the KubeRay API Server (#633, @scarlet25151)
- [api server] enable job spec server (#416, @Basasuya)
- [Bug] client_golang used by KubeRay has a vulnerability (#728, @kevin85421)
- feat: update RayCluster `.status.reason` field with pod creation error (#639, @davidxia)
- feat: enrich RayCluster status with head IPs (#468, @davidxia)
- config/prometheus: add metrics exporter for workers (#469, @ulfox)
- [docs] Updated Volcano integration documentation (#776, @tgaddair)
- [0.4.0 Release] Minor doc improvements (#780, @DmitriGekhtman)
- Update gcs-ft.md (#777, @wilsonwang371)
- [Feature] Refactor test framework & test kuberay-operator chart with configuration framework (#759, @kevin85421)
- fix docs: typo in README.md (#760, @davidxia)
- [APIServer][Docs] Identify API server as community-managed and optional (#753, @DmitriGekhtman)
- Add documentations for the release process of Helm charts (#723, @kevin85421)
- [docs] Fix markdown in ray services (#712, @lizzzcai)
- Cross-reference docs. (#703, @DmitriGekhtman)
- Adding example of manually setting up NGINX Ingress (#699, @jasoonn)
- [docs] State version requirement for kubectl (#702, @DmitriGekhtman)
- Remove ray-cluster.without-block.yaml (#675, @kevin85421)
- [doc] Add instructions about how to use SSL/TLS for redis connection. (#652, @iycheng)
- [Feature][Docs] AWS Application Load Balancer (ALB) support (#658, @kevin85421)
- [Feature][Doc] Explain that RBAC should be synchronized manually (#641, @kevin85421)
- [doc] Reformat README.md (#599, @rafvasq)
- [doc] Copy-Edit RayJob (#608, @rafvasq)
- [doc] VS Code IDE setup (#613, @kevin85421)
- [doc] Copy-Edit RayService (#607, @rafvasq)
- fix mkdocs URL (#600, @asm582)
- [doc] Add a tip on docker images (#586, @DmitriGekhtman)
- Update ray-operator documentation and image version in ray-cluster.heterogeneous.yaml (#585, @jasoonn)
- [Doc] Cannot build kuberay with Go 1.16 (#575, @kevin85421)
- docs: Add instructions for working with Argo CD (#535, @haoxins)
- Update Helm doc. (#531, @DmitriGekhtman)
- Failure happened when install operator with kubectl apply (#525, @kevin85421)
- fix examples: bad K8s log config causing logs to be lost (#501, @davidxia)
- Helm instructions: kubectl apply -> kubectl create (#505, @DmitriGekhtman)
- apiserver add new api docs (#498, @scarlet25151)
v0.3.0 (2022-08-17)
- [rayservice] Fix config names to match serve config format directly (#464, @edoakes)
- Disable pin on head for serve controller by default in service operator (#457, @iycheng)
- add wget timeout to probes (#448, @wilsonwang371)
- Disable async serve handler in Ray Service cluster. (#447, @iycheng)
- Add more env for RayService head or worker pods (#439, @brucez-anyscale)
- RayCluster created by RayService set death info env for ray container (#419, @brucez-anyscale)
- Add integration test for kuberay ray service and improve ray service operator (#415, @brucez-anyscale)
- Fix a potential reconcile issue for RayService and allow configuring the unhealthy time threshold in the CR (#384, @brucez-anyscale)
- [Serve] Unify logger and add user facing events (#378, @simon-mo)
- Improve RayService Operator logic to handle head node crash (#376, @brucez-anyscale)
- Add serving service for users traffic with health check (#367, @brucez-anyscale)
- Create a service for dashboard agent (#324, @brucez-anyscale)
- Update RayService CR to integrate with Ray Nightly (#322, @brucez-anyscale)
- RayService: zero downtime update and healthcheck HA recovery (#307, @brucez-anyscale)
- RayService: Dev RayService CR and Controller logic (#287, @brucez-anyscale)
- KubeRay: kubebuilder create RayService Controller and CR (#270, @brucez-anyscale)
- Properly convert unix time into meta time (#480, @pingsutw)
- Fix nil pointer dereference (#429, @pingsutw)
- Improve RayJob controller quality to alpha (#398, @Jeffwan)
- Submit ray job after cluster is ready (#405, @pingsutw)
- Add RayJob CRD and controller logic (#303, @harryge00)
- tune readiness probe timeouts (#411, @wilsonwang371)
- enable ray external storage namespace (#406, @wilsonwang371)
- Initial support for external Redis and GCS HA (#294, @wilsonwang371)
- [Autoscaler] Match autoscaler image to Ray head image for Ray >= 2.0.0 (#423, @DmitriGekhtman)
- [autoscaler] Better defaults and config options (#414, @DmitriGekhtman)
- [autoscaler] Make log file mount path more specific. (#391, @DmitriGekhtman)
- [autoscaler] Flip prioritize-workers-to-delete feature flag (#379, @DmitriGekhtman)
- Update autoscaler image (#371, @DmitriGekhtman)
- [minor] Update autoscaler image. (#313, @DmitriGekhtman)
- Provide override for autoscaler image pull policy. (#297, @DmitriGekhtman)
- [RFC][autoscaler] Add autoscaler container overrides and config options for scale behavior. (#278, @DmitriGekhtman)
- [autoscaler] Improve autoscaler auto-configuration, upstream recent improvements to Kuberay NodeProvider (#274, @DmitriGekhtman)
- correct gcs ha to gcs ft (#482, @wilsonwang371)
- Fix panic in cleanupInvalidVolumeMounts (#481, @MissiontoMars)
- fix: worker node can't connect to head node service (#445, @pingsutw)
- Add http resp code check for kuberay (#435, @brucez-anyscale)
- Fix wrong ray start command (#431, @pingsutw)
- fix controller: use Service's TargetPort (#383, @davidxia)
- Generate clientset for new specs (#392, @Basasuya)
- Add Ray address env. (#388, @DmitriGekhtman)
- Add the support to replace evicted head pod (#381, @Jeffwan)
- [Bug] Fix raycluster updatestatus list wrong label (#377, @scarlet25151)
- Make replicas optional for the head spec. (#362, @DmitriGekhtman)
- Add ray head service endpoints in status to expose raycluster's head node endpoints (#341, @scarlet25151)
- Support KubeRay management labels (#345, @Jeffwan)
- fix: bug in object store memory validation (#332, @davidxia)
- feat: add EventReason type for events (#334, @davidxia)
- minor refactor: fix camel-casing of unHealthy -> unhealthy (#333, @davidxia)
- refactor: remove redundant imports (#317, @davidxia)
- Fix GPU-autofill for rayStartParams (#328, @DmitriGekhtman)
- ray-operator: add missing space in controller log messages (#316, @davidxia)
- fix: use head group's ServiceAccount in autoscaler RoleBinding (#315, @davidxia)
- fix typos in comments and help messages (#304, @davidxia)
- enable force cluster upgrade (#231, @wilsonwang371)
- fix operator: correctly set head pod service account (#276, @davidxia)
- [hotfix] Fix Service account typo (#285, @DmitriGekhtman)
- Rename RayCluster folder to Ray since the group is Ray (#275, @brucez-anyscale)
- KubeRay: Relocate files to enable controller extension with Kubebuilder (#268, @brucez-anyscale)
- fix: use configured RayCluster service account when autoscaling (#259, @davidxia)
- suppress not found errors into regular logs (#222, @akanso)
- adding label check (#221, @akanso)
- Prioritize WorkersToDelete (#208, @sriram-anyscale)
- Simplify k8s client creation (#179, @chenk008)
- [ray-operator]Make log timestamp readable (#206, @chenk008)
- bump controller-runtime to 0.11.1 and Kubernetes to v1.23 (#180, @chenk008)
- Add envs in cluster service api (#432, @MissiontoMars)
- Expose swallowed detail error messages (#422, @Jeffwan)
- fix: typo RAY_DISABLE_DOCKER_CPU_WRARNING -> RAY_DISABLE_DOCKER_CPU_WARNING (#421, @pingsutw)
- Add hostPathType and mountPropagationMode field for apiserver (#413, @scarlet25151)
- Fix `ListAllComputeTemplates` proto comments (#407, @MissiontoMars)
- Enable DefaultHTTPErrorHandler and Upgrade grpc-gateway to v2 (#369, @Jeffwan)
- Validate namespace consistency in the request when creating the cluster and the compute template (#365, @daikeshi)
- Update compute template service url to include namespace path param (#363, @Jeffwan)
- fix apiserver created raycluster metrics port missing and check (#356, @scarlet25151)
- Support mounting volumes in API request (#346, @Jeffwan)
- add standard label for the filtering of cluster (#342, @scarlet25151)
- expose kubernetes events in apiserver (#343, @scarlet25151)
- Update ray-operator version in the apiserver (#340, @pingsutw)
- fix: typo worker_group_sepc -> worker_group_spec (#330, @davidxia)
- Fix gpu-accelerator in template (#296, @armandpicard)
- Add namespace scope to compute template operations (#244, @daikeshi)
- Add namespace scope to list operation (#237, @daikeshi)
- Add namespace scope for Ray cluster get and delete operations (#229, @daikeshi)
- Cli: make namespace optional to adapt to ListAll operation (#361, @Jeffwan)
- sync up helm chart's role (#472, @scarlet25151)
- helm-charts/ray-cluster: Allow extra workers (#451, @ulfox)
- Update helm chart version to 0.3.0 (#461, @Jeffwan)
- helm-chart/ray-cluster: allow head autoscaling (#443, @ulfox)
- modify kuberay operator crds in kuberay operator chart and add apiserver chart (#354, @scarlet25151)
- Warn explicitly against using kubectl apply to create RayCluster CRD. (#302, @DmitriGekhtman)
- Sync crds to Helm chart (#280, @haoxins)
- [Feature]Run kuberay in a single namespace (#258, @wilsonwang371)
- fix duplicated port config and manager.yaml missing config (#250, @wilsonwang371)
- manifests: Add live/ready probes (#243, @haoxins)
- Helm: supports custom probe seconds (#239, @haoxins)
- Add CD for helm charts (#199, @ddelange)
- Enable docker image push for release-0.3 branch (#462, @Jeffwan)
- add new 8000 port forwarding in kind (#424, @wilsonwang371)
- improve compatibility test stability (#418, @wilsonwang371)
- improve test stability (#394, @wilsonwang371)
- use more strict formatting (#385, @wilsonwang371)
- fix flaky test issue (#370, @wilsonwang371)
- provide more detailed information in case of test failures (#352, @wilsonwang371)
- fix wrong kuberay image used by compatibility test (#327, @wilsonwang371)
- add cluster nodes info test (#299, @wilsonwang371)
- Fix the image name in deploy cmd (#293, @brucez-anyscale)
- [CI]enable ci test to check ctrl plane health state (#279, @wilsonwang371)
- [bugfix]update flaky test timeout (#254, @wilsonwang371)
- Update format by running gofumpt (#236, @wilsonwang371)
- Add unit tests for raycluster_controller reconcilePods function (#219, @Waynegates)
- Support ray 1.12 (#245, @wilsonwang371)
- add 1.11 to compatibility test and update comment (#217, @wilsonwang371)
- run compatibility in parallel using multiple workflows (#215, @wilsonwang371)
- add-state-machine-and-exposing-port (#319, @scarlet25151)
- Install: Fix directory path for prometheus install.sh (#256, @Tomcli)
- Fix Ray Operator prometheus config (#253, @Tomcli)
- Emit prometheus metrics from kuberay control plane (#232, @Jeffwan)
- Enable metrics-export-port by default and configure prometheus monitoring (#230, @scarlet25151)
- [doc] Config and doc updates ahead of KubeRay 0.3.0/Ray 2.0.0 (#486, @DmitriGekhtman)
- document the raycluster status (#473, @scarlet25151)
- Clean up example samples (#434, @DmitriGekhtman)
- Add ray state api doc link in ray service doc (#428, @brucez-anyscale)
- [docs] Add sample configs with larger Ray pods (#426, @DmitriGekhtman)
- Add RayJob docs and development docs (#404, @Jeffwan)
- Add gcs ha doc into mkdocs (#402, @brucez-anyscale)
- [minor] Add client and dashboard ports to ports in example configs. (#399, @DmitriGekhtman)
- Add documentation for RayService (#387, @brucez-anyscale)
- Fix broken links by creating referenced soft links (#335, @Jeffwan)
- Support hosting swagger ui in apiserver (#344, @Jeffwan)
- Remove autoscaler debug example to prevent confusion (#326, @Jeffwan)
- Add a link to protobuf-grpc-service design page in proto doc (#310, @yabuchan)
- update readme and address issue #286 (#311, @wilsonwang371)
- docs fix: specify only Go 1.16 or 1.17 works right now (#261, @davidxia)
- Add documentation link in readme (#247, @simon-mo)
- Use mhausenblas/mkdocs-deploy-gh-pages action for docs (#233, @Jeffwan)
- Build KubeRay Github site (#216, @Jeffwan)
v0.2.0 (2022-03-13)
- Support envFrom in rayclusters deployed with Helm (#183, @ebr)
- Helm: support imagePullSecrets for ray clusters (#182, @ebr)
- Support scheduling constraints in Helm-deployed clusters (#181, @ebr)
- Helm: ensure RBAC rules are up to date with the latest autogenerated manifest (#175, @ebr)
- add resource command (#170, @zhuangzhuang131419)
- Use container to generate proto files (#160, @Jeffwan)
- Support in-tree autoscaler (#163, @Jeffwan)
- [CLI] check viper error (#172, @chenk008)
- [Feature] Add subcommand `--version` (#166, @chenk008)
- [Feature] Add flag `watch-namespace` (#165, @chenk008)
- Support enableIngress for RayCluster (#38, @Jeffwan)
- Add CRD verb permission in helm (#144, @chenk008)
- Add quick start deployment manifests (#132, @Jeffwan)
- Add CLI to kuberay (#135, @wolfsniper2388)
- Ray Operator: Upgrade to Go v1.17 (#128, @haoxins)
- Add deploy manifests for apiserver (#119, @Jeffwan)
- Implement resource manager and gRPC services (#127, @Jeffwan)
- Generate go clients and swagger files (#126, @Jeffwan)
- [service] Init backend service project (#113, @Jeffwan)
- Add gRPC service definition and gRPC gateway (#112, @Jeffwan)
- [proto] Add core api definitions as protobuf message (#93, @Jeffwan)
- Use ray start block in Pod's entrypoint (#77, @chenk008)
- Add generated clientsets, informers and listers (#97, @Jeffwan)
- Add codegen scripts and make required api changes (#96, @harryge00)
- Reorganize api folder for code generation (#91, @harryge00)
- Fix serviceaccount typo in operator role (#188, @Jeffwan)
- Fix cli typo (#173, @chenk008)
- [Bug] Leader election needs lease permission (#169, @chenk008)
- refactor: rename kubray -> kuberay (#145, @tekumara)
- Fix the Helm chart's image name (#130, @haoxins)
- fix typo in the helm chart templates (#129, @haoxins)
- fix issue that modifies the list while iterating through it (#125, @wilsonwang371)
- Add helm (#109, @zhuangzhuang131419)
- Update samples yaml (#102, @ryantd)
- fix missing template objectmeta (#95, @chenk008)
- fix typo in Readme (#81, @denkensk)
- kuberay compatibility test with ray (#157, @wilsonwang371)
- Setup ci for apiserver (#162, @Jeffwan)
- Enable gofmt and move goimports to linter job (#158, @Jeffwan)
- add more debug info for bug-150: goimport issue (#151, @wilsonwang371)
- add nightly docker build workflow (#141, @wilsonwang371)
- enable goimport and add new makefile target to only build image without test (#123, @wilsonwang371)
- [Feature]add docker build stage to ci workflow (#122, @wilsonwang371)
- Pass --timeout option to golangci-lint (#116, @Jeffwan)
- Add linter job for github workflow (#79, @feilengcui008)
- Add Makefile for cli project (#192, @Jeffwan)
- Manifests and docs improvement for prerelease (#191, @Jeffwan)
- Add documentation for autoscaling feature (#189, @Jeffwan)
- docs: Fix typo in best practice (#190, @nakamasato)
- add kuberay on kind jupyter notebook (#147, @wilsonwang371)
- Add KubeRay release guideline (#161, @Jeffwan)
- Add troubleshooting guide for ray version mismatch (#154, @scarlet25151)
- Explanation and Best Practice for workers-head Reconnection (#142, @nostalgicimp)
- [docs] Folder name change to kuberay-operator (#143, @asm582)
- Improve the Helm charts docs (#131, @haoxins)
- add auto-scale doc (#108, @akanso)
- Add core API and backend service design doc (#98, @Jeffwan)
- [Feature] add more options in bug template (#121, @wilsonwang371)
- Rename service module to apiserver (#118, @Jeffwan)
v0.1.0 (2021-10-16)
- Check duplicate services explicitly (#72, @Jeffwan)
- Expose reconcile concurrency as a command flag (#67, @feilengcui008)
- Ignore reconcile cluster being deleted (#63, @feilengcui008)
- Add issue and pr templates (#44, @chaomengyuan)
- Create root level .gitignore file (#37, @Jeffwan)
- Remove BAZEL build in ray-operator project (#32, @chenk008)
- Upgrade Kubebuilder to 3.0.0 and optimize Github workflow (#31, @Jeffwan)
- Update v1alpha1 RayCluster CRD and controllers (#22, @Jeffwan)
- Deprecate msft operator and rename to ray-operator (#20, @Jeffwan)
- Deprecate ByteDance operator and move to unified one (#19, @Jeffwan)
- Deprecate antgroup ray operator and move to unified implementation (#18, @chenk008)
- Upgrade to go 1.15 (#12, @tgaddair)
- Remove unused generated manifest from kubebuilder (#11, @Jeffwan)
- Clean up kustomization manifests (#10, @Jeffwan)
- Add RayCluster v1alpha1 controller (#8, @Jeffwan)
- Scaffolding out Bytedance's ray operator project (#7, @Jeffwan)
- allow deletion of workers (#5, @akanso)
- Ray Autoscaler integrate with Ray K8s Operator (#2, @Qstar)
- Add license (#3, @akanso)
- Operator with Design 1B (#1, @akanso)
- Fix flaky tests by retrying 409 conflict error (#73, @Jeffwan)
- Fix issues in heterogeneous sample (#45, @anencore94)
- Fix incorrect manifest setting and remove unused manifests (#34, @Jeffwan)
- Fix status update issue and redis port formatting issue (#16, @Jeffwan)
- Fix leader election failure and crd too long issue (#9, @Jeffwan)