codeflare-operator

Operator for installation and lifecycle management of the CodeFlare distributed workload stack, starting with MCAD and InstaScale.

CodeFlare Stack Compatibility Matrix

| Component | Version |
|---|---|
| CodeFlare Operator | v1.0.0-rc.1 |
| Multi-Cluster App Dispatcher | v1.35.0 |
| CodeFlare-SDK | v0.8.0 |
| InstaScale | v0.0.9 |
| KubeRay | v0.5.0 |

Development

Requirements:

GNU sed: sed is used in several Makefile commands. The default macOS sed is incompatible with them, so GNU sed is needed for these commands to run correctly. If GNU sed is installed on macOS, you can specify its binary with:
```
# brew install gnu-sed
make install -e SED=/usr/local/bin/gsed
```
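The lookup of the GNU sed binary can also be scripted. The following is a minimal sketch, not part of the repository: `find_gnu_sed` is a hypothetical helper that tries `gsed` first (the name Homebrew gives GNU sed) and accepts plain `sed` only if its `--version` output identifies it as GNU sed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository):
# print the path of the first GNU sed binary found on PATH.
find_gnu_sed() {
  for candidate in gsed sed; do
    bin=$(command -v "$candidate" 2>/dev/null) || continue
    # GNU sed reports "(GNU sed)" in its --version output;
    # the default macOS sed does not support --version at all.
    if "$bin" --version 2>/dev/null | grep -q '(GNU sed)'; then
      printf '%s\n' "$bin"
      return 0
    fi
  done
  return 1
}

# Example: pass the detected binary to make, as in the command above.
# make install -e SED="$(find_gnu_sed)"
```

If no GNU sed is found, the function prints nothing and returns a non-zero status, so the `make` invocation should be guarded accordingly.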

Testing

The e2e tests can be executed locally by running the following commands:

Use an existing cluster, or set up a test cluster, e.g.:
```
# Create a KinD cluster
make kind-e2e
# Install the CRDs
make install
```
Note: Some e2e tests access services through Ingresses, as end users would, which requires reaching the Ingress controller load balancer by its IP address. On macOS, this requires installing docker-mac-net-connect.
Start the operator locally:
```
NAMESPACE=default make run
```
Alternatively, you can run the operator from your IDE / debugger.
Set up the test CodeFlare stack:
```
make setup-e2e
```
Note: On OpenShift, the KubeRay operator pod is assigned a random user, which is then used to run the Ray cluster. This user does not have permission to store the datasets downloaded during test execution, which causes the tests to fail. To prevent this failure on OpenShift, enforce user 1000 for KubeRay and the Ray cluster by creating the following SCC in the KubeRay operator namespace (replace the `$(namespace)` placeholder with that namespace):
```
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: run-as-ray-user
seLinuxContext:
  type: MustRunAs
runAsUser:
  type: MustRunAs
  uid: 1000
users:
  - 'system:serviceaccount:$(namespace):kuberay-operator'
```
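Filling in the placeholder by hand is easy to get wrong, so the substitution can be scripted. The sketch below is an illustration only: `render_scc` is a hypothetical helper, and the namespace name `kuberay` in the usage line is an assumption that should be replaced with the namespace in which the KubeRay operator actually runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository):
# print the SCC shown above with $(namespace) replaced by the first argument.
render_scc() {
  namespace=$1
  # The quoted heredoc keeps $(namespace) literal so that sed can replace it.
  sed "s/[\$](namespace)/${namespace}/" <<'EOF'
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: run-as-ray-user
seLinuxContext:
  type: MustRunAs
runAsUser:
  type: MustRunAs
  uid: 1000
users:
  - 'system:serviceaccount:$(namespace):kuberay-operator'
EOF
}

# Example (assumes the operator namespace is called "kuberay"):
# render_scc kuberay | oc apply -f -
```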
In a separate terminal, set your output directory for test files, and run the e2e suite:
```
export CODEFLARE_TEST_OUTPUT_DIR=<your_output_directory>
```
```
make test-e2e
```
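If you have no particular output directory in mind, a fresh temporary directory works. This is a minimal sketch; the use of `mktemp` is a suggestion and not something the project prescribes.

```shell
#!/bin/sh
# Create a fresh temporary directory and use it as the e2e output directory.
CODEFLARE_TEST_OUTPUT_DIR=$(mktemp -d)
export CODEFLARE_TEST_OUTPUT_DIR
echo "Output directory for e2e test files: ${CODEFLARE_TEST_OUTPUT_DIR}"

# Then run the suite as shown above:
# make test-e2e
```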

Alternatively, you can run the e2e test(s) from your IDE / debugger.

Release

1. Invoke the project-codeflare-release.yaml action.
2. Once all jobs within the action have completed, verify that the compatibility matrix in the README was updated correctly.
3. Verify that the pull request opened against the OpenShift community operators repository has the proper content.
4. Once the PR is merged, announce the new release on Slack and on mailing lists, if any.
5. Update the Distributed Workloads component in ODH (also copy/update the compatibility matrix). This may require YAML and test updates, depending on the release. Make sure to create a tag and release in the Distributed Workloads repository that matches the project-codeflare release version.
6. Update the readme/markdown/YAML in odh-manifests as required.

Releases involving part of the stack

There may be instances in which a new CodeFlare stack release requires releases of only a subset of the stack components. Examples could be hotfixes for a specific component. In these instances:

1. Build updated components as needed:
   - Build and release MCAD
   - Build and release InstaScale
   - Build and release CodeFlare-SDK
2. Invoke the tag-and-build.yml GitHub action. This action creates a repository tag, then builds and pushes the operator image.
3. Check that the tag-and-build.yml GitHub action passed.
4. Verify that the compatibility matrix in the README was updated correctly.
5. Follow steps 3-6 from the previous section.
