Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(conditions): provide status information using conditions #336

Merged
merged 8 commits into from
Feb 15, 2022

Conversation

ebaron
Copy link
Member

@ebaron ebaron commented Jan 20, 2022

This PR adds a conditions field to the Cryostat CRD Status. So far there are 3 types of conditions the operator will report on:

  • Availability of the main deployment
  • Availability of the reports deployment
  • Setup of TLS using cert-manager

There are likely other things we can add here, but I think this is a good start. The standards doc says to consider the condition types as part of the API, so we should be careful about which types we add.

Examples in OpenShift Console:
Screenshot 2022-01-19 at 17-23-12 Red Hat OpenShift Container Platform
Screenshot 2022-01-20 at 17-25-29 Red Hat OpenShift Container Platform

Fixes: #203

@ebaron ebaron added the feat New feature or request label Jan 20, 2022
@hareetd
Copy link

hareetd commented Jan 21, 2022

I think the ReportsDeploymentAvailable example is a bit confusing. Would the Status set to False when the Reason is ReportsDeploymentDisabled make more sense? Or is it implying that the reports deployment is available if the user chooses to enable it, but currently it's disabled?

@andrewazores
Copy link
Member

I think the status/condition makes sense but agree that it's confusing - I think the naming/message is what is confusing, though. If the Cryostat CR is configured with 0 reports replicas, then the value that the ReportsDeploymentAvailable represents is trivially true, or at least should be identical to MainDeploymentAvailable at all times. But the name ReportsDeploymentAvailable seems misleading or confusing to be true when it's actually indicating that there is no reports deployment at all, and there does not need to be one.

Maybe ReportsDeploymentAvailable should just be renamed to something a little more high-level/generic, like ReportsOnline or ReportsAvailable. It can be true because the main Cryostat deployment will handle it, or because there are sidecar report generator containers to handle it and they are available/ready. If ReportsAvailable is false then that's either because the MainDeploymentAvailable is also false, or because there should be sidecar report generators but they are in some non-ready state themselves.

@jan-law
Copy link
Contributor

jan-law commented Jan 21, 2022

Adding onto this, it might be helpful to add a reason or message when ReportsAvailable is false, e.g. to indicate if the sidecar report replicas are scaling up or down.

@ebaron
Copy link
Member Author

ebaron commented Feb 2, 2022

It appears that the Deployment Progressing condition is used to show things like scaling up, while we're using the Available condition. I'm not sure if we want to add an additional condition per deployment, since the reports deployment may not even exist. I could use information from the Available condition when true, and Progressing condition when Available is false? I'm not sure if we'd lose any useful information by doing that. What do you think?

@andrewazores
Copy link
Member

I think that sounds okay... it seems like it would be encoding the same kind of information I was looking for in my last comment. I have no idea if this is standard practice or idiomatic for an operator/controller, though.

@ebaron
Copy link
Member Author

ebaron commented Feb 3, 2022

I think that sounds okay... it seems like it would be encoding the same kind of information I was looking for in my last comment. I have no idea if this is standard practice or idiomatic for an operator/controller, though.

There unfortunately seems to be very little standard practice for reporting status information of the operands. I have noticed some operators that include all conditions for the Deployment. There should be up to 3: Available, Progressing, and ReplicaFailure. The latter seems absent unless an error actually occurs. We could just include these, and if the reports deployment is disabled, skip those conditions.

@andrewazores
Copy link
Member

Okay, seems to make sense.

@ebaron
Copy link
Member Author

ebaron commented Feb 4, 2022

I've tried a couple of different approaches to include all of the listed Deployment Conditions,

The first one is to include the Deployment Conditions for each deployment as-is as their own entries in the Cryostat Status.

// +optional
// +operator-sdk:csv:customresourcedefinitions:type=status,displayName="Cryostat Conditions",xDescriptors={"urn:alm:descriptor:io.kubernetes.conditions"}
Conditions []metav1.Condition `json:"conditions,omitempty"`
// +optional
// +operator-sdk:csv:customresourcedefinitions:type=status,displayName="Main Deployment Conditions",xDescriptors={"urn:alm:descriptor:io.kubernetes.conditions"}
MainDeploymentConditions []appsv1.DeploymentCondition `json:"mainDeploymentConditions,omitempty"`
// +optional
// +operator-sdk:csv:customresourcedefinitions:type=status,displayName="Reports Deployment Conditions",xDescriptors={"urn:alm:descriptor:io.kubernetes.conditions"}
ReportsDeploymentConditions []appsv1.DeploymentCondition `json:"reportsDeploymentConditions,omitempty"`

This looks like the following in the Console UI:
Screenshot 2022-02-04 at 15-51-53 Red Hat OpenShift Container Platform

This is easiest to implement, and organizes the conditions nicely at the expense of extra screen real estate. One drawback however is the list of Cryostat CRs only considers Conditions that are under the property status.conditions, so this excludes the separate deployment conditions.

Screenshot 2022-02-04 at 15-51-34 cryostat-operator v2 1 0-dev · Details · Red Hat OpenShift Container Platform

The other approach is to combine the conditions together into one set of Conditions, which is similar to what I've done already. I've modified this to not include Conditions for the reports deployment if it hasn't been enabled.

// Conditions of the components managed by the Cryostat Operator
// +optional
// +operator-sdk:csv:customresourcedefinitions:type=status,displayName="Cryostat Conditions",xDescriptors={"urn:alm:descriptor:io.kubernetes.conditions"}
Conditions []metav1.Condition `json:"conditions,omitempty"`

(reports enabled)
Screenshot 2022-02-04 at 17-54-24 Red Hat OpenShift Container Platform
(reports disabled)
Screenshot 2022-02-04 at 18-09-35 Red Hat OpenShift Container Platform

This is a slightly more complicated implementation, but is more compact, and the full set of conditions are available in the list view.

Screenshot 2022-02-04 at 17-54-09 cryostat-operator v2 1 0-dev · Details · Red Hat OpenShift Container Platform

The issue with the Status field in the list view may be unimportant in the long run. Adding a state or phase string property to the CR Status supersedes the Conditions [1]. This property could show Ready or Running, which the operator can determine using information from the Conditions.

[1] https://github.com/openshift/console/blob/a32d7bf28ce8afda3fba461cb71ac5c5488018c0/frontend/packages/operator-lifecycle-manager/src/components/operand/index.tsx#L161-L195

@andrewazores
Copy link
Member

I think that second approach makes a lot of sense and the extra bit about the state or phase seems to fix the only notable downside with it that I'm seeing.

@ebaron
Copy link
Member Author

ebaron commented Feb 7, 2022

I think that second approach makes a lot of sense and the extra bit about the state or phase seems to fix the only notable downside with it that I'm seeing.

The second approach being one table of Conditions for everything, right? The Status in the Cryostat CR list is okay for that one. All of the Conditions with status true will appear there. In the first approach with multiple tables, the deployment-related conditions would not appear in that Status field, so it was incomplete.

@ebaron
Copy link
Member Author

ebaron commented Feb 11, 2022

I've finished implementing the second approach where all Deployment conditions are merged into the single Conditions property, and reports deployment conditions are omitted when not enabled. How does the code look now?

Copy link
Member

@andrewazores andrewazores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise, just that one stylistic nitpick

@hareetd
Copy link

hareetd commented Feb 15, 2022

I was trying to test out the PR yesterday but ran into a couple issues.

During crc start (after having run crc setup), I get the repeated message Operator machine-config is degraded. After 10 min. I get an error saying "cluster still not stable after 10 min" (paraphrased) after which the cluster progresses to the login stage. I'm able to deploy the Operator as well as the Cryostat instance but after some time I'm unable to communicate with the server using oc and can't access the web console, even after having originally logged in as kubeadmin. (Although the web console did mention something about kubeadmin being a temporary login so that might actually be unrelated to the degraded machine-config operator). I'll spin up CRC after the team meeting and share more detailed logs.

Secondly, it's been a while since I tested an -operator PR but in order to bring in your changes regarding Conditions, it's as simple as checking out the PR branch and deploying the Operator as usual? "Usual" means either using the bundle format or through manual install? For some reason, during the time CRC was working and I could interact properly with the cluster, my web console wasn't showing the Conditions you've added in the PR, even though I created a Cryostat CRD with reports enabled. Could you share the exact steps I should be following to properly deploy an Operator with the changes included? Again, I can share pictures after the team meeting.

@hareetd
Copy link

hareetd commented Feb 15, 2022

Logs:

Degraded machine-config operator

Operator deployment workflow ending in a server error (400 status)

When attempting crc console with the above error:

image

The errors can probably be resolved by reinstalling my CRC/Openshift setup but unfortunately, I wasn't able to get a screenshot of my Operator Conditions before the server error occurred. However, I'm assuming my web console will still show the incorrect Conditions like it did yesterday. Could you let me know where I'm going wrong?

@ebaron
Copy link
Member Author

ebaron commented Feb 15, 2022

I was trying to test out the PR yesterday but ran into a couple issues.

During crc start (after having run crc setup), I get the repeated message Operator machine-config is degraded. After 10 min. I get an error saying "cluster still not stable after 10 min" (paraphrased) after which the cluster progresses to the login stage. I'm able to deploy the Operator as well as the Cryostat instance but after some time I'm unable to communicate with the server using oc and can't access the web console, even after having originally logged in as kubeadmin. (Although the web console did mention something about kubeadmin being a temporary login so that might actually be unrelated to the degraded machine-config operator). I'll spin up CRC after the team meeting and share more detailed logs.

Thanks for mentioning this. I thought I had filed an issue for this problem but apparently not. I've done so now: #341. On CRC, you'll see some downtime in the Console and with OC while the oauth-openshift pod starts up. We'll have to see if there's anything we can do to improve this.

Secondly, it's been a while since I tested an -operator PR but in order to bring in your changes regarding Conditions, it's as simple as checking out the PR branch and deploying the Operator as usual? "Usual" means either using the bundle format or through manual install? For some reason, during the time CRC was working and I could interact properly with the cluster, my web console wasn't showing the Conditions you've added in the PR, even though I created a Cryostat CRD with reports enabled. Could you share the exact steps I should be following to properly deploy an Operator with the changes included? Again, I can share pictures after the team meeting.

Sure, I've pushed a bundle image you can use to easily test this PR.
make deploy_bundle BUNDLE_IMG=quay.io/ebaron/cryostat-operator-bundle:conditions-01
Once this succeeds you can visit the "Installed Operators" page in the OpenShift Console. Create a Cryostat CR there and view its details. The Conditions should appear at the bottom of the page as seen in screenshots above.

Copy link
Contributor

@jan-law jan-law left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditions look good when I tested it with cluster-bot

@hareetd
Copy link

hareetd commented Feb 15, 2022

Thanks for mentioning this. I thought I had filed an issue for this problem but apparently not. I've done so now: #341. On CRC, you'll see some downtime in the Console and with OC while the oauth-openshift pod starts up. We'll have to see if there's anything we can do to improve this.

Ah, okay that makes sense then. I kept restarting CRC but this time I waited and was prompted to re-login in order to obtain my OAuth API token.

Sure, I've pushed a bundle image you can use to easily test this PR. make deploy_bundle BUNDLE_IMG=quay.io/ebaron/cryostat-operator-bundle:conditions-01 Once this succeeds you can visit the "Installed Operators" page in the OpenShift Console. Create a Cryostat CR there and view its details. The Conditions should appear at the bottom of the page as seen in screenshots above.

Thanks for pushing the image for me. Out of curiosity, what steps did you follow to produce a bundle image out of your local file changes? Was it make bundle-build, the result of which you pushed to quay.io? I think the issue in my workflow was that I was deploying the 2.1.0-dev bundle without your changes.

@ebaron
Copy link
Member Author

ebaron commented Feb 15, 2022

Thanks for pushing the image for me. Out of curiosity, what steps did you follow to produce a bundle image out of your local file changes? Was it make bundle-build, the result of which you pushed to quay.io? I think the issue in my workflow was that I was deploying the 2.1.0-dev bundle without your changes.

Good question. Here are the steps I took:

make oci-build IMG=quay.io/ebaron/cryostat-operator:conditions-01
podman push quay.io/ebaron/cryostat-operator:conditions-01
make bundle IMG=quay.io/ebaron/cryostat-operator:conditions-01
make bundle-build BUNDLE_IMG=quay.io/ebaron/cryostat-operator-bundle:conditions-01
podman push quay.io/ebaron/cryostat-operator-bundle:conditions-01

Copy link

@hareetd hareetd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is working now and the Conditions look good.

@ebaron ebaron merged commit e85ec83 into cryostatio:main Feb 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Communicate CRD status with conditions
4 participants