Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replica set conditions API #33905

Merged
merged 3 commits into from
Oct 12, 2016
Merged

Replica set conditions API #33905

merged 3 commits into from
Oct 12, 2016

Conversation

0xmichalis
Copy link
Contributor

@0xmichalis 0xmichalis commented Oct 3, 2016

Partially addresses #32863

@kubernetes/sig-apps


This change is Reviewable

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Oct 3, 2016
@0xmichalis
Copy link
Contributor Author

Still needs a test

@0xmichalis
Copy link
Contributor Author

And #33092 would be a nice-to-have.

@k8s-ci-robot
Copy link
Contributor

Jenkins Kubemark GCE e2e failed for commit 7a20df8c2c60f2088b5f7ce08255c29d76aae198. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@@ -1753,6 +1753,32 @@ type ReplicationControllerStatus struct {

// ObservedGeneration is the most recent generation observed by the controller.
ObservedGeneration int64 `json:"observedGeneration,omitempty"`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API change LGTM - it is consistent with other objects that have conditions.

@kubernetes/api-review-team any disagreement?

Copy link
Contributor

@soltysh soltysh Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other remark about LastProbeTime.

@smarterclayton
Copy link
Contributor

I would recommend splitting the change to actually set conditions to a separate PR, and have this only be the API change to support conditions.

@0xmichalis
Copy link
Contributor Author

I would recommend splitting the change to actually set conditions to a separate PR, and have this only be the API change to support conditions.

Sure, do you want something similar for the perma-failed PR?

@0xmichalis
Copy link
Contributor Author

Updated to include just the API changes

@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit 9b21223e8e0b00527af04003cc83132f13110323. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit 9b21223e8e0b00527af04003cc83132f13110323. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@0xmichalis 0xmichalis changed the title Replica set conditions Replica set conditions API Oct 4, 2016
@smarterclayton
Copy link
Contributor

Thanks for splitting - I have no objections to adding this field to ReplicaSets but needs a sign off to @kubernetes/api-review-team

@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 4, 2016
// Status of the condition, one of True, False, Unknown.
Status ConditionStatus `json:"status"`
// The last time the condition transitioned from one status to another.
LastTransitionTime unversioned.Time `json:"lastTransitionTime,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, there's also LastProbeTime unversioned.Time which is the last time a check was actually performed. See PodCondition or JobCondition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But does LastProbeTime make sense for the types of conditions we have for replica sets? Both Ready and ReplicaFailure conditions are updated only once their status changes. See #19343 (comment) for more info on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment and further discussion doesn't convince me to have just single timestamp. Having both seems more flexible, imho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with both types of timestamps

@@ -1753,6 +1753,32 @@ type ReplicationControllerStatus struct {

// ObservedGeneration is the most recent generation observed by the controller.
ObservedGeneration int64 `json:"observedGeneration,omitempty"`

Copy link
Contributor

@soltysh soltysh Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other remark about LastProbeTime.

const (
// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods
// fails to be created or deleted.
ReplicationControllerReplicaFailure ReplicationControllerConditionType = "ReplicaFailure"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one condition? What about running or similar, at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add an additional Ready condition type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced Ready is meaningful for an RC.

@0xmichalis
Copy link
Contributor Author

@kubernetes/kube-api ptal

@k8s-ci-robot
Copy link
Contributor

Jenkins GKE smoke e2e failed for commit 8ff78aa. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods
// fails to be created or deleted.
ReplicationControllerReplicaFailure ReplicationControllerConditionType = "ReplicaFailure"
// ReplicationControllerReady denotes whether all of the replicas for the controller are ready or not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention that this flips if any replica is not yet ready? That can happen sort of arbitrarily - why is this useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will flip. Actually you got me thinking that this information already exists in the resource printer of a replica set (READY pods). I wanted to make replica sets more usable but this condition doesn't add much. Removing...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree Ready doesn't belong here.

// These are valid conditions of a replication controller.
const (
// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods
// fails to be created or deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we explain how/why this might happen? E.g. is it just bugs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#32863 (comment)

Eventually I want to surface ImagePull errors and crashlooping pods under this (or maybe a new one) Condition too.

const (
// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods
// fails to be created or deleted.
ReplicationControllerReplicaFailure ReplicationControllerConditionType = "ReplicaFailure"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced Ready is meaningful for an RC.

// fails to be created or deleted.
ReplicaSetReplicaFailure ReplicaSetConditionType = "ReplicaFailure"
// ReplicaSetReady denotes whether all of the replicas for the replica set are ready or not.
ReplicaSetReady ReplicaSetConditionType = "Ready"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions as RC

@@ -28176,6 +28176,39 @@
}
}
},
"v1.ReplicationControllerCondition": {
"description": "ReplicationControllerCondition describes the state of a replication controller at a certain point.",
"required": [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have been confused about the word Condition until i found this PR. I was thinking something like a predicate or cirumstance that led to something. Should we rather call it ReplicationControllerState ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that ship has sailed. The convention is "condition".

noun
1.
the state of something, especially with regard to its appearance, quality, or working order.

"State" is equally ambiguous.

@@ -28232,6 +28265,13 @@
"type": "integer",
"format": "int32"
},
"conditions": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this states

@krmayankk
Copy link

@kubernetes/sig-apps does anyone else feel confused about the use of the word condition, should we rather calle it ReplicaSetState and states ?

@smarterclayton
Copy link
Contributor

smarterclayton commented Oct 9, 2016 via email

@thockin
Copy link
Member

thockin commented Oct 10, 2016

"state" implies a state-machine, and this is designed to avoid a
state-machine. State-machines, as an API abstraction, are very rigid, and
have almost no room for movement. Conditions are orthogonal ideas.

We do not, however, seem to have a guideline for whether all conditions
should exist on all instances at all times, or whether they should only be
present if the logical condition is of a particular polarity (e.g. phrase
conditions as default-false and their presence implies true - "Unready" vs
"Ready".)

On Sun, Oct 9, 2016 at 9:37 AM, Clayton Coleman notifications@github.com
wrote:

Condition is what we use in this context in the API. It's an array of
conditions on the object. See Pod and Node.


You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
#33905 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVBVhHPrFuvhk-gktw77tNWbaJSWjks5qyRg6gaJpZM4KMcca
.

@soltysh
Copy link
Contributor

soltysh commented Oct 10, 2016

@krmayankk there's #7856 as a background for your doubts.

@0xmichalis
Copy link
Contributor Author

We do not, however, seem to have a guideline for whether all conditions should exist on all instances at all times, or whether they should only be present if the logical condition is of a particular polarity (e.g. phrase conditions as default-false and their presence implies true - "Unready" vs "Ready".)

This is something I came across both here and in Deployment conditions with ReplicaFailure. Do we want to show it up by default to False and transition to True once we have a ReplicaFailure or do we want it to show up only on failures? Should we update api conventions about what is the right approach to authoring Conditions?

@thockin
Copy link
Member

thockin commented Oct 10, 2016

Left to my own devices I would have probably made it an implies true, but
it seems not to be. I don't have enough context on the existing Conditions
to say for sure - @bgrant0607

On Mon, Oct 10, 2016 at 3:29 AM, Michail Kargakis notifications@github.com
wrote:

We do not, however, seem to have a guideline for whether all conditions
should exist on all instances at all times, or whether they should only be
present if the logical condition is of a particular polarity (e.g. phrase
conditions as default-false and their presence implies true - "Unready" vs
"Ready".)

This is something I came across both here and in Deployment conditions
with ReplicaFailure. Do we want to show it up by default to False and
transition to True once we have a ReplicaFailure or do we want it to show
up only on failures? Should we update api conventions about what is the
right approach to authoring Conditions?


You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
#33905 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVGToSJNzgSH12ffjECphTfGjJFnAks5qyhOhgaJpZM4KMcca
.

@bgrant0607 bgrant0607 added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Oct 10, 2016

// These are valid conditions of a replica set.
const (
// ReplicaSetReplicaFailure is added in a replication controller when one of its pods fails
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/replication controller/replica set/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@bgrant0607
Copy link
Member

@thockin @Kargakis

Initially we started with conditions like Ready, Available, Reachable, etc. -- true always meant "known good". False would assumed by default, though we always populated the conditions.

However, most node conditions we wanted to report were problems, so "known bad" conditions were introduced. These conditions we didn't always populate. Even in this case, false could be assumed by default, however. ReplicaFailed appears to match this convention. I'm fine with it.

I assume that ReplicaFailed would be present when the controller was unable to make current state match desired state.

@bgrant0607
Copy link
Member

ok to test

@bgrant0607
Copy link
Member

@Kargakis Documenting the convention would be useful.

@0xmichalis
Copy link
Contributor Author

I assume that ReplicaFailed would be present when the controller was unable to make current state match desired state.

In the first pass, ReplicaFailure will be added only once a CREATE or DELETE on a pod fails. So the answer is yes. Long-term though I would like to surface conditions like ImagePullBackOff or CrashLoopBackOff which means that the current state (rs.status.replicas) will match the desired state (rs.spec.replicas) but we will still have pod failures. This will be very useful for infant mortality detection: #18568

@0xmichalis
Copy link
Contributor Author

@bgrant0607 @thockin can we move forward with this? I would like to have it in 1.5

@bgrant0607
Copy link
Member

@Kargakis I'm ok with the API change.

Did @soltysh review the implementation?

cc @kubernetes/deployment

@bgrant0607 bgrant0607 assigned soltysh and unassigned smarterclayton Oct 11, 2016
@bgrant0607
Copy link
Member

cc @erictune

@0xmichalis
Copy link
Contributor Author

Did @soltysh review the implementation?

This PR contains only the API changes as per @smarterclayton's request. I will open a follow-up with the implementation as soon as this merges.

@thockin
Copy link
Member

thockin commented Oct 12, 2016

API LGTM

@0xmichalis 0xmichalis added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2016
@soltysh
Copy link
Contributor

soltysh commented Oct 12, 2016

Did @soltysh review the implementation?

Current yes, further haven't seen yet.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit f9e8ee8 into kubernetes:master Oct 12, 2016
@0xmichalis 0xmichalis deleted the replica-set-conditions branch October 12, 2016 13:21
k8s-github-robot pushed a commit that referenced this pull request Nov 4, 2016
Automatic merge from submit-queue

Replica set conditions controller changes

Follow-up to #33905, partially addresses #32863.

@smarterclayton @soltysh @bgrant0607 @mfojtik I just need to add e2e tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants