
CRD watcher needs to handle out of order resource versions #609

Closed
klarose opened this issue Aug 23, 2018 · 20 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@klarose

klarose commented Aug 23, 2018

I recently started running into an issue where my controller, which was watching CRDs, would fail with a 410 from its watch on the list function.

After digging into it a bit, I noticed that the resource versions I was getting back at the beginning were not in order. Then, after some time, the watch would fail with a 410 error code.

After poking around a bit, it sounds like out-of-order events are possibly expected?

In the following example, I have three instances of the resource; let's call them "A", "B" and "C". In that order, their resource versions are 835745, 797550 and 746039. This is the order in which they are provided by the list. Of interest, I think, is the fact that they appear in alphabetical order (by name), despite not being created in that order.

Here are some logs from my application highlighting the issue. Here, I have "solved" the problem by extracting the "new" resource version from the error message, then updating the watcher's internal resource version to that value.

2018-08-23 18:55:18,166 controller Starting watch with resource version: 
2018-08-23 18:55:18,179 controller event ADDED for serverlists
2018-08-23 18:55:18,180 controller new version for serverlists: 835745
2018-08-23 18:55:18,181 controller event ADDED for serverlists
2018-08-23 18:55:18,181 controller new version for serverlists: 797550
2018-08-23 18:55:18,227 controller event ADDED for serverlists
2018-08-23 18:55:18,227 controller new version for serverlists: 746039
2018-08-23 18:57:03,055 controller event ERROR for serverlists
2018-08-23 18:57:03,055 controller Updating resource version to 826302 due to 'too old' error: {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 746039 (826302)', 'reason': 'Gone', 'code': 410}.
2018-08-23 18:57:03,055 controller Was: 746039
2018-08-23 18:57:03,070 controller event MODIFIED for serverlists
2018-08-23 18:57:03,070 controller new version for serverlists: 826303
2018-08-23 18:57:03,075 controller event MODIFIED for serverlists
2018-08-23 18:57:03,075 controller new version for serverlists: 826504
2018-08-23 18:57:03,078 controller event MODIFIED for serverlists
2018-08-23 18:57:03,079 controller new version for serverlists: 826888
2018-08-23 18:57:03,085 controller event MODIFIED for serverlists
2018-08-23 18:57:03,085 controller new version for serverlists: 827030
2018-08-23 18:57:03,089 controller event MODIFIED for serverlists
2018-08-23 18:57:03,090 controller new version for serverlists: 827214
2018-08-23 18:57:03,098 controller event MODIFIED for serverlists
2018-08-23 18:57:03,098 controller new version for serverlists: 827577
2018-08-23 18:57:03,102 controller event MODIFIED for serverlists
2018-08-23 18:57:03,102 controller new version for serverlists: 833563
2018-08-23 18:57:03,105 controller event MODIFIED for serverlists
2018-08-23 18:57:03,105 controller new version for serverlists: 833579
2018-08-23 18:57:03,118 controller event MODIFIED for serverlists
2018-08-23 18:57:03,118 controller new version for serverlists: 833745
2018-08-23 18:57:03,120 controller event MODIFIED for serverlists
2018-08-23 18:57:03,120 controller new version for serverlists: 835329
2018-08-23 18:57:03,121 controller event MODIFIED for serverlists
2018-08-23 18:57:03,122 controller new version for serverlists: 835474
2018-08-23 18:57:03,124 controller event MODIFIED for serverlists
2018-08-23 18:57:03,126 controller new version for serverlists: 835587
2018-08-23 18:57:03,131 controller event MODIFIED for serverlists
2018-08-23 18:57:03,132 controller new version for serverlists: 835679
2018-08-23 18:57:03,136 controller event MODIFIED for serverlists
2018-08-23 18:57:03,137 controller new version for serverlists: 835745

kubernetes-client/python-base#64 recently fixed the problem of the CRDs not updating the internal resourceVersion. However, should it be taking the max of the old and new resourceVersion to ensure things don't get messed up?

Ultimately there are a few things that are sub-par about the current state of affairs:

  1. I don't like having to mess around with the internals of the watcher. It should handle details like resources being out of order internally. Without messing around with its resource version, I get stuck in an infinite loop of errors.
  2. The events that come "later" are a replay of all the edits I have made to the various resources that the API server still knows about. That is kind of nasty, since really I just want to see the most up-to-date version plus any changes from the point I started the watch.

Regarding 1, I think that's definitely a bug in the watcher. Regarding 2, is this an unreasonable expectation -- that is, should I be handling cases like this in my application?

I'm running my application on Ubuntu 18.04.

Python version: 3.6.5
python kubernetes client version: 7.0.0
Kubernetes server version: 1.11.1

Thanks,

Kyle

@roycaihw roycaihw self-assigned this Oct 12, 2018
@roycaihw
Member

Sorry, I didn't quite understand the expectation here. From what I understand, the resource versions returned by kube-apiserver should not be interpreted as being in alphabetical order, because the RV is an etcd feature and an implementation detail that is subject to future change. Clients should not try to figure out the order between events based on the RV.

Could you try running kubectl --watch with some verbosity level to see kubectl's behavior (the received RVs)? The error message from the apiserver looks weird to me:

'status': 'Failure', 'message': 'too old resource version: 746039 (826302)'

I'm curious if some timeout happens and the watch gets restarted.

@tomplus
Member

tomplus commented Oct 24, 2018

I had the same issue. I checked kubectl, and it works like this: it starts by fetching a list of objects; the list in the response contains metadata with a resourceVersion, and that resourceVersion is used to start watching. The documentation is not clear about what the response will be if we start from 0. I found a similar issue, kubernetes/kubernetes#49745 (comment), which suggests that this is expected behavior. Should the Python client work like kubectl?
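
For illustration, here is a minimal sketch of that list-then-watch pattern using the Python client's CustomObjectsApi (the group and plural names below are placeholders, and this is just one way to wire it up yourself, not something the library does for you):

from kubernetes import client, config, watch

config.load_kube_config()
api = client.CustomObjectsApi()

# List first; the list's own metadata.resourceVersion marks the point to watch from.
initial = api.list_cluster_custom_object('example.com', 'v1', 'serverlists')
resource_version = initial['metadata']['resourceVersion']

# Then watch, starting from that version.
for event in watch.Watch().stream(api.list_cluster_custom_object,
                                  'example.com', 'v1', 'serverlists',
                                  resource_version=resource_version):
    print(event['type'], event['object']['metadata']['resourceVersion'])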

@klarose
Author

klarose commented Oct 24, 2018

@roycaihw I don't think the server necessarily needs to handle this. Rather, I think the client can deal with it gracefully.

The issue comes down to the fact that we are tracking a single variable built incorrectly from many others. Each instance of the resource -- e.g. a pod -- has its own resourceVersion. But, since we're only tracking a single variable, we need to aggregate these many resource versions down into one.

Currently the code does this by taking the most recently seen resourceVersion. However, since the server does not guarantee that it will send us an update from the instance with the latest resource version, we end up with a resource version that is too old. If we're unlucky, that resource version is so old that it is no longer valid, and we get the error message.

Another way of thinking of this is that we have some number of independent streams of resources to watch -- say N. For example, imagine I create N pods. Those N pods will each have their own sequence of updates that we can watch. When we watch all of them, we're not watching a single stream of updates; we're watching a stream built by interweaving the updates to all N streams.

Consider the following example. We have three pods, A, B and C.

[diagram: creation and update timeline for pods A, B and C with their resource versions]

A is created with resource version 1, then later updated with resource version 5. B is 2 then 4. C is 3.

This is the state of the resources when we start watching. So, we should get A @ 5, B @ 4 and C @ 3.

If the order of updates is C @ 3, then B @ 4 then A @ 5, the resourceVersion will be 5. However, if the order is A @ 5, then B @ 4 then C @ 3, the resource version will be 3, which leads to the bug. This is the problem I am seeing.

I think the fix is as simple as ensuring that the Python client code only uses the "maximum" version of the resource when updating its cache. I.e., if the most recent resourceVersion is less than the cached one, don't update the cache.

It may not be so simple -- if there is a good reason for resourceVersions to not be monotonically increasing (integer overflow?), then we'll need another strategy. That's why I haven't submitted a PR myself. I don't understand the k8s machinery enough to know whether my "fix" is the right one.
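
For what it's worth, a rough sketch of the kind of guard I have in mind (a hypothetical helper, not actual library code, and it assumes resourceVersions can be compared as integers -- which is exactly the assumption that needs confirming):

    def update_resource_version(self, new_version):
        # Only move the cached resourceVersion forward; ignore anything older.
        try:
            if int(new_version) > int(self.resource_version or 0):
                self.resource_version = new_version
        except (TypeError, ValueError):
            # Non-numeric version strings: fall back to trusting the server's ordering.
            self.resource_version = new_version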

@max-rocket-internet

When using this code:

# Imports and client setup implied by the snippet below (trimmed from the original).
import json
import logging

from kubernetes import client, config, watch

logger = logging.getLogger(__name__)
config.load_kube_config()
crds = client.CustomObjectsApi()

resource_version = ''
while True:
    stream = watch.Watch().stream(crds.list_cluster_custom_object, 'xxxx', 'v1', 'xxx', resource_version=resource_version)
    try:
        for event in stream:
            obj = event["object"]
            metadata = obj.get('metadata')
            spec = obj.get('spec')

            if not metadata or not spec:
                logger.error('No metadata or spec in object, skipping: {0}'.format(json.dumps(obj, indent=1)))
                resource_version = ''
                break
            # (the rest of the loop body and the matching except clause are trimmed here)

I occasionally have this error:

{"time":"2018-11-13 12:35:03,607", "level":"ERROR", "message":"No metadata or spec in object, skipping: {
 "kind": "Status",
 "apiVersion": "v1",
 "metadata": {},
 "status": "Failure",
 "message": "too old resource version: 4535234 (4535251)",
 "reason": "Gone",
 "code": 410
}"}

Is this error related to this issue? Should I handle it differently?

@klarose
Author

klarose commented Nov 13, 2018

It looks related to me. The "Error" message doesn't have the object type you'd expect associated with it, so nothing's filled in.

I handled it by parsing out the version in the parenthesis (4535251 in your example), and setting the watcher's current resource version to that. It's not a very elegant solution, but it has worked for the most part. That said, for the reasons I gave above, I'd like it to be fixed in the client library itself.

    def parse_too_old_failure(self, message):
        # Pull the server's current resource version out of a
        # "too old resource version: <old> (<current>)" message (requires "import re").
        regex = r"too old resource version: .* \((.*)\)"
        result = re.search(regex, message)
        if result is None:
            return None

        match = result.group(1)
        if match is None:
            return None

        try:
            return int(match)
        except (TypeError, ValueError):
            return None

    def handle_error(self, event):
        # On a 410 Gone, fast-forward to the resource version reported by the
        # server so the next watch call can succeed.
        obj = event["raw_object"]
        code = obj.get("code")

        if code == 410:
            new_version = self.parse_too_old_failure(obj.get("message"))
            if new_version is not None:
                if self.log:
                    self.logger.warn("Updating resource version to %u due to 'too old' error: %s." % (new_version, obj))
                    self.logger.warn("Was: %s" % self.watcher.resource_version)
                self.resource_version = new_version
                self.watcher.resource_version = self.resource_version
I call handle_error when event["type"] is "ERROR".
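
For context, roughly how this fits into the watch loop (a sketch only -- self.list_func and self.handle_event here are stand-ins for my application's own pieces; self.watcher is a watch.Watch() instance):

    def run(self):
        # Stream events and route ERROR events through handle_error so the
        # resource version gets fast-forwarded instead of looping on 410s.
        for event in self.watcher.stream(self.list_func,
                                         resource_version=self.resource_version):
            if event["type"] == "ERROR":
                self.handle_error(event)
            else:
                self.handle_event(event)  # stand-in for the real event handling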

@max-rocket-internet

Ahh nice. Thanks @klarose !

Yes, I think this should definitely be handled in the Python client. How does the Go client handle this? Or other projects that use the Go client?

@juliantaylor
Contributor

juliantaylor commented Dec 11, 2018

The code should not use the maximum of the resource versions of the objects it has seen; it should use the resource version of the top-level listing object that was used to return them. The listing object contains a resource version too, and it guarantees that all objects with updates up to that resource version have been returned.

See the method used in #700, which is correct as long as the Python library does not interfere and mess the resource versions up.

@juliantaylor
Contributor

Never mind, I had a more detailed look and it seems Kubernetes does not actually give you the list object when you are watching, even when you receive multiple objects, so there is no single resource version to base your next watch on.
Using a maximum is probably appropriate then.

@klarose
Author

klarose commented Dec 11, 2018

@juliantaylor
I think my posts in #693 actually provide a fairly nice solution. You start everything off with a list, to get the set of resources and the latest possible resource version for the list, then start watching from there. I think from then on you'll actually get the resources in order, so you won't need to take the max.

edit: I haven't tested this, so I may be wrong, but it seems reasonable. :)

@juliantaylor
Contributor

Yes, that is a good way to implement a watch using the API directly. But I do not know whether Kubernetes guarantees that a watch returns events in order of resource version, and the current code assumes that it does.
Additionally, the list-then-watch approach does not currently work, as the watch will always start at resource version zero because it ignores user-passed resource versions after the first reset (that is issues #693 and #700).
If we knew the ordering guarantees of watches in Kubernetes, or the overflow behavior, the issue would be quite easy to fix.

@juliantaylor
Contributor

FWIW, etcd, which the Kubernetes API is based on, says this:
https://coreos.com/etcd/docs/latest/learning/api_guarantees.html

etcd does not ensure linearizability for watch operations. Users are expected to verify the revision of watch responses to ensure correct ordering.

@juliantaylor
Contributor

I cannot find any information on the overflow behavior, but the etcd structures appear to use int64, so it can overflow (e.g. https://github.com/etcd-io/etcd/blob/master/Documentation/dev-guide/api_reference_v3.md).
Not sure whether that can be handled reasonably in the library.

@klarose
Author

klarose commented Dec 11, 2018

I'd hope k8s (and maybe even etcd...) would handle that for us, since I expect it'd mess up its internals. That said, 2^63 is a pretty big number. My rough calculations suggest that overflow would occur in close to 300K years with 1 million operations per second. Can the k8s API even handle close to that?

2^63 ≈ 1e19
1e19 / 1e6 ops/s = 1e13 seconds
1e13 s / ~4e3 s/hour ≈ 2.5e9 hours (at 1 million operations per second)
2.5e9 / (24 * 365) ≈ 280K years
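
A quick sanity check of that arithmetic in Python (assuming a sustained 1 million resourceVersion increments per second, which is generous):

seconds = 2**63 / 1e6                # ~9.2e12 seconds of headroom
years = seconds / (3600 * 24 * 365)
print(round(years))                  # 292471 -- roughly 300K years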

@juliantaylor
Contributor

I guess one can ignore it. I'll open a PR to fix my issue (#700); not sure whether it would fix yours. I have not yet seen out-of-order events.

@juliantaylor
Contributor

FWIW, here is a proof-of-concept branch which reorders potentially unordered events: https://github.com/juliantaylor/python-base/tree/unordered-events
It is not posted as a PR, as I am not yet certain that it is necessary.
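
(For the curious, the rough idea is something like the following toy sketch -- not the code from that branch -- which buffers a small window of events and releases them in resourceVersion order:)

import heapq
import itertools

def reorder_stream(events, window=16):
    # Buffer up to `window` events and always release the one with the
    # smallest resourceVersion first; flush whatever is left at the end.
    heap, tiebreak = [], itertools.count()
    for event in events:
        rv = int(event['object']['metadata']['resourceVersion'])
        heapq.heappush(heap, (rv, next(tiebreak), event))
        if len(heap) > window:
            yield heapq.heappop(heap)[2]
    while heap:
        yield heapq.heappop(heap)[2]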

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cben

cben commented Jan 6, 2020

Ref: #819 is a newer issue for what I believe is the same thing.
