
CRD watcher needs to handle out of order resource versions #609

Closed
klarose opened this issue Aug 23, 2018 · 20 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@klarose

klarose commented Aug 23, 2018

I recently started running into an issue where my controller, which was watching CRDs, would fail with a 410 from its watch on the list function.

After digging into it a bit, I noticed that the resource versions I was getting back at the beginning were not in order. Then, after some time, the watch would fail with a 410 error code.

After poking around a bit, it sounds like out-of-order events are possibly expected?

In the following example, I have three instances of the resource; let's call them "A", "B" and "C". In that order, their resource versions are 835745, 797550 and 746039. This is the order in which they are provided by the list. Of interest, I think, is the fact that they appear in alphabetical order (by name), despite not being created in that order.

Here are some logs from my application highlighting the issue. Here, I have "solved" the problem by extracting the "new" resource version from the error message, then updating the watcher's internal resource version to that value.

2018-08-23 18:55:18,166 controller Starting watch with resource version: 
2018-08-23 18:55:18,179 controller event ADDED for serverlists
2018-08-23 18:55:18,180 controller new version for serverlists: 835745
2018-08-23 18:55:18,181 controller event ADDED for serverlists
2018-08-23 18:55:18,181 controller new version for serverlists: 797550
2018-08-23 18:55:18,227 controller event ADDED for serverlists
2018-08-23 18:55:18,227 controller new version for serverlists: 746039
2018-08-23 18:57:03,055 controller event ERROR for serverlists
2018-08-23 18:57:03,055 controller Updating resource version to 826302 due to 'too old' error: {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 746039 (826302)', 'reason': 'Gone', 'code': 410}.
2018-08-23 18:57:03,055 controller Was: 746039
2018-08-23 18:57:03,070 controller event MODIFIED for serverlists
2018-08-23 18:57:03,070 controller new version for serverlists: 826303
2018-08-23 18:57:03,075 controller event MODIFIED for serverlists
2018-08-23 18:57:03,075 controller new version for serverlists: 826504
2018-08-23 18:57:03,078 controller event MODIFIED for serverlists
2018-08-23 18:57:03,079 controller new version for serverlists: 826888
2018-08-23 18:57:03,085 controller event MODIFIED for serverlists
2018-08-23 18:57:03,085 controller new version for serverlists: 827030
2018-08-23 18:57:03,089 controller event MODIFIED for serverlists
2018-08-23 18:57:03,090 controller new version for serverlists: 827214
2018-08-23 18:57:03,098 controller event MODIFIED for serverlists
2018-08-23 18:57:03,098 controller new version for serverlists: 827577
2018-08-23 18:57:03,102 controller event MODIFIED for serverlists
2018-08-23 18:57:03,102 controller new version for serverlists: 833563
2018-08-23 18:57:03,105 controller event MODIFIED for serverlists
2018-08-23 18:57:03,105 controller new version for serverlists: 833579
2018-08-23 18:57:03,118 controller event MODIFIED for serverlists
2018-08-23 18:57:03,118 controller new version for serverlists: 833745
2018-08-23 18:57:03,120 controller event MODIFIED for serverlists
2018-08-23 18:57:03,120 controller new version for serverlists: 835329
2018-08-23 18:57:03,121 controller event MODIFIED for serverlists
2018-08-23 18:57:03,122 controller new version for serverlists: 835474
2018-08-23 18:57:03,124 controller event MODIFIED for serverlists
2018-08-23 18:57:03,126 controller new version for serverlists: 835587
2018-08-23 18:57:03,131 controller event MODIFIED for serverlists
2018-08-23 18:57:03,132 controller new version for serverlists: 835679
2018-08-23 18:57:03,136 controller event MODIFIED for serverlists
2018-08-23 18:57:03,137 controller new version for serverlists: 835745

kubernetes-client/python-base#64 recently fixed the problem of the CRDs not updating the internal resourceVersion. However, should it be taking the max of the old and new resourceVersion to ensure things don't get messed up?

Ultimately there are a few things that are sub-par about the current state of affairs:

  1. I don't like having to mess around with the internals of the watcher. It should handle details like resources being out of order internally. Without messing around with its resource version, I get stuck in an infinite loop of errors.
  2. The events that come "later" are a replay of all the edits I have made to the various resources that the API server still knows about. That is kind of nasty, since really I just want to see the most up-to-date version plus any changes from the point I started the watch.

Regarding 1, I think that's definitely a bug in the watcher. Regarding 2, is this an unreasonable expectation -- that is, should I be handling cases like this in my application?

I'm running my application on Ubuntu 18.04.

Python version: 3.6.5
python kubernetes client version: 7.0.0
Kubernetes server version: 1.11.1

Thanks,

Kyle

@roycaihw roycaihw self-assigned this Oct 12, 2018
@roycaihw
Member

Sorry, I didn't quite understand the expectation here. From what I understand, the resource versions returned by kube-apiserver should not be interpreted as being in alphabetical order, because the RV is an etcd feature and an implementation detail that is subject to future change. Clients should not try to figure out the order between events based on the RV.

Could you try running kubectl --watch with some verbosity level to see kubectl's behavior (the received RVs)? The error message from the apiserver looks weird to me:

'status': 'Failure', 'message': 'too old resource version: 746039 (826302)'

I'm curious if some timeout happens and the watch gets restarted.

@tomplus
Member

tomplus commented Oct 24, 2018

I had the same issue. I checked kubectl, and it works like this: it starts by fetching a list of objects; the list in the response contains metadata with a resourceVersion, and that resourceVersion is used to start watching. The documentation is not clear about what the response will be if we start from 0. I found a similar issue, kubernetes/kubernetes#49745 (comment), which suggests that this is expected behavior. Should the Python client work like kubectl?
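
For illustration, here is a minimal sketch of that list-then-watch pattern using the Python client's CustomObjectsApi (the group and plural names below are placeholders, and this is just one way to wire it up yourself, not something the library does for you):

from kubernetes import client, config, watch

config.load_kube_config()
api = client.CustomObjectsApi()

# List first; the list's own metadata.resourceVersion marks the point to watch from.
initial = api.list_cluster_custom_object('example.com', 'v1', 'serverlists')
resource_version = initial['metadata']['resourceVersion']

# Then watch, starting from that version.
for event in watch.Watch().stream(api.list_cluster_custom_object,
                                  'example.com', 'v1', 'serverlists',
                                  resource_version=resource_version):
    print(event['type'], event['object']['metadata']['resourceVersion'])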

@klarose
Author

klarose commented Oct 24, 2018

@roycaihw I don't think the server necessarily needs to handle this. Rather, I think the client can deal with it gracefully.

The issue comes down to the fact that we are tracking a single variable built incorrectly from many others. Each instance of the resource -- e.g. a pod -- has its own resourceVersion. But, since we're only tracking a single variable, we need to aggregate these many resource versions down into one.

Currently the code does this by taking the most recently seen resourceVersion. However, since the server does not guarantee that it will send us an update from the instance with the latest resource version, we end up with a resource version that is too old. If we're unlucky, that resource version is so old that it is no longer valid, and we get the error message.

Another way of thinking of this is that we have some number of independent streams of resources to watch -- say N. For example, imagine I create N pods. Those N pods will each have their own sequence of updates that we can watch. When we watch all of them, we're not watching a single stream of updates; we're watching a stream built by interweaving the updates to all N streams.

Consider the following example. We have three pods, A, B and C.

[diagram: creation and update timeline for pods A, B and C with their resource versions]

A is created with resource version 1, then later updated with resource version 5. B is 2 then 4. C is 3.

This is the state of the resources when we start watching. So, we should get A @ 5, B @ 4 and C @ 3.

If the order of updates is C @ 3, then B @ 4 then A @ 5, the resourceVersion will be 5. However, if the order is A @ 5, then B @ 4 then C @ 3, the resource version will be 3, which leads to the bug. This is the problem I am seeing.

I think the fix is as simple as ensuring that the Python client code only uses the "maximum" version of the resource when updating its cache. I.e., if the most recent resourceVersion is less than the cached one, don't update the cache.

It may not be so simple -- if there is a good reason for resourceVersions to not be monotonically increasing (integer overflow?), then we'll need another strategy. That's why I haven't submitted a PR myself. I don't understand the k8s machinery enough to know whether my "fix" is the right one.
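
For what it's worth, a rough sketch of the kind of guard I have in mind (a hypothetical helper, not actual library code, and it assumes resourceVersions can be compared as integers -- which is exactly the assumption that needs confirming):

    def update_resource_version(self, new_version):
        # Only move the cached resourceVersion forward; ignore anything older.
        try:
            if int(new_version) > int(self.resource_version or 0):
                self.resource_version = new_version
        except (TypeError, ValueError):
            # Non-numeric version strings: fall back to trusting the server's ordering.
            self.resource_version = new_version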

@max-rocket-internet

When using this code:

# Imports and client setup implied by the snippet below (trimmed from the original).
import json
import logging

from kubernetes import client, config, watch

logger = logging.getLogger(__name__)
config.load_kube_config()
crds = client.CustomObjectsApi()

resource_version = ''
while True:
    stream = watch.Watch().stream(crds.list_cluster_custom_object, 'xxxx', 'v1', 'xxx', resource_version=resource_version)
    try:
        for event in stream:
            obj = event["object"]
            metadata = obj.get('metadata')
            spec = obj.get('spec')

            if not metadata or not spec:
                logger.error('No metadata or spec in object, skipping: {0}'.format(json.dumps(obj, indent=1)))
                resource_version = ''
                break
            # (the rest of the loop body and the matching except clause are trimmed here)

I occasionally have this error:

{"time":"2018-11-13 12:35:03,607", "level":"ERROR", "message":"No metadata or spec in object, skipping: {
 "kind": "Status",
 "apiVersion": "v1",
 "metadata": {},
 "status": "Failure",
 "message": "too old resource version: 4535234 (4535251)",
 "reason": "Gone",
 "code": 410
}"}

Is this error related to this issue? Should I handle it differently?

@klarose
Author

klarose commented Nov 13, 2018

It looks related to me. The "Error" message doesn't have the object type you'd expect associated with it, so nothing's filled in.

I handled it by parsing out the version in the parenthesis (4535251 in your example), and setting the watcher's current resource version to that. It's not a very elegant solution, but it has worked for the most part. That said, for the reasons I gave above, I'd like it to be fixed in the client library itself.

    def parse_too_old_failure(self, message):
        # Pull the server's current resource version out of a
        # "too old resource version: <old> (<current>)" message (requires "import re").
        regex = r"too old resource version: .* \((.*)\)"
        result = re.search(regex, message)
        if result is None:
            return None

        match = result.group(1)
        if match is None:
            return None

        try:
            return int(match)
        except (TypeError, ValueError):
            return None

    def handle_error(self, event):
        # On a 410 Gone, fast-forward to the resource version reported by the
        # server so the next watch call can succeed.
        obj = event["raw_object"]
        code = obj.get("code")

        if code == 410:
            new_version = self.parse_too_old_failure(obj.get("message"))
            if new_version is not None:
                if self.log:
                    self.logger.warn("Updating resource version to %u due to 'too old' error: %s." % (new_version, obj))
                    self.logger.warn("Was: %s" % self.watcher.resource_version)
                self.resource_version = new_version
                self.watcher.resource_version = self.resource_version
I call handle_error when event["type"] is "ERROR".
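
For context, roughly how this fits into the watch loop (a sketch only -- self.list_func and self.handle_event here are stand-ins for my application's own pieces; self.watcher is a watch.Watch() instance):

    def run(self):
        # Stream events and route ERROR events through handle_error so the
        # resource version gets fast-forwarded instead of looping on 410s.
        for event in self.watcher.stream(self.list_func,
                                         resource_version=self.resource_version):
            if event["type"] == "ERROR":
                self.handle_error(event)
            else:
                self.handle_event(event)  # stand-in for the real event handling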

@max-rocket-internet

Ahh nice. Thanks @klarose !

Yes, I think this should definitely be handled in the Python client. How does the Go client handle this? Or other projects that use the Go client?

@juliantaylor
Contributor

juliantaylor commented Dec 11, 2018

The code should not use the maximum of the resource versions of the objects it has seen; it should use the resource version of the top-level listing object that was used to return them. The listing object contains a resource version too, and it guarantees that all objects with updates up to that resource version have been returned.

See the method used in #700, which is correct as long as the Python library does not interfere and mess the resource versions up.

@juliantaylor
Contributor

Never mind, I had a more detailed look and it seems Kubernetes does not actually give you the list object when you are watching, even when you receive multiple objects, so there is no single resource version to base your next watch on.
Using a maximum is probably appropriate then.

@klarose
Author

klarose commented Dec 11, 2018

@juliantaylor
I think my posts in #693 actually provide a fairly nice solution. You start everything off with a list, to get the set of resources and the latest possible resource version for the list, then start watching from there. I think from then on you'll actually get the resources in order, so you won't need to take the max.

edit: I haven't tested this, so I may be wrong, but it seems reasonable. :)

@juliantaylor
Contributor

Yes, that is a good way to implement a watch using the API directly. But I do not know whether Kubernetes guarantees that a watch returns events in order of resource version, and the current code assumes that it does.
Additionally, the list-then-watch approach does not currently work, as the watch will always start at resource version zero because it ignores user-passed resource versions after the first reset (that is issues #693 and #700).
If we knew the ordering guarantees of watches in Kubernetes, or the overflow behavior, the issue would be quite easy to fix.

@juliantaylor
Contributor

FWIW, etcd, which the Kubernetes API is based on, says this:
https://coreos.com/etcd/docs/latest/learning/api_guarantees.html

etcd does not ensure linearizability for watch operations. Users are expected to verify the revision of watch responses to ensure correct ordering.

@juliantaylor
Contributor

I cannot find any information on the overflow behavior, but the etcd structures appear to use int64, so it can overflow (e.g. https://github.com/etcd-io/etcd/blob/master/Documentation/dev-guide/api_reference_v3.md).
Not sure whether that can be handled reasonably in the library.

@klarose
Author

klarose commented Dec 11, 2018

I'd hope k8s (and maybe even etcd...) would handle that for us, since I expect it'd mess up its internals. That said, 2^63 is a pretty big number. My rough calculations suggest that overflow would occur in close to 300K years with 1 million operations per second. Can the k8s API even handle close to that?

2^63 ≈ 1e19
1e19 / 1e6 ops/s = 1e13 seconds
1e13 s / ~4e3 s/hour ≈ 2.5e9 hours (at 1 million operations per second)
2.5e9 / (24 * 365) ≈ 280K years
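
A quick sanity check of that arithmetic in Python (assuming a sustained 1 million resourceVersion increments per second, which is generous):

seconds = 2**63 / 1e6                # ~9.2e12 seconds of headroom
years = seconds / (3600 * 24 * 365)
print(round(years))                  # 292471 -- roughly 300K years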

@juliantaylor
Contributor

I guess one can ignore it. I'll open a PR to fix my issue (#700); not sure whether it would fix yours. I have not yet seen out-of-order events.

@juliantaylor
Contributor

FWIW, here is a proof-of-concept branch which reorders potentially unordered events: https://github.com/juliantaylor/python-base/tree/unordered-events
It is not posted as a PR, as I am not yet certain that it is necessary.
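
(For the curious, the rough idea is something like the following toy sketch -- not the code from that branch -- which buffers a small window of events and releases them in resourceVersion order:)

import heapq
import itertools

def reorder_stream(events, window=16):
    # Buffer up to `window` events and always release the one with the
    # smallest resourceVersion first; flush whatever is left at the end.
    heap, tiebreak = [], itertools.count()
    for event in events:
        rv = int(event['object']['metadata']['resourceVersion'])
        heapq.heappush(heap, (rv, next(tiebreak), event))
        if len(heap) > window:
            yield heapq.heappop(heap)[2]
    while heap:
        yield heapq.heappop(heap)[2]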

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cben

cben commented Jan 6, 2020

Ref: #819 is a newer issue for what I believe is the same thing.
