if existing.LastUpdate < int64(point.Time) {
    existing.LastUpdate = int64(point.Time)
}
This change (and the similar one below) is somewhat orthogonal, but as part of the interval change a rogue datapoint came in, "reset" the LastUpdate time to an older time, and made the data unfindable.
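To illustrate the guard in the diff above, here is a minimal, self-contained sketch. The `Archive` and `Point` types and the `touch` helper are hypothetical stand-ins, not metrictank's real definitions; only the comparison itself is taken from the diff.

```go
package main

import "fmt"

// Archive is a hypothetical stand-in for an index entry.
// Only the LastUpdate field matters for this sketch.
type Archive struct {
	LastUpdate int64
}

// Point is a hypothetical stand-in for an incoming datapoint.
type Point struct {
	Time uint32
}

// touch advances LastUpdate only when the point is newer than
// the recorded value, so a stale point cannot move it backwards.
func touch(existing *Archive, point Point) {
	if existing.LastUpdate < int64(point.Time) {
		existing.LastUpdate = int64(point.Time)
	}
}

func main() {
	a := &Archive{LastUpdate: 1500}
	touch(a, Point{Time: 1000}) // stale point: ignored
	touch(a, Point{Time: 2000}) // newer point: applied
	fmt.Println(a.LastUpdate)   // prints 2000
}
```

Without the comparison, the first call would have set LastUpdate back to 1000, which is the "reset" described in the comment.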
very interesting. note that the tank (AggMetric) wouldn't add this point (unless it is accepted by the reorder buffer), which makes this case even more interesting.
Also this only seems a problem if this "buggy, old" point is not followed by a correct, recent point.
you're the first one I know with a stream this broken, but sure, we may as well protect against it :)
Agreed. This didn't actually happen to us, but it was my leading guess of what happened. Turns out I was wrong. The issue we had was that during a backfill, the LastUpdate in cassandra doesn't get set correctly (because the data stopped and didn't cause a flush). It's a mega corner case though.
BTW, comparing Find with FindByTag and checking for how they differ is an interesting exercise. I found another bug which we can fix later: #899
probably no one has noticed cause no one uses the feature (except worldping which is currently untagged)
I'm missing a lot of info here. we = your deployment? or we = the metrictank codebase?
Sure. FindByTag will always return exactly one def per As an example, if you had
idx/memory/memory.go
Outdated
HasChildren: false,
Defs: []idx.Archive{*def},
})
if _, ok := seen[def.NameWithTags()]; !ok {
The ids returned by m.idsByTagQuery have already been filtered to only include the ids of defs that have a lastUpdate > from.
So instead of searching for all defs with this NameWithTags, wouldn't it make more sense to just create a map[string]*idx.Node to group all results by def.NameWithTags()?
Series names that have not been seen before just create a new entry in the map, with idx.Node's Defs field set to []idx.Archive{*def}. If the series name has already been seen, then the def just gets appended to idx.Node's Defs field.
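A rough sketch of the grouping suggested above, under stated assumptions: `Def`, `Node`, and `groupByName` are simplified, hypothetical stand-ins for the idx types, and the `Name` field stands in for the result of def.NameWithTags(). It is not the code that was merged.

```go
package main

import "fmt"

// Def is a hypothetical, simplified metric definition.
// Name stands in for the result of def.NameWithTags().
type Def struct {
	Name     string
	Interval int
}

// Node is a hypothetical, simplified version of idx.Node.
type Node struct {
	Path string
	Leaf bool
	Defs []Def
}

// groupByName collapses all defs that share a name into one Node.
func groupByName(defs []Def) []*Node {
	byName := make(map[string]*Node)
	var order []string // first-seen order, for deterministic output
	for _, d := range defs {
		n, ok := byName[d.Name]
		if !ok {
			// Name not seen before: create a new entry.
			n = &Node{Path: d.Name, Leaf: true}
			byName[d.Name] = n
			order = append(order, d.Name)
		}
		// Seen or not, the def is appended to the node's Defs.
		n.Defs = append(n.Defs, d)
	}
	out := make([]*Node, 0, len(order))
	for _, name := range order {
		out = append(out, byName[name])
	}
	return out
}

func main() {
	defs := []Def{
		{Name: "cpu;host=a", Interval: 10},
		{Name: "cpu;host=a", Interval: 60},
		{Name: "mem;host=a", Interval: 10},
	}
	for _, n := range groupByName(defs) {
		fmt.Println(n.Path, len(n.Defs))
	}
}
```

The two "cpu;host=a" defs, which differ only in interval, end up in a single node with two entries in Defs, instead of producing two separate nodes.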
Good point. I had actually implemented it like this originally, but found the defByTagSet and went that path. I didn't think about this though. Will fix.
i agree. irrespective of the lastUpdate check (which indeed can be removed) that seems simpler, cleaner and also more symmetrical to what MemoryIdx.Find() does
Now your explanation makes a lot more sense. I agree that we should fix this, and generally we should make FindByTag more symmetrical to Find.
3a92e54 to 42b19db
I'm not sure why a test failed here. It doesn't look like it's due to my change, but I cannot retry it.
@shanson7 this PR looks 👍 i just added a few minor cleanups and will merge shortly.
2435f03 to 81a95cf
Because we were not combining multiple intervals for the same series into a single idx.Node we were hitting issues with the code deduping the series on the remote end. This change should fix both cases:
multiple defs on a single node should be collapsed into a single idx.Node element
multiple defs across nodes should not overwrite one another and should be merged later (by mergeSeries)
The symptom was data sort of "flashing" on refresh, switching between the old and new intervals' data (since only one was winning).
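The second case can be sketched as follows. This is only an illustration of "append instead of overwrite" when combining find results from several cluster nodes; `Def`, `Node`, and `mergePeerResults` are hypothetical names, and this is not the mergeSeries function mentioned above, whose implementation is not shown in this conversation.

```go
package main

import "fmt"

// Def is a hypothetical, simplified metric definition.
type Def struct {
	Interval int
}

// Node is a hypothetical, simplified find result for one series path.
type Node struct {
	Path string
	Defs []Def
}

// mergePeerResults combines per-peer results keyed by path.
// When two peers report the same path, their defs are appended
// rather than one result replacing the other.
func mergePeerResults(perPeer ...[]Node) map[string]Node {
	merged := make(map[string]Node)
	for _, nodes := range perPeer {
		for _, n := range nodes {
			cur, ok := merged[n.Path]
			if !ok {
				cur = Node{Path: n.Path}
			}
			cur.Defs = append(cur.Defs, n.Defs...)
			merged[n.Path] = cur
		}
	}
	return merged
}

func main() {
	peerA := []Node{{Path: "cpu;host=a", Defs: []Def{{Interval: 10}}}}
	peerB := []Node{{Path: "cpu;host=a", Defs: []Def{{Interval: 60}}}}
	merged := mergePeerResults(peerA, peerB)
	fmt.Println(len(merged["cpu;host=a"].Defs)) // prints 2
}
```

With a plain `merged[n.Path] = n` assignment, whichever peer answered last would win, which matches the "only one was winning" symptom.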