update memory index prune to hold write lock as briefly as possible #787
We need to add a benchmark case so we can properly compare before and after this change. I can do this.
log.Debug("memory-idx: series %s for orgId:%d is stale. pruning it.", n.Path, org)
defs := m.delete(org, n, true)
statMetricsActive.Dec()
Unrelated to this PR, but this looks incorrect: m.delete may delete several paths (children), while statMetricsActive.Dec() assumes exactly one.
Doesn't the if !n.Leaf() { check on line 912 mean that there should never be children? We could add that same check here in case of a race condition where a child is added to a series that's about to be pruned, but that seems like a highly unlikely scenario.
> Doesn't the if !n.Leaf() { check on line 912 mean that there should never be children?
The only thing that branch does is ensure that nodes without metricdefs (data corresponding to that path) are not subject to pruning. Nodes with metricdefs (leaf nodes) are subject to pruning. I think the confusion here is because in our implementation a path can be both a leaf and have children: e.g. foo.bar can have data/metricdef (making it a leaf), while foo.bar.baz also exists (making foo.bar a branch).
In Graphite this doesn't work, but some of our customers wanted this behavior, and it seemed fairly easy to support, so that's why we do.
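To make the leaf-and-branch case concrete, here is a minimal sketch. The Node type, its fields, and HasChildren are hypothetical and simplified for illustration; they are not the actual memory index types.

```go
package main

import "fmt"

// Node is a hypothetical, simplified index node (not the real type).
type Node struct {
	Path     string
	Children []string // names of child nodes; non-empty makes this a branch
	Defs     []string // metric definition IDs; non-empty makes this a leaf
}

// Leaf reports whether data (at least one metric definition) exists at this path.
func (n *Node) Leaf() bool { return len(n.Defs) > 0 }

// HasChildren reports whether any sub-path exists below this path.
func (n *Node) HasChildren() bool { return len(n.Children) > 0 }

func main() {
	// foo.bar has its own data and also a child foo.bar.baz.
	fooBar := &Node{
		Path:     "foo.bar",
		Children: []string{"baz"},
		Defs:     []string{"def-1"},
	}
	fmt.Println(fooBar.Leaf(), fooBar.HasChildren()) // prints "true true"
}
```

Under this model, a Leaf() check alone says nothing about whether children exist, which is why a recursive delete on a leaf can still remove more than one path.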
The m.delete call is recursive: it deletes the path and any sub-paths (I think I actually just discovered a related bug here). So what we should do is track the actual number of paths that got deleted by the delete call, or, I think better: the pruning process shouldn't delete recursively, but rather only delete leaves (and clean up stale branch nodes that no longer have children).
But this warrants a new ticket, so I opened #797.
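A rough sketch of the first option (adjusting the counter by what was actually deleted) could look like the following. The index type, deleteTree, and the plain integer counter are all hypothetical stand-ins, assumed only for illustration; the real m.delete signature and the stats API may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// index is a hypothetical stand-in for the memory index:
// it maps a path to the metric definition IDs stored at that path.
type index struct {
	defs   map[string][]string
	active int // stand-in for the active-metrics stat
}

// deleteTree removes path and every sub-path, returning all removed definition IDs.
func (ix *index) deleteTree(path string) []string {
	var removed []string
	for p, d := range ix.defs {
		if p == path || strings.HasPrefix(p, path+".") {
			removed = append(removed, d...)
			delete(ix.defs, p) // deleting during range is safe in Go
		}
	}
	return removed
}

// prune deletes path recursively and lowers the counter by the number of
// definitions that were really removed, instead of assuming exactly one.
func (ix *index) prune(path string) int {
	removed := ix.deleteTree(path)
	ix.active -= len(removed)
	return len(removed)
}

func main() {
	ix := &index{
		defs: map[string][]string{
			"foo.bar":     {"a"},
			"foo.bar.baz": {"b"},
			"foo.barn":    {"c"}, // shares a string prefix but is not a sub-path
		},
		active: 3,
	}
	n := ix.prune("foo.bar")
	fmt.Println(n, ix.active) // prints "2 1"
}
```

With a single decrement, the counter in this example would end at 2 even though only one definition remains.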
2 comments here:
I changed my mind: I don't have enough time, and a benchmark is not that useful here. One can tell that the total duration/bandwidth should be in line (possibly a bit worse); the big gain is the much more granular index locking, which is very hard or infeasible to convey via a benchmark, and is what we really need. A benchmark would be more useful once we make other tweaks (like shortcutting the expensive log calls), but I think the best bang for our buck now is to just merge and deploy this, and we will probably not have to look at this for quite a while.
Compiles and passes all tests, though I haven't done any real-world testing.