Conversation
Does this API endpoint need to broadcast the request to all nodes in the cluster? I feel that it should, or at least it should have a flag to tell it to.
Interesting idea. What's the reason you think that's necessary, given that the cluster already forwards requests between nodes when the requested data is known to be on another node? So the cache-clearing behavior should be consistent with that? That makes sense.
If running a multi-node cluster and I want to delete the cache for all series that match "some.metric.*", the series that match that pattern could be spread across many nodes. Having the delete request propagate through the whole cluster would be a requirement for exposing this feature to users, who typically have no visibility into the cluster topology; they only have a single gateway address.
+1 to support propagation, but we should make it optional. I suspect often enough cache clearing will be done by an operator who wants to do it to only 1 instance, to diagnose or try to fix a problem, or the operator may want to do it to different instances at different times, instead of all at once.
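For illustration, the optional propagation could simply be a flag on the delete request itself. A minimal sketch of that shape, assuming field names that are not necessarily what this PR ends up using:

// Hypothetical request shape: Propagate decides whether the receiving node
// only clears its own cache or also fans the request out to all its peers.
type CCacheDelete struct {
	Patterns  []string `json:"patterns"`  // e.g. ["some.metric.*"]
	Propagate bool     `json:"propagate"` // false: this node only; true: whole cluster
}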
Force-pushed from 4194765 to 50925b8
Force-pushed from b93a8d1 to edf7987
Force-pushed from 7a60638 to 91675a2
It seems to make sense to be able to delete the entire cache. Currently it can only be done for series matching a pattern, but there is no pattern to match all series (correct me if I'm wrong), so maybe allow for a pattern
api/ccache.go (outdated)
}
}
}
response.Write(ctx, response.NewJson(200, res, ""))
If the peers reported a bunch of errors, or we could not connect to some peers, we should probably not report 200 OK.
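A rough sketch of how the handler could pick the status code, assuming the aggregated result carries a hypothetical FirstError field (this is not the code in the PR, just an illustration):

// Sketch: only report 200 if every peer succeeded; otherwise surface a 500.
// res is the aggregate the handler builds while fanning out to peers; its
// FirstError field is an assumption made for this sketch.
code := 200
if res.FirstError != "" {
	code = 500
}
response.Write(ctx, response.NewJson(code, res, ""))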
Seems like something went wrong with the rebase: it's showing a bunch of changes that seem unrelated to this PR, like moving cluster stuff around. Can you amend the commit so that the diff is only what is supposed to change?
Force-pushed from 91675a2 to e5149ef
@Dieterbe I think you're right that it would make sense to add a way to clear the whole cache, but I think using
@Dieterbe because I think that
Force-pushed from 8e64d9a to c2ee3be
@Dieterbe is there something more that I need to do to make this mergeable? (Apart from rebasing once again if the context cancelling gets merged.)
Force-pushed from 66de456 to 0aac486
@Dieterbe good news: because I'm sure you're looking for more PRs to review, this one would be ready again ^^
func (cd CCacheDelete) TraceDebug(span opentracing.Span) {
}

type CCacheDeleteResp struct {
The number of errors is interesting, but much more useful, I think, would be the first error encountered (for each node).
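One possible shape for that, as a sketch (the per-peer type and field names are assumptions, not the PR's final response format):

// Hypothetical per-peer result: keep the first error encountered on each node
// instead of only a count, so callers can see what actually went wrong.
type CCacheDeletePeerResp struct {
	DeletedSeries   int    `json:"deletedSeries"`
	DeletedArchives int    `json:"deletedArchives"`
	Errors          int    `json:"errors"`
	FirstError      string `json:"firstError"` // empty if this node reported no error
}

type CCacheDeleteResp struct {
	DeletedSeries   int                             `json:"deletedSeries"`
	DeletedArchives int                             `json:"deletedArchives"`
	Peers           map[string]CCacheDeletePeerResp `json:"peers"` // keyed by node name
}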
Force-pushed from 73f3ad5 to 13b8874
Force-pushed from bdde822 to 979bea4
@Dieterbe this would be ready to review again
Force-pushed from f545bc4 to f54d2ba
if respParsed.DeletedSeries != delSeries || respParsed.DeletedArchives != delArchives {
	t.Fatalf("Expected %d series and %d archives to get deleted, but got %d and %d", delSeries, delArchives, respParsed.DeletedSeries, respParsed.DeletedArchives)
}
I'm confused about why we tell the cache to return these fake, made-up numbers and then check them, while we also check the real numbers above. Can this be simplified? For example, have the mock cache return that deleted series == deleted metric keys, and deleted archives is maybe pinned to 3x the number of deleted keys? Does this idea make sense? If not, please add a comment somewhere that explains this.
This is only testing the request handler, not the cache. It's testing one request with propagation disabled (TestMetricDelete) and one with propagation enabled (TestMetricDeletePropagation). I'm not sure how creating additional rules like archives = 3 * series would simplify things; that would rather just make it more complicated, because in order to understand the test a reader would then first need to be aware of this rule.
Maybe I should rename the tests to TestMetricDeleteRequestHandlerWithoutPropagation() and TestMetricDeleteRequestHandlerWithPropagation()?
It's confusing because the mock cache claims to have deleted a number of series and archives that has nothing to do with the delete request we actually issued on it. The numbers don't seem to make sense: we assert that it only received 1 metric key delete, so how could any cache have deleted 3 series if there was only 1 key? I know it's a mock and not a real cache, but the numbers should make more sense, I think.
That would clear things up if the DelMetric method took patterns, but according to the argument names of the interface function and its implementations (and the docs for CCache.DelMetric), it only takes 1 metric key, not a pattern.
I found a bug while looking at that just now: b168558
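To illustrate the suggestion above about making the mock's numbers track the actual calls, a hedged sketch of a counting mock (the names, the DelMetric signature, and the fixed 3-archives-per-key ratio are assumptions, not the test code in this PR):

// Hypothetical counting mock: deletion counts are derived from the keys that
// DelMetric was actually called with, so the asserted numbers always relate to
// the request that was issued (1 series and, arbitrarily, 3 archives per key).
type MockCache struct {
	DelMetricKeys []string // every key DelMetric was called with
}

func (m *MockCache) DelMetric(key string) (series, archives int) {
	m.DelMetricKeys = append(m.DelMetricKeys, key)
	return 1, 3
}

func (m *MockCache) DeletedSeries() int   { return len(m.DelMetricKeys) }
func (m *MockCache) DeletedArchives() int { return 3 * len(m.DelMetricKeys) }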
return c.thisNode()
}

func (c *MemberlistManager) thisNode() HTTPNode {
Why do we need this private function if we already had the public one that did the same?
Right, that's not necessary anymore; it was only necessary in some previous version 👍
mdata/cache/ccache_metric.go (outdated)
@@ -19,6 +19,8 @@ type CCacheMetric struct {

	// the list of chunk time stamps in ascending order
	keys []uint32

	RawMetric string
As I have mentioned before, our codebase is very GC-heavy, so we need to pay attention to how many pointers we keep active, and a string has a pointer. Instead of keeping the entire RawMetric string for each CCacheMetric, why not just keep 1 uint8 which holds the number to subtract from the key length to substring and get the raw key? It'll go easier on the GC and save us memory as well.
(We could also store the length of the raw string in a uint8, but then we're capped to a key length of 255, which some metrics will want to exceed. A suffix, however, can't be longer than 255 chars, we know that for sure.)
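A sketch of that idea, assuming a suffixLen field and a rawMetric helper that are not the actual names in this PR:

// Instead of keeping the whole RawMetric string (an extra pointer per
// CCacheMetric, hence more GC work), store only how many trailing bytes of the
// cache key are not part of the raw metric name and slice it back on demand.
type CCacheMetric struct {
	suffixLen uint8 // suffixes never exceed 255 bytes, so uint8 is enough
	// ... chunks, keys, etc. omitted in this sketch
}

func (cm *CCacheMetric) rawMetric(key string) string {
	return key[:len(key)-int(cm.suffixLen)]
}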
Force-pushed from 75b2218 to a0f35fa
Force-pushed from a0f35fa to 5f49ca0
@replay another round of comments and a few small commits. please have a look :)
Force-pushed from 521206c to 0ec10db
Force-pushed from 0ec10db to dfa58d3
Force-pushed from 58d279a to b168558
Adds an API endpoint to clear the cache by metric name, including all the chunks of associated aggregations.
#547