built-in execution of graphite requests #575
OK I think I got everything working properly now. supported functions are alias, sumSeries/sum and averageSeries/avg.
Notes:
TODO:
now did some performance testing as well. see the script in the last commit.
here's the corresponding snapshot: https://snapshot.raintank.io/dashboard/snapshot/Jh5Zl91d5gCIYXoUtsCQ1Owu6mW0C0As
mostly just parsing of expressions
* track the query pattern used along with a req, so we can tie the results back to the query later.
* series also needs to keep track of the query that caused the series to be created, so that the processing functions can efficiently retrieve the series for the given input parameters.
* adjust merging so that we only merge series if they correspond to the same request from the same functions.
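As a rough illustration of that idea (a hedged sketch; type and field names here are illustrative stand-ins, not the actual metrictank types): both the request and the resulting series carry the query pattern, so a processing function can look up exactly the series produced by its own argument.

```go
package expr

// Req identifies one fetch; Query is the pattern exactly as written in the
// expression, e.g. "foo.*" inside sum(foo.*).
type Req struct {
	Query string
	From  uint32
	To    uint32
}

// Series remembers which query pattern caused it to be fetched.
type Series struct {
	Target     string
	QueryPatt  string
	Datapoints []float64
}

// data maps each request to the series it produced; a function like sum(foo.*)
// can then retrieve its inputs directly via the Req for "foo.*", and merging
// only considers series that share the same originating request.
type dataMap map[Req][]Series
```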
similar to the change we made to graphite-raintank, sometimes (especially in dev) it's nice to run MT such that it always assumes orgId 1, without requiring something like tsdbgw in front to do auth.
we can now successfully use MT directly from the browser, with dynamic proxying!
* this way all data gets saved back to the buffer pool, and only once: whether the data was in the input data, whether the data made it through to the end (e.g. individual inputs to a sum generally don't, unless there's only 1 input), or whether the data was generated by a processing function.
* in the future we'll be able to use this as a temporary result cache, so that steps with some shared logic don't need to redo the processing.
out = mergeSeries(out)

// instead of waiting for all data to come in and then start processing everything, we could consider starting processing earlier, at the risk of doing needless work
// if we need to cancel the request due to a fetch error
assuming that there shouldn't be too many fetch errors i'd say that would probably make sense. but i don't think it should be a blocker for merging this.
yeah, but that would complicate things: building a processing system that is pipelined. more of a long-term / low-prio item IMHO.
agree
api/graphite_proxy.go
graphiteProxy.ModifyResponse = func(resp *http.Response) error {
	// if kept, would be duplicated. and duplicated headers are illegal)
	resp.Header.Del("access-control-allow-credentials")
	resp.Header.Del("Access-Control-Allow-Origin")
it looks a little confusing that the header on :43 is lowercase while the one on :44 uses uppercase letters. .Del() should be case insensitive anyway
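For what it's worth, a quick standalone check (a sketch, not code from the PR) confirming that net/http canonicalizes header keys, so the mixed casing is only cosmetic:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	h := http.Header{}
	h.Set("Access-Control-Allow-Origin", "*")

	// Del canonicalizes the key (via textproto.CanonicalMIMEHeaderKey),
	// so the lowercase spelling removes the same header.
	h.Del("access-control-allow-origin")

	fmt.Println(len(h)) // prints 0
}
```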
// return the fetched datapoint slices to the pool so their backing arrays can be reused
func (p Plan) Clean() {
	for _, series := range p.input {
		for _, serie := range series {
			pointSlicePool.Put(serie.Datapoints[:0])
I don't understand that [:0]. Isn't that always just creating an empty list?
it creates an empty list by creating a slice value, backed by (using a pointer to) the same underlying array. so when we pull the value out of the pool next time, we have a slice of len zero that we can immediately start adding elements to without allocations, because we'll reuse the same backing array.
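A small standalone illustration of that (a sketch, not code from the PR): re-slicing to [:0] resets the length but keeps the capacity and backing array, so later appends don't allocate until the capacity is exceeded.

```go
package main

import "fmt"

func main() {
	buf := make([]float64, 0, 2000)
	buf = append(buf, 1.5, 2.5, 3.5)

	// an "empty list", but still backed by the same 2000-element array
	empty := buf[:0]
	fmt.Println(len(empty), cap(empty)) // 0 2000

	// appending reuses the existing backing array, no allocation needed
	empty = append(empty, 9.9)
	fmt.Println(empty[0], cap(empty)) // 9.9 2000
}
```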
btw i think we'll have to switch to putting pointers to slices in the pool. see https://github.com/dominikh/go-tools/tree/master/cmd/staticcheck#sa9000--storing-non-pointer-values-in-syncpool-allocates-memory
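A hedged sketch of what that could look like (using a plain []float64 instead of the real point type, and illustrative names): storing a *[]float64 avoids the allocation that staticcheck warns about when a slice header is boxed into an interface{} on every Put.

```go
package main

import "sync"

const defaultPointSliceSize = 2000

var pointSlicePool = sync.Pool{
	New: func() interface{} {
		s := make([]float64, 0, defaultPointSliceSize)
		return &s // store a pointer, not the slice header itself
	},
}

func main() {
	p := pointSlicePool.Get().(*[]float64)
	buf := (*p)[:0]

	buf = append(buf, 1.0, 2.0, 3.0)
	// ... use buf ...

	// hand the (possibly re-grown) slice back for reuse
	*p = buf[:0]
	pointSlicePool.Put(p)
}
```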
"gopkg.in/raintank/schema.v1" | ||
) | ||
|
||
const defaultPointSliceSize = 2000 |
The idea of the pool is very nice, but I think the limit of 2000 seems a little arbitrary. Would it make sense to base it on some setting, e.g. maxPointsPerReqHard?
maxPointsPerReqHard is per request, but a pointslice is a slice of points for a given series.
you're right that it's somewhat arbitrary. you want it low enough to not waste too much memory allocating slices with space we don't use, and high enough to avoid expensive re-allocation and copying as a slice gets filled and needs to expand; though the cost of this gets amortized, as we can reuse the slice (incl. backing array) over time, so starting out a little low is not so problematic.
it's hard to generalize how many points will typically be returned for any given series across different requests, but I think this is a good enough value that we won't have to worry about it for a while.
we currently keep meters of the number of series per request and number of points per request (reqRenderSeriesCount, reqRenderPointsFetched). I think we should also keep counters of pointsFetched and number of series fetched.
the current metrics give us good insight into query complexity, but the counters would provide a good measure of overall request workload.
api/middleware/capturebody.go
)

func CaptureBody(c *Context) {
	body, _ := ioutil.ReadAll(c.Req.Request.Body)
this seems like a potential error that should be handled somehow, so we don't pass an "invalid" body into the request handlers
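A hedged sketch of one way to handle it (the Context type and its fields here are simplified stand-ins, not the real middleware types):

```go
package middleware

import (
	"bytes"
	"io/ioutil"
	"net/http"
)

// simplified stand-in for the real middleware Context
type Context struct {
	Req  *http.Request
	Body []byte
	Err  error
}

func CaptureBody(c *Context) {
	body, err := ioutil.ReadAll(c.Req.Body)
	if err != nil {
		// surface the failure instead of silently passing a truncated body on
		c.Err = err
		return
	}
	c.Body = body
	// restore the body so downstream handlers can still read it
	c.Req.Body = ioutil.NopCloser(bytes.NewReader(body))
}
```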
docker/docker-cluster/metrictank.ini
@@ -132,6 +132,10 @@ key-file = /etc/ssl/private/ssl-cert-snakeoil.key
max-points-per-req-soft = 1000000
# limit of number of datapoints a request can return. Requests that exceed this limit will be rejected. (0 disables limit)
max-points-per-req-hard = 20000000
# require x-org-id authentication to auth as a specific org. otherwise orgId 1 is assumed") |
there's a ) without a (
expr/expr.go
case etString:
	return fmt.Sprintf("%sexpr-string %q", space, e.valStr)
}
return "HUH-SHOULD-NEVER-HAPPEN"
lol, "this is not a bug, it's a feature"
Datapoints: out,
Interval:   series[0].Interval,
}
cache[Req{}] = append(cache[Req{}], output)
the idea to cache that here is cool. but i can't see where this cache is read, am i missing something? the only place where i can see this being accessed is in Plan.Clean(), but there it just extracts empty slices from it.
ah, maybe the variable should rather be named something like pool instead of cache. and actually the output elements are only appended to it to reuse the allocated memory later, not to reuse the generated values, right?
see the comment for the generated property of type Plan, which is what is passed in here. you're right that right now we don't have caching implemented yet, but i do foresee this to be the main use in the future (as explained in that comment).
expr/funcs.go
)

type Func interface {
	Signature() ([]argType, []argType)
why are there two argType slices? one for input, one for output?
expr/funcs.go
Signature() ([]argType, []argType)
// what can be assumed to have been pre-validated: len of args, and basic types (e.g. seriesList)
Init([]*expr) error // initialize and validate arguments (expressions given by user), for functions that have specific requirements
Depends(from, to uint32) (uint32, uint32) // allows a func to express its dependencies
how are two uint32 numbers a dependency?
couldn't find a better way to phrase it, but the idea is that an invocation like movingAverage(foo, "5min") with from=x and to=y can declare it "depends" on data from=x-5min to y as input.
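In other words (a hedged sketch, not the actual implementation; names are illustrative), a moving-average function could widen the requested range like this:

```go
package expr

// hypothetical moving-average function with a fixed window
type FuncMovingAverage struct {
	window uint32 // window size in seconds, e.g. 300 for "5min"
}

// Depends: to produce output for [from, to), the function needs input data
// starting one window earlier, so it widens the fetch range accordingly.
func (f *FuncMovingAverage) Depends(from, to uint32) (uint32, uint32) {
	return from - f.window, to
}
```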
so shouldn't the function just be called TsRange(originalFrom, originalTo uint32) (uint32, uint32)?
expr/funcs.go
// what can be assumed to have been pre-validated: len of args, and basic types (e.g. seriesList)
Init([]*expr) error // initialize and validate arguments (expressions given by user), for functions that have specific requirements
Depends(from, to uint32) (uint32, uint32) // allows a func to express its dependencies
Exec(map[Req][]models.Series, ...interface{}) ([]interface{}, error) // execute the function with its arguments. functions and names resolved to series, other expression types (constant, strings, etc) are still the bare expressions that were also passed into Init()
What is this function supposed to return?
i'll push a commit that explains all these functions better
update re graphite perf trouble, for those who are interested:
UPDATE: same requests (with ending wildcard removed), against MT gives:
Looked at the commits that you pushed after my bunch of comments. That all looks good, but I still added one comment.
api/graphite.go
}
for _, def := range metric.Defs {
	locatedDefs[r][def.Id] = locatedDef{def, s.Node}
for _, archive := range metric.Defs {
Do you not need to skip the non-leaf nodes?
not really. the leaf property is just a shorthand for len(metric.Defs) != 0, so non-leaf nodes simply have no Defs to iterate over.
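Roughly (a sketch with an illustrative Node type, not the actual index code), ranging over Defs is therefore already a no-op for branch nodes:

```go
package index

// illustrative stand-in for an index node
type Node struct {
	Path string
	Defs []string // metric definitions; empty for branch (non-leaf) nodes
}

// Leaf is shorthand for "this node has metric definitions".
func (n Node) Leaf() bool {
	return len(n.Defs) != 0
}
```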
introduced via 929513c / #174, became obsolete in b3d3831 / #575 / 0.7.1-61-gbe64803e: we no longer really log requests ourselves anymore. i imagine at some point we probably will, at which point we can re-introduce this setting. (our qa tool `unused` flagged that this variable wasn't being read anywhere)
this is a prototype. the basics seem to work, but it needs more testing and more polish. I also have yet to seriously start on adding a couple of functions.
in particular, the parsing and expr tree seem good, it responds with correct-looking output for basic (non-function) requests, and the proxying to graphite-api seems to work too. but: