This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Graphitey consolidation part 1 #534

Merged
merged 8 commits into from
Feb 21, 2017

Conversation

Dieterbe
Contributor

No description provided.

By making sure the user has their agg settings in proper order:
1) we make their lives easier (an ordered config is simply easier to reason about)
2) we avoid the need to sort in alignRequests
3) alignRequests can print archive numbers (1,2,3,... being the archive
   in a list of sorted archives) that always match the index
   of the archive in the config
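The ordering requirement described in this commit message could also be enforced when the config is parsed, so a misordered config fails at startup. A minimal, hypothetical sketch — `AggSetting` and its `Span` field are stand-ins for the real mdata types, which aren't shown in this PR:

```go
package main

import "fmt"

// AggSetting is a reduced stand-in for an aggregation setting,
// keeping only the field relevant here.
type AggSetting struct {
	Span uint32 // interval between points, in seconds
}

// validateOrder returns an error if the aggregation settings are not in
// strictly ascending order of span, so archive numbers in logs always
// match the position of the archive in the config.
func validateOrder(aggs []AggSetting) error {
	for i := 1; i < len(aggs); i++ {
		if aggs[i].Span <= aggs[i-1].Span {
			return fmt.Errorf("agg setting %d (span %d) must have a larger span than setting %d (span %d)",
				i, aggs[i].Span, i-1, aggs[i-1].Span)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateOrder([]AggSetting{{600}, {7200}, {86400}})) // <nil>
	fmt.Println(validateOrder([]AggSetting{{7200}, {600}}))          // non-nil error
}
```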
@Dieterbe
Contributor Author

@woodsaj or @replay if you have some time, mind taking a look? this covers step 1 of #463 (comment)

}
lcm := util.Lcm(keys)

options := make([]narchive, 0, len(aggSettings.Aggs)+1)
Contributor

instead of []narchive you could just use narchives

// potentially making the "raw interval" > first rollup interval, so that
// raw archive might be ordered after a rollup, and where the archive
// itself knows which one it is, instead of having to rely on index lookup table.
func getOptionsLikeGraphite(reqs []models.Req, aggSettings mdata.AggSettings, tsRange uint32) ([]narchive, map[uint32]struct{}) {
Contributor

could just use narchives instead of []narchive if the type is already defined anyway

}

if LogLevel < 2 {
options[selected].chosen = true
Contributor

Even though this saves an operation in prod, I wouldn't modify such an important data structure under a condition that depends on the log level, even if at the moment it isn't used at any other critical location (not considering narchive.String() critical). It's too easy for somebody to use that .chosen flag for something else without being aware that its correctness depends on such an arbitrary setting as the log level.

// number 0 = raw archive, number 1,2,..N = aggregation 1,2..N
// this allows you to reorder narchives and still keep track of which is which
type narchive struct {
i int
Contributor

@replay replay Feb 18, 2017

Some explanatory name and a more extensive comment that explains why it's necessary to "keep track of which is which" would be helpful

Contributor

@replay replay left a comment

Looks good, with some minor comments

@woodsaj
Member

woodsaj commented Feb 19, 2017

This seems to be way more complicated than it should be. We are simplifying our code, so there should be more lines of code deleted than new lines added.

  1. I don't like the idea of supporting 2 query modes. We are building a graphite-compatible backend, so there is no need for a "make incompatible" flag. Supporting the 2 modes just complicates the code.

  2. The logic for selecting the archive to use for a query seems convoluted. All we need to do is return the highest-resolution archive that has data for the requested time range, since all series have the same aggSettings and aggSettings.Aggs should already be sorted by data resolution (if we don't re-order when parsing the config, we should). Something like:

func (settings AggSettings) ForRequest(from uint32) int {
  now := uint32(time.Now().Unix())
  if now-settings.RawTTL <= from {
    return 0 // should probably use a constant like mdata.RawArchive
  }
  for i, agg := range settings.Aggs {
    if now-agg.TTL <= from {
      return i + 1
    }
  }
  return len(settings.Aggs)
}

@Dieterbe
Contributor Author

I don't like the idea of supporting 2 query modes. We are building a graphite-compatible backend, so there is no need for a "make incompatible" flag. Supporting the 2 modes just complicates the code.

I think there should be a grace period between introducing the new code and getting rid of the old.
We should only get rid of the old code once we've given the new approach some time to mature (and possibly make adjustments) and prove that it works well for all of our environments.
There's no harm in keeping the old code around for a bit longer, and it gives us more options. (e.g. in case it doesn't work well yet for an environment but we'd like to update MT because we want to roll out other things)

if we dont re-order when parsing the config, we should

No. The user should specify them in the right order; it's for their own good (see also the commit message of 6dc62be).

The logic for selecting the archive to use for a query seems convoluted. All we need to do is return the highest-resolution archive that has data for the requested time range

No. We also need to deal with multiple different raw intervals (and how their LCM may be greater than the first rollup) and with the per-aggsetting ready parameter. I think I can replace the ready filtering in getOptionsLikeGraphite in favor of just checking the ready state in alignRequestsLikeGraphite; this should save about 5 lines. Can you point to any specific items that you think can be removed?
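util.Lcm is referenced in the diff above but not shown. As a minimal sketch of the least-common-multiple logic it presumably implements, illustrating how two raw intervals can combine to a common step larger than a small first rollup:

```go
package main

import "fmt"

// gcd computes the greatest common divisor via Euclid's algorithm.
func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// lcm folds the least common multiple over a list of intervals; dividing
// before multiplying keeps intermediate values small.
func lcm(vals []uint32) uint32 {
	out := vals[0]
	for _, v := range vals[1:] {
		out = out / gcd(out, v) * v
	}
	return out
}

func main() {
	// two series at 10s and 15s raw intervals normalize to a 30s step;
	// 20s and 30s normalize to 60s, which could exceed a small first rollup.
	fmt.Println(lcm([]uint32{10, 15})) // 30
	fmt.Println(lcm([]uint32{20, 30})) // 60
}
```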

@woodsaj
Member

woodsaj commented Feb 19, 2017

I think there should be a grace period between introducing the new code and getting rid of the old.

The old code is broken and returns incorrect data; get rid of it.

no. we also need to deal with multiple different raw intervals (and how their lcm may be > 1st rollup)

That is not how graphite handles queries. The only information used for selecting the rollup is the request "from" time. Users sending data at an interval less frequent than the first rollup is an edge case we don't need to support right now; that will be solved by supporting storage-schemas config (i.e. per-series aggSettings).

// represents a numbered data "archive", i.e. the raw one, or an aggregated series
// number 0 = raw archive, number 1,2,..N = aggregation 1,2..N
// this allows you to reorder narchives and still keep track of which is which
type narchive struct {
Member

@woodsaj woodsaj Feb 19, 2017

What does narchive mean? In general, just don't use abbreviations.

type narchive struct {
i int
interval uint32
pointCount uint32
Member

this is not used anywhere except in the String() method

// note: it is assumed that all requests have the same from, to and maxdatapoints!
// this function ignores the TTL values. it is assumed that you've set sensible TTL's
func alignRequests(reqs []models.Req, aggSettings mdata.AggSettings) ([]models.Req, error) {
// represents a numbered data "archive", i.e. the raw one, or an aggregated series
Member

Why are there 2 almost identical data structures (archive and narchive)? Surely you can just add the position variable (i) to archive and remove narchive. Just because it is defined doesn't mean it has to be used.

Dieterbe added a commit that referenced this pull request Feb 20, 2017
* remove support for old reading method
* remove support for raw LCM interval > first rollup interval, see
#534 (comment)
* since the logic is now very obvious and transparent, no more need
  to do debug logging on different options, pointcounts, etc
  so, can remove the archive type.
@Dieterbe
Contributor Author

Dieterbe commented Feb 20, 2017

I removed the old approach as requested, and simplified more. (see last commit above)
been testing/experimenting a bit. on my pc going to 500k~1M points makes it take several seconds. MT reports upto 80ms. seems several seconds are spent in graphite and/or sending data to graphite. EDIT: actually statsd reports most time spent in unmarshal_raintank_resp actually. up to 200k points is pleasant (fraction of a second response when no processing in graphite). I think we might want to shorten the TTL of raw data for some of our customers to avoid this being an issue. (we have some customers where we maintain raw 35 days and first rollup 38days)

EDIT: now with chart: https://snapshot.raintank.io/dashboard/snapshot/yEc5WKGRsxFbJXOPqdRrBhkD3gILYej3

@Dieterbe Dieterbe changed the title WIP: Graphitey consolidation Graphitey consolidation part 1 Feb 20, 2017
@Dieterbe
Contributor Author

@woodsaj PTAL and merge if you like.

@woodsaj woodsaj merged commit 6181c57 into master Feb 21, 2017
@woodsaj woodsaj deleted the graphitey-consolidation branch February 21, 2017 08:29
@woodsaj
Member

woodsaj commented Feb 21, 2017

looks great @Dieterbe

We can likely reduce the query times by enabling gzip compression on the data sent between MT and graphite. https://go-macaron.com/docs/middlewares/gzip I'll open a separate issue for that.
