This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Graphitey consolidation part 1 #534

Merged
merged 8 commits into from
Feb 21, 2017

Conversation

Dieterbe
Contributor

No description provided.

By making sure the user has their agg settings in proper order:
1) we make their lives easier (an ordered config is simply easier to reason about)
2) we avoid the need to sort in alignRequests
3) alignRequests can print archive numbers (1,2,3,... being the archive
   in a list of sorted archives) that always match the index
   of the archive in the config
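The ordering requirement described in this commit message could also be enforced when the config is parsed, so a misordered config fails at startup. A minimal, hypothetical sketch — `AggSetting` and its `Span` field are stand-ins for the real mdata types, which aren't shown in this PR:

```go
package main

import "fmt"

// AggSetting is a reduced stand-in for an aggregation setting,
// keeping only the field relevant here.
type AggSetting struct {
	Span uint32 // interval between points, in seconds
}

// validateOrder returns an error if the aggregation settings are not in
// strictly ascending order of span, so archive numbers in logs always
// match the position of the archive in the config.
func validateOrder(aggs []AggSetting) error {
	for i := 1; i < len(aggs); i++ {
		if aggs[i].Span <= aggs[i-1].Span {
			return fmt.Errorf("agg setting %d (span %d) must have a larger span than setting %d (span %d)",
				i, aggs[i].Span, i-1, aggs[i-1].Span)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateOrder([]AggSetting{{600}, {7200}, {86400}})) // <nil>
	fmt.Println(validateOrder([]AggSetting{{7200}, {600}}))          // non-nil error
}
```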
@Dieterbe
Contributor Author

@woodsaj or @replay if you have some time, mind taking a look? this covers step 1 of #463 (comment)

}
lcm := util.Lcm(keys)

options := make([]narchive, 0, len(aggSettings.Aggs)+1)
Contributor

instead of []narchive you could just use narchives

// potentially making the "raw interval" > first rollup interval, so that
// raw archive might be ordered after a rollup, and where the archive
// itself knows which one it is, instead of having to rely on index lookup table.
func getOptionsLikeGraphite(reqs []models.Req, aggSettings mdata.AggSettings, tsRange uint32) ([]narchive, map[uint32]struct{}) {
Contributor

could just use narchives instead of []narchive if the type is already defined anyway

}

if LogLevel < 2 {
options[selected].chosen = true
Contributor

Even though this saves an operation in prod, I wouldn't modify such an important data structure under a condition that depends on the log level, even if at the moment it isn't used at any other critical location (not considering narchive.String() critical). It's too easy for somebody to use that .chosen flag for something else without being aware that its correctness depends on such an arbitrary setting as the log level.

// number 0 = raw archive, number 1,2,..N = aggregation 1,2..N
// this allows you to reorder narchives and still keep track of which is which
type narchive struct {
i int
Contributor

@replay replay Feb 18, 2017

Some explanatory name and a more extensive comment that explains why it's necessary to "keep track of which is which" would be helpful

Contributor

@replay replay left a comment

Looks good, with some minor comments

@woodsaj
Member

woodsaj commented Feb 19, 2017

This seems to be way more complicated than it should be. We are simplifying our code, so there should be more lines of code deleted than new lines added.

  1. I don't like the idea of supporting 2 query modes. We are building a graphite-compatible backend, so there is no need for a "make incompatible" flag. Supporting the 2 modes just complicates the code.

  2. The logic for selecting the archive to use for a query seems convoluted. All we need to do is return the highest-resolution archive that has data for the requested time range, since all series have the same aggSettings and aggSettings.Aggs should already be sorted by data resolution (if we don't re-order when parsing the config, we should). Something like:

func (settings AggSettings) ForRequest(from uint32) int {
  now := uint32(time.Now().Unix())
  if now-settings.RawTTL <= from {
    return 0 // should probably use a constant like mdata.RawArchive
  }
  for i, agg := range settings.Aggs {
    if now-agg.TTL <= from {
      return i + 1
    }
  }
  return len(settings.Aggs)
}

@Dieterbe
Contributor Author

I don't like the idea of supporting 2 query modes. We are building a graphite-compatible backend, so there is no need for a "make incompatible" flag. Supporting the 2 modes just complicates the code.

I think there should be a grace period between introducing the new code and getting rid of the old.
We should only get rid of the old code once we've given the new approach some time to mature (and possibly make adjustments) and prove that it works well for all of our environments.
There's no harm in keeping the old code around for a bit longer, and it gives us more options. (e.g. in case it doesn't work well yet for an environment but we'd like to update MT because we want to roll out other things)

if we dont re-order when parsing the config, we should

No. The user should specify them in the right order; it's for their own good (see also the commit message of 6dc62be).

The logic for selecting the archive to use for a query seems convoluted. All we need to do is return the highest-resolution archive that has data for the requested time range

No. We also need to deal with multiple different raw intervals (and how their LCM may be greater than the first rollup) and with the per-aggsetting ready parameter. I think I can replace the ready filtering in getOptionsLikeGraphite in favor of just checking the ready state in alignRequestsLikeGraphite; this should save about 5 lines. Can you point to any specific items that you think can be removed?
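util.Lcm is referenced in the diff above but not shown. As a minimal sketch of the least-common-multiple logic it presumably implements, illustrating how two raw intervals can combine to a common step larger than a small first rollup:

```go
package main

import "fmt"

// gcd computes the greatest common divisor via Euclid's algorithm.
func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// lcm folds the least common multiple over a list of intervals; dividing
// before multiplying keeps intermediate values small.
func lcm(vals []uint32) uint32 {
	out := vals[0]
	for _, v := range vals[1:] {
		out = out / gcd(out, v) * v
	}
	return out
}

func main() {
	// two series at 10s and 15s raw intervals normalize to a 30s step;
	// 20s and 30s normalize to 60s, which could exceed a small first rollup.
	fmt.Println(lcm([]uint32{10, 15})) // 30
	fmt.Println(lcm([]uint32{20, 30})) // 60
}
```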

@woodsaj
Member

woodsaj commented Feb 19, 2017

I think there should be a grace period between introducing the new code and getting rid of the old.

The old code is broken and returns incorrect data; get rid of it.

no. we also need to deal with multiple different raw intervals (and how their lcm may be > 1st rollup)

That is not how graphite handles queries. The only information used for selecting the rollup is the request "from" time. Users sending data at an interval less frequent than the first rollup is an edge case we don't need to support right now; that will be solved by supporting storage-schemas config (i.e. per-series aggSettings).

// represents a numbered data "archive", i.e. the raw one, or an aggregated series
// number 0 = raw archive, number 1,2,..N = aggregation 1,2..N
// this allows you to reorder narchives and still keep track of which is which
type narchive struct {
Member

@woodsaj woodsaj Feb 19, 2017

What does narchive mean? In general, just don't use abbreviations.

type narchive struct {
i int
interval uint32
pointCount uint32
Member

this is not used anywhere except in the String() method

// note: it is assumed that all requests have the same from, to and maxdatapoints!
// this function ignores the TTL values. it is assumed that you've set sensible TTL's
func alignRequests(reqs []models.Req, aggSettings mdata.AggSettings) ([]models.Req, error) {
// represents a numbered data "archive", i.e. the raw one, or an aggregated series
Member

Why are there 2 almost identical data structures (archive and narchive)? Surely you can just add the position variable (i) to archive and remove narchive. Just because it is defined doesn't mean it has to be used.

Dieterbe added a commit that referenced this pull request Feb 20, 2017
* remove support for old reading method
* remove support for raw LCM interval > first rollup interval, see
#534 (comment)
* since the logic is now very obvious and transparent, no more need
  to do debug logging on different options, pointcounts, etc
  so, can remove the archive type.
@Dieterbe
Contributor Author

Dieterbe commented Feb 20, 2017

I removed the old approach as requested, and simplified more. (see last commit above)
been testing/experimenting a bit. on my pc going to 500k~1M points makes it take several seconds. MT reports upto 80ms. seems several seconds are spent in graphite and/or sending data to graphite. EDIT: actually statsd reports most time spent in unmarshal_raintank_resp actually. up to 200k points is pleasant (fraction of a second response when no processing in graphite). I think we might want to shorten the TTL of raw data for some of our customers to avoid this being an issue. (we have some customers where we maintain raw 35 days and first rollup 38days)

EDIT: now with chart: https://snapshot.raintank.io/dashboard/snapshot/yEc5WKGRsxFbJXOPqdRrBhkD3gILYej3

@Dieterbe Dieterbe changed the title WIP: Graphitey consolidation Graphitey consolidation part 1 Feb 20, 2017
@Dieterbe
Contributor Author

@woodsaj PTAL and merge if you like.

@woodsaj woodsaj merged commit 6181c57 into master Feb 21, 2017
@woodsaj woodsaj deleted the graphitey-consolidation branch February 21, 2017 08:29
@woodsaj
Member

woodsaj commented Feb 21, 2017

looks great @Dieterbe

We can likely reduce the query times by enabling gzip compression on the data sent between MT and graphite. https://go-macaron.com/docs/middlewares/gzip I'll open a separate issue for that.
