support multiple raw intervals per storage schema. #588
Conversation
// Aggregations holds the aggregation definitions
type Aggregations struct {
	Data               []Aggregation
	DefaultAggregation Aggregation
the point of the default is that it applies irrespective of what your Aggregation definitions are, so why move it from global to a member?
DefaultAggregation is only modified by unit tests. Keeping it as part of the Aggregations{} instance makes it easier for unit tests to modify the defaults for testing purposes.
E.g., calling mdata.SetSingleAgg(met ...conf.Method) won't break other unit tests that may be expecting DefaultAggregation to be set to the defaults.
but both SetSingleAgg and SetSingleSchema modify global state. wouldn't that global state stay just as "dirty" as it was before? Your goal is noble but I think it can only be fulfilled by passing the Schemas and Aggregations around instead of having them as globals in the mdata package.
SetSingleAgg and SetSingleSchema only modify the state within the mdata package. Allowing them to modify global variables in other packages is a recipe for disaster.
conf/schemas.go (Outdated)
// Schemas contains schema settings
type Schemas struct {
	raw   []Schema
	index []*Schema
why is raw a slice of values and index a slice of pointers?
At one stage Match was performing a range over the index. A range over a slice copies the values, so copying only the reference is much faster than copying the whole Schema struct.
But it doesn't look like this is needed anymore.
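For reference, the value-vs-pointer distinction this comment refers to can be shown with a small self-contained sketch (the Schema type here is a hypothetical stand-in, not the real conf.Schema):

```go
package main

import "fmt"

// Schema is a hypothetical stand-in for conf.Schema.
type Schema struct {
	Name string
}

// mutateByValue ranges over a slice of values: each iteration copies the
// element into the loop variable, so the slice itself is never modified.
func mutateByValue(ss []Schema) {
	for _, s := range ss {
		s.Name = "changed"
	}
}

// mutateByPointer ranges over a slice of pointers: only the pointer is
// copied, so writes through it are visible in the slice.
func mutateByPointer(ss []*Schema) {
	for _, s := range ss {
		s.Name = "changed"
	}
}

func main() {
	byValue := []Schema{{Name: "a"}}
	mutateByValue(byValue)
	fmt.Println(byValue[0].Name) // a

	byPtr := []*Schema{{Name: "a"}}
	mutateByPointer(byPtr)
	fmt.Println(byPtr[0].Name) // changed
}
```

Whether the copy matters for speed depends on the struct size; the per-iteration copy of a large Schema value was the concern, but as noted above it's moot if nothing ranges over the index anymore.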
conf/schemas.go (Outdated)
}

// Match returns the correct schema setting for the given metric
// it can always find a valid setting, because there's a default catch all
// also returns the index of the setting, to efficiently reference it
-func (s Schemas) Match(metric string) (uint16, Schema) {
-	for i, schema := range s {
+func (s Schemas) Match(metric string, interval int) (uint16, Schema) {
I find this new function hard to understand.
One reason, I think, is that it's a mashup of two mental models:
A) all our schemas/retentions as 1 long list
B) a list of entries (based on pattern), where each entry has a list of retention lists (due to the new "permutation" stuff), i.e. a nested structure.
This function has a few places where the logic works in the domain of B (e.g. taking the first retention of a given pattern, then working with len(schema.Retentions), or iterating that list which actually represents items further down in the index list). I drafted some refactoring, but couldn't come up with anything better. I think a nested, B-style model would be nicer for our index datastructure, but then we would need two identifiers to represent a schema, which is to be avoided.
I agree that it's kind of hard to wrap one's head around that concept, but it's pretty cool that in the end pattern/retention can be looked up with only one index.
Maybe it would be good to have some comment that illustrates how that works. I imagine the s.index slice to look like a partitioned list of retentions where each partition is a pattern:
|----------------------------------------------------------------------|
| pattern1 | pattern2 | pattern3 |
|----------------------------------------------------------------------|
| ret1 | ret2 | ret3 | ret4 | ret5 | ret6 | ret7 | ret8 | ret9 | ret10 |
|----------------------------------------------------------------------|
1) if metric matches patternX -> find the best retention for interval within patternX
2) otherwise -> skip len(patternX.retentions) and back to 1)
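The two steps above can be sketched in Go. Everything here (the types, the substring "pattern" match, the best-fit rule) is a simplified stand-in for the real conf package, not the actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// Retention and Schema are simplified stand-ins for the conf package
// types; pattern matching is reduced to a substring check for brevity.
type Retention struct {
	SecondsPerPoint int
}

type Schema struct {
	Pattern    string
	Retentions []Retention
}

// match walks the flattened index: each pattern "owns" a contiguous run
// of len(Retentions) ids. If the metric matches the pattern, pick the
// best retention for the interval inside that run; otherwise skip the
// whole run and try the next pattern.
func match(schemas []Schema, metric string, interval int) uint16 {
	i := 0
	for _, schema := range schemas {
		if strings.Contains(metric, schema.Pattern) {
			// best fit: the last retention whose SecondsPerPoint <= interval
			// (retentions are assumed sorted ascending), falling back to the first.
			best := 0
			for j, ret := range schema.Retentions {
				if ret.SecondsPerPoint <= interval {
					best = j
				}
			}
			return uint16(i + best)
		}
		i += len(schema.Retentions)
	}
	return 0 // callers rely on a catch-all default existing
}

func main() {
	schemas := []Schema{
		{Pattern: "cpu", Retentions: []Retention{{1}, {60}, {600}}},
		{Pattern: "", Retentions: []Retention{{10}}}, // catch-all
	}
	fmt.Println(match(schemas, "cpu.usage", 60)) // 1
	fmt.Println(match(schemas, "disk.free", 10)) // 3
}
```

The key property is that the returned id is a single integer identifying both the pattern and the retention subset, which is what the single-index design buys.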
to be clear I'm not saying we should change the structure, I also really like the single index. I just think it's important to make this logic as simple as we can.
Yes, this function is complex. It took a long time to come to this solution that only requires a single call to matchString(), but still allows a single schema ID to be passed around as a reference.
I think @replay's illustration will go a long way toward clarifying what is going on. But I would change it to
conf/schemas.go (Outdated)
if interval < ret.SecondsPerPoint {
	// if there are no retentions with SecondsPerPoint >= interval
	// then we need to use the first retention. Otherwise, the retention
	// we want to use is the previous one.
This is confusing: we branch if interval < SecondsPerPoint, but this talks about SecondsPerPoint >= interval, which means interval <= SecondsPerPoint.
This is just basic logic. if a < b is true, then b >= a is false.
ie these are the same thing
if interval < ret.SecondsPerPoint
if !(ret.SecondsPerPoint >= interval)
if a < b is true, then b >= a is false.
no, these expressions are in fact equivalent in all but 1 case. e.g. if a is 1 and b is 2 then both are true.
ie these are the same thing
no they are not:
interval := 10
SecondsPerPoint := 20
fmt.Println(interval < SecondsPerPoint)
fmt.Println(!(SecondsPerPoint >= interval))
this prints:
true
false
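For what it's worth, the disagreement above can be settled mechanically: `a < b` is always equivalent to `b > a` (swap the operands, flip the operator), while `!(b >= a)` simplifies to `b < a`, which is a different predicate. A tiny self-contained check:

```go
package main

import "fmt"

func main() {
	// Exhaustively compare the predicates on a small grid:
	// a<b and b>a always agree; !(b>=a) does not.
	for a := 0; a < 3; a++ {
		for b := 0; b < 3; b++ {
			fmt.Printf("a=%d b=%d  a<b=%-5v  b>a=%-5v  !(b>=a)=%v\n",
				a, b, a < b, b > a, !(b >= a))
		}
	}
}
```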
conf/schemas.go (Outdated)
}
// no retentions found with SecondsPerPoint > interval. So lets just use the retention
// with the largest secondsPerPoint.
pos := len(schema.Retentions) - 1 + i
Is it just me, or would i + len(schema.Retentions) - 1 be easier to understand? I think of it like "i is the index where we're at, so that's the base, then from there jump to the last retention", so this order makes more sense to me.
It is just you. This order makes more sense to me.
// TTLs returns a slice of all TTL's seen amongst all archives of all schemas
func (schemas Schemas) TTLs() []uint32 {
	ttls := make(map[uint32]struct{})
Couldn't this nested loop and the following loop all be replaced by one loop, if schemas.index is used as input instead?
Also, the generated value could be cached and reused, since AFAIK it shouldn't change once MT is up.
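The dedup-with-a-set approach under discussion can be sketched as follows, with hypothetical simplified types (the real conf package derives TTLs from the retention settings, which is elided here):

```go
package main

import (
	"fmt"
	"sort"
)

// Retention is a simplified stand-in for conf.Retention.
type Retention struct {
	TTL uint32
}

// Schema is a simplified stand-in for conf.Schema.
type Schema struct {
	Retentions []Retention
}

// ttls collects the set of distinct TTLs across all schemas using a map
// as a set, then returns them sorted for deterministic output.
func ttls(index []*Schema) []uint32 {
	seen := make(map[uint32]struct{})
	for _, s := range index {
		for _, r := range s.Retentions {
			seen[r.TTL] = struct{}{}
		}
	}
	out := make([]uint32, 0, len(seen))
	for ttl := range seen {
		out = append(out, ttl)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	index := []*Schema{
		{Retentions: []Retention{{TTL: 86400}, {TTL: 604800}}},
		{Retentions: []Retention{{TTL: 604800}, {TTL: 2592000}}},
	}
	fmt.Println(ttls(index)) // [86400 604800 2592000]
}
```

As the reply below notes, this runs rarely and outside any hot path, so caching the result buys little.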
It could. This code predates the "permutation" stuff. There's no need for caching, since this function rarely gets called and is not in any performance-critical path.
K, true. Only called once, in fact.
// MaxChunkSpan returns the largest chunkspan seen amongst all archives of all schemas
func (schemas Schemas) MaxChunkSpan() uint32 {
	max := uint32(0)
Same here... only one loop would be necessary by iterating over schemas.index.
max := schemas.MaxChunkSpan()
So(max, ShouldEqual, 60*60*6)
})
}
thanks for adding unit tests for my code :-D 👍
conf/schemas_test.go (Outdated)
So(schema.Name, ShouldEqual, "a")
So(schema.Retentions[0].SecondsPerPoint, ShouldEqual, 10)
})
Convey("When metric has 1s raw interval", func() {
Any particular reason behind the ordering 10s -> 30s -> 1s? I would do them in ascending order.
5f50d02 to bf1bbee
I think we shouldn't use the word permutation here. A permutation is something else (e.g. it implies reordering) and it's confusing.
Isn't the whole point of createmissing to add the aggmetrics structure when this instance has no knowledge of them? So that a new instance can start up, process the metricpersist backlog, and know how far metrics (that it hasn't seen yet, but will, soon) have been saved. See e03d2fd and #485?
No. The index is always loaded into memory first, before anything else. The only time a savedChunk message can be received with no entry in the index is if the metric has been deleted from the index, in which case we don't need to worry about the savedChunk anymore.
Actually, there is a second reason: the savedChunk message may be for a metric that is on a different partition from the partitions being handled by this instance. In that case we also want to ignore the message.
- closes #579
- expand storage schemas into all permutations of the retentions. E.g., a schema with
  retentions=1s:1d,1min:7d,10min:30day
  becomes 3 schemas:
  1 - retentions=1s:1d,1min:7d,10min:30day
  2 - retentions=1min:7d,10min:30day
  3 - retentions=10min:30day
- when calling schemas.Match(), pass in the series name and interval. We find the schema with a matching pattern and then find the sub-schema with the best retention fit. The best fit is when the metric interval is >= the rawInterval and less than the interval of the next rollup. Using our above retention policy, the following matches would occur:
  interval=1s: 1s:1d,1min:7d,10min:30day
  interval=10s: 1s:1d,1min:7d,10min:30day
  interval=60s: 1min:7d,10min:30day
  interval=300s: 1min:7d,10min:30day
  interval=3600s: 10min:30day
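The expansion of one schema into per-retention sub-schemas described above can be sketched like this (simplified hypothetical types, not the actual conf code):

```go
package main

import "fmt"

// Retention is a simplified stand-in for conf.Retention.
type Retention struct {
	SecondsPerPoint int
}

// Schema is a simplified stand-in for conf.Schema.
type Schema struct {
	Pattern    string
	Retentions []Retention
}

// expand turns one schema into len(Retentions) sub-schemas: sub-schema i
// keeps retentions i..end, so each successive sub-schema drops the
// finest remaining resolution.
func expand(s Schema) []Schema {
	out := make([]Schema, 0, len(s.Retentions))
	for i := range s.Retentions {
		out = append(out, Schema{Pattern: s.Pattern, Retentions: s.Retentions[i:]})
	}
	return out
}

func main() {
	s := Schema{Pattern: "default", Retentions: []Retention{{1}, {60}, {600}}}
	for i, sub := range expand(s) {
		fmt.Printf("%d - %d retentions, starting at %ds\n",
			i+1, len(sub.Retentions), sub.Retentions[0].SecondsPerPoint)
	}
}
```

The sub-slices share the original backing array, so the expansion is cheap; a metric with a 60s interval would then match sub-schema 2, as in the retention example above.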
* deprecate PersistMessage and remove parsing of that format.
  - NSQ uses the PersistMessageBatchV1 format since 72d2f6e (Jan 06 2016)
  - Kafka uses it since it was introduced in 02a8a92 (Jul 18 2016)
* simplify notifier handling code: just make it a utility function
* look up metricDefs from the index to get schemaId and aggId. Looking up a key in a map is way faster than comparing the name against the schema patterns.
* as we are looking up the def from the index, we don't need to include Name and Interval in the savedChunk message anymore.
* remove the CreateMissing flag. Creating missing metrics is now always performed, if the metric is in the index.
Ok, makes sense. For the record, I just added CreateMissing to notifierNSQ because it seemed like the right thing to do; I didn't put a whole lot of thought into it. Also, a nice property of the current index code is that while metricdef updates (e.g. existing defs) are subject to update-interval, new metrics will have archive.LastSave=0 and trigger an immediate save to cassandra.