-
Notifications
You must be signed in to change notification settings - Fork 104
Introduces a second chunk format to include length #418
Conversation
07e4b50
to
e9d48f5
Compare
I think we should be able to cover chunkspans between:
This proposal has a lower limit of 10min and an upper limit of just over 42hours
We already determined that we don't want to store the length as log2 because that's too awkward to work with, but I think what would work is a lookup table. I think covers pretty much all values a user may want to configure for a chunkspan:
this is only 32 values, so well within the range of a uint8. and in fact we can many more later to the table later if we want to. |
I quite like that idea of a lookup table. That way we can kind of fake exponential increase of the distance between the chunk sizes while avoiding the awkward numbers that log2 would give us, so we get the advantages of both. |
61bc48a
to
fee6bc1
Compare
|
||
type Size uint8 | ||
|
||
var ChunkSizes = map[Size]uint32{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of a map you can simply use a [32]uint32
(an array, not a slice, since it's static). It's simpler and will perform slightly better too.
So something like
var ChunkSizes = [32]uint32{
1,
5,
10,
...
}
This only works because the keys are just numbers from 0 to 31
for the reverse you'd still need the map.
BTW see https://blog.golang.org/go-slices-usage-and-internals for some interesting implementation details of slices and arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, good catch
var RevChunkSizes = make(map[uint32]Size, len(ChunkSizes)) | ||
|
||
var initMutex sync.Mutex | ||
var initialized = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both this bool and the lock seem unneeded. Go will call the init function once, and it will process all init functions before calling main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time of writing this part I looked up the docs, but they were not explicit about what happens if there are multiple threads importing this package: https://golang.org/doc/effective_go.html#init
If you're sure about that I'll remove the lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not easy to find, but on https://golang.org/ref/spec#Package_initialization it says
Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time. An init function may launch other goroutines, which can run concurrently with the initialization code. However, initialization always sequences the init functions: it will not invoke the next one until the previous one has returned.
Also it doesn't really matter much how many times a package is imported, each init function will only be called once.
Funny anecdote though.. we used to have a raintank fork of grafana, and do some symlinking tricks in our go packages. I once made the mistake of importing the same package code from other packages, but using different import paths, in which case it ran the init twice since each was considered a different package, and this led to some really confusing results where both packages had different state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for the link.
Funny anecdote, but totally makes sense :)
@@ -10,4 +10,5 @@ type Format uint8 | |||
// identifier of message format | |||
const ( | |||
FormatStandardGoTsz Format = iota | |||
FormatWithLen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is an extension of FormatStandardGoTsz, that also uses the same go-tsz encoding, but with the extra len field, this would be better named something like FormatStandardGoTszWithLen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we pretty much use the term span
everywhere to convey the timerange covered in seconds
For consistency let's adopt the term span also here instead of len and size.
"sync" | ||
) | ||
|
||
type Size uint8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call this SpanCode or something? so that we're more clear everywhere we use this that it's the code for the amount of seconds, not the amount of seconds itself
chunkSize, ok := chunk.RevChunkSizes[cwr.size] | ||
if !ok { | ||
// it's probably better to panic than to persist the chunk with a wrong length | ||
panic(fmt.Sprintf("Chunk size invalid: %d", cwr.size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
Looking good so far, here's some documentation changes we should make:
in all of these configs, in the chunkspan and agg-settings comments, we should link to the online docs about sizing (please make sure the comments are consistent across all 3. I use vimdiff to diff and merge and highly recommend it) |
chunkSize, ok := chunk.RevChunkSpans[cwr.span] | ||
if !ok { | ||
// it's probably better to panic than to persist the chunk with a wrong length | ||
panic(fmt.Sprintf("Chunk size invalid: %d", cwr.span)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will the system recover in this case? panic
is a very big hammer, and one that has bitten us in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can only think of 2 cases where this can happen:
- a critical bug in MT that would corrupt our data if MT kept running
- cosmic x-rays flipping bits in RAM or the cpu.
Neither should be attempted to be recovered in an automatic way IMHO (other than spinning up a new MT instance perhaps)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the best recovery is probably to kill it, restart it, and replay kafka logs. which is bad, but still better than corrupt data in cassandra, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't pretend to grok how this fits into the larger scheme of things, and if it's a scenario where the best course of action is to shut down MT altogether then that's fine, I just want to be sure it's not a scenario where we end up throwing the baby out with the bathwater.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it's one of those things that should never happen
because at the time this span
gets configured there is already a check like that: https://github.com/raintank/metrictank/pull/418/files#diff-e33a682d2077aab8454c0e304dc7fe3dR253
So if it does happen it would certainly be a good thing if a human looks at it.
80fb36c
to
2f2b589
Compare
hey @replay , I just pushed a bunch of small commits, with the messages explaining why. |
1118873
to
095a24e
Compare
i also just rebased the commits on top of master and repushed |
@Dieterbe good catch with the off-by-one error! Do you think it's better to just skip the whole explanation about why only a finite set of values is valid as chunk spans? I figured if I'm a user and I meet that limitation I'd be kind of like "that's annoying, why did they do that?" and that's why I thought I should add a short explanation. |
personally I think it's not relevant for the docs. Users should have absolutely no problem using the predefined chunksizes. If it's annoying to them it's a sign we did something wrong and in that case I rather have them reach out to us so we can help them be successful with metrictank. Note: we mention some implementation details in the docs (like the fact that we use the gorilla compression) when it serves to showcase the pro's of metrictank, but even there it's pretty abstract |
Usually I don't really believe in that there can be "too much information", especially not related to a product which is probably going to be used by quite technical people. But I'm also fine just skipping it, if they really want to know the docs are in the code ;) |
By default all new chunks will be written in the new format. This means the second byte of the chunk specifies the chunkspan. We chose to use a lookup table because: - Using a log2 notation to define the chunk size would optimize for space, but it makes us have to deal with awkward numbers. - Writing down a static multiple of seconds would result in a relatively narrow range of chunk lengths that could be represented. Using the lookup table combines the advantages of the two above solutions while excluding the disadvantages.
095a24e
to
0021004
Compare
By default all new chunks will be written in the new format. This means
the second byte of the binary data is a
uint8
that specifies how many10min intervals this chunk is long.
We only specify with a precision of 10min to require less space to
define the length.
If a configured chunk span (or aggmetric chunk span) is not dividable
by 10min we error at startup, same if it exceeds the maximum length of 2^8 * 10min.