
Introduces a second chunk format to include length #418

Merged
1 commit merged on Dec 14, 2016

Conversation

replay
Contributor

@replay replay commented Dec 12, 2016

By default all new chunks will be written in the new format. This means
the second byte of the binary data is a uint8 that specifies how many
10-minute intervals long this chunk is.

We only specify the length with a precision of 10min so that less space
is needed to encode it.

If a configured chunk span (or aggmetric chunk span) is not divisible
by 10min we error out at startup; the same happens if it exceeds the maximum length of 2^8 * 10min.
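
For illustration, the startup check could look roughly like this (a sketch with made-up names, not the actual code in this PR):

package chunk

import "fmt"

const tenMin = 600 // 10 minutes, in seconds

// validateSpan mirrors the constraints described above: the configured
// chunk span must be divisible by 10min and must not exceed 2^8 * 10min.
func validateSpan(span uint32) error {
	if span%tenMin != 0 {
		return fmt.Errorf("chunk span %d is not divisible by 10min", span)
	}
	if span > (1<<8)*tenMin {
		return fmt.Errorf("chunk span %d exceeds the maximum of 2^8 * 10min", span)
	}
	return nil
}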

@replay replay force-pushed the save_chunk_length branch 2 times, most recently from 07e4b50 to e9d48f5 Compare December 12, 2016 06:45
@Dieterbe
Contributor

Dieterbe commented Dec 12, 2016

I think we should be able to cover chunkspans between:

  • 1 minute (possibly even shorter)
  • 24 hours (possibly even longer)

This proposal has a lower limit of 10min and an upper limit of just over 42 hours.
Here's why I think we should support such small chunks:

  • after about 60 to 120 points there are few additional compression gains (see http://www.vldb.org/pvldb/vol8/p1816-teller.pdf or https://raw.githubusercontent.com/dgryski/go-tsz/master/eval/eval-results.png), and the minimum supported data resolution is 1s, so 1min or 2min can be a reasonable chunk size.
  • we have to think about smaller installations that will be less highly available or harder/impossible to recover (e.g. a single instance with carbon input). Those would prefer to store data more frequently to minimize potential data loss. It would make sense to save data every minute, every 10s, or even as low as every second as points come in. You'd get less benefit from the compression, but I can see how such a setup makes sense for certain installations (e.g. MT would still have benefits over whisper).

We already determined that we don't want to store the length as log2 because that's too awkward to work with, but I think what would work is a lookup table. I think this covers pretty much all the values a user may want to configure for a chunkspan:

1s, 5s, 10s, 15s, 20s, 30s, 60s, 90s, 2min, 3min, 5min, 10min, 15min, 20min, 30min, 45min, 1hr, 1.5hr, 2hr, 2.5hr, 3hr, 4hr, 5hr, 6hr, 8hr, 9hr, 10hr, 12hr, 15hr, 18hr, 20hr, 24hr

This is only 32 values, so well within the range of a uint8, and in fact we can add many more to the table later if we want to.
We can simply include a static [32]uint32 in the source code somewhere (it doesn't have to be a slice, it can be an array), which only takes about 128 bytes, and this way we have a broad and flexible range of possible chunk lengths that can be encoded in 1 byte per chunk.
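
For illustration, a rough sketch of such a table in Go, with the spans expressed in seconds (names here are illustrative, not necessarily what the final code uses):

package chunk

// ChunkSpans maps a 1-byte code (the array index) to a chunkspan in
// seconds, covering the 32 values listed above from 1s up to 24hr.
var ChunkSpans = [32]uint32{
	1, 5, 10, 15, 20, 30, 60, 90,
	120, 180, 300, 600, 900, 1200, 1800, 2700,
	3600, 5400, 7200, 9000, 10800, 14400, 18000, 21600,
	28800, 32400, 36000, 43200, 54000, 64800, 72000, 86400,
}

When persisting a chunk, the configured span would be translated to its index and stored in the header byte; when reading, the byte indexes straight back into the array.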

what do you think @replay and @woodsaj ?

@replay
Contributor Author

replay commented Dec 12, 2016

I quite like the idea of a lookup table. That way we can approximate an exponential increase in the spacing between chunk sizes while avoiding the awkward numbers that log2 would give us, so we get the advantages of both.

@replay replay force-pushed the save_chunk_length branch 2 times, most recently from 61bc48a to fee6bc1 Compare December 13, 2016 11:13

type Size uint8

var ChunkSizes = map[Size]uint32{
Contributor

@Dieterbe Dieterbe Dec 13, 2016


instead of a map you can simply use a [32]uint32 (an array, not a slice, since it's static). It's simpler and will perform slightly better too.
So something like

var ChunkSizes = [32]uint32{
1,
5,
10,
...
}

This only works because the keys are just the numbers from 0 to 31.
For the reverse lookup you'd still need the map.

BTW see https://blog.golang.org/go-slices-usage-and-internals for some interesting implementation details of slices and arrays

Contributor Author


right, good catch

var RevChunkSizes = make(map[uint32]Size, len(ChunkSizes))

var initMutex sync.Mutex
var initialized = false
Contributor


Both this bool and the lock seem unneeded. Go will call the init function once, and it will process all init functions before calling main

Contributor Author


At the time of writing this part I looked up the docs, but they were not explicit about what happens if there are multiple threads importing this package: https://golang.org/doc/effective_go.html#init

If you're sure about that I'll remove the lock

Contributor


It's not easy to find, but on https://golang.org/ref/spec#Package_initialization it says

Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time. An init function may launch other goroutines, which can run concurrently with the initialization code. However, initialization always sequences the init functions: it will not invoke the next one until the previous one has returned.

Also, it doesn't really matter how many times a package is imported; each init function will only be called once.
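
For illustration, the simplified version could look roughly like this (a sketch using the names from this PR, with the lookup table truncated):

package chunk

type Size uint8

// ChunkSizes is the lookup table discussed above
// (truncated here; the real one has all 32 entries).
var ChunkSizes = [...]uint32{1, 5, 10, 15, 20, 30, 60, 90}

// RevChunkSizes maps a span in seconds back to its code.
var RevChunkSizes = make(map[uint32]Size, len(ChunkSizes))

// init runs exactly once, in a single goroutine, before main() is called,
// so no mutex or "initialized" flag is needed to build the reverse lookup.
func init() {
	for i, size := range ChunkSizes {
		RevChunkSizes[size] = Size(i)
	}
}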

Funny anecdote though: we used to have a raintank fork of grafana and did some symlinking tricks in our go packages. I once made the mistake of importing the same package code from other packages but using different import paths, in which case it ran the init twice, since each was considered a different package, and this led to some really confusing results where both packages had different state.

Contributor Author


Thx for the link.
Funny anecdote, but totally makes sense :)

@@ -10,4 +10,5 @@ type Format uint8
// identifier of message format
const (
FormatStandardGoTsz Format = iota
FormatWithLen
Contributor


Since this is an extension of FormatStandardGoTsz that uses the same go-tsz encoding but adds the extra len field, it would be better named something like FormatStandardGoTszWithLen.

Contributor


Actually, we pretty much use the term span everywhere to convey the timerange covered, in seconds.
For consistency, let's adopt the term span here as well, instead of len and size.

"sync"
)

type Size uint8
Contributor


Can we call this SpanCode or something? That way it's clear everywhere we use it that it's the code for the number of seconds, not the number of seconds itself.

chunkSize, ok := chunk.RevChunkSizes[cwr.size]
if !ok {
// it's probably better to panic than to persist the chunk with a wrong length
panic(fmt.Sprintf("Chunk size invalid: %d", cwr.size))
Contributor


agreed

@Dieterbe
Contributor

Looking good so far; here are some documentation changes we should make:

  • in docs/data-knobs.md there's a section that talks about chunkspan; let's mention all the valid settings there
  • in docs/consolidation.md at the bottom there's something about chunk sizing for aggregation; link to that section (look in the readme or some other doc files to see how to make the links, it's markdown syntax pointing to the .md files in github)
  • in the error messages about the spans in metrictank, link to the online documentation
  • we have 3 config files:
      • metrictank-sample.ini
      • scripts/config/metrictank-docker.ini
      • scripts/config/metrictank-package.ini

In all of these configs, in the chunkspan and agg-settings comments, we should link to the online docs about sizing (please make sure the comments are consistent across all 3; I use vimdiff to diff and merge and highly recommend it).
Once all 3 configs are updated, run ./scripts/config-to-doc.sh > docs/config.md

chunkSize, ok := chunk.RevChunkSpans[cwr.span]
if !ok {
// it's probably better to panic than to persist the chunk with a wrong length
panic(fmt.Sprintf("Chunk size invalid: %d", cwr.span))
Contributor


How will the system recover in this case? panic is a very big hammer, and one that has bitten us in the past.

Contributor


I can only think of 2 cases where this can happen:

  • a critical bug in MT that would corrupt our data if MT kept running
  • cosmic x-rays flipping bits in RAM or the cpu.

Neither should be recovered from automatically, IMHO (other than perhaps spinning up a new MT instance).

Contributor Author

@replay replay Dec 13, 2016


Yeah, the best recovery is probably to kill it, restart it, and replay the kafka logs, which is bad, but still better than corrupt data in cassandra, no?

Contributor


I don't pretend to grok how this fits into the larger scheme of things, and if it's a scenario where the best course of action is to shut down MT altogether then that's fine, I just want to be sure it's not a scenario where we end up throwing the baby out with the bathwater.

Contributor Author

@replay replay Dec 13, 2016


Actually, it's one of those things that should never happen, because at the time this span gets configured there is already a check like that: https://github.com/raintank/metrictank/pull/418/files#diff-e33a682d2077aab8454c0e304dc7fe3dR253
So if it does happen, it would certainly be a good thing for a human to look at it.

@replay replay force-pushed the save_chunk_length branch 6 times, most recently from 80fb36c to 2f2b589 Compare December 14, 2016 07:50
@Dieterbe
Contributor

hey @replay, I just pushed a bunch of small commits, with the messages explaining why.
let me know what you think

@Dieterbe
Contributor

I also just rebased the commits on top of master and repushed.

@replay
Contributor Author

replay commented Dec 14, 2016

@Dieterbe good catch with the off-by-one error!

Do you think it's better to just skip the whole explanation of why only a finite set of values is valid as chunk spans? I figured that if I'm a user and I hit that limitation, I'd think "that's annoying, why did they do that?", which is why I thought I should add a short explanation.

@Dieterbe
Contributor

Dieterbe commented Dec 14, 2016

Personally I think it's not relevant for the docs. Users should have absolutely no problem using the predefined chunksizes. If it's annoying to them, it's a sign we did something wrong, and in that case I'd rather have them reach out to us so we can help them be successful with metrictank.

Note: we mention some implementation details in the docs (like the fact that we use the gorilla compression) when it serves to showcase the pros of metrictank, but even there it's pretty abstract.

@replay
Contributor Author

replay commented Dec 14, 2016

Usually I don't really believe that there can be "too much information", especially not for a product that is probably going to be used by quite technical people.

But I'm also fine with just skipping it; if they really want to know, the docs are in the code ;)

By default all new chunks will be written in the new format. This means
the second byte of the chunk specifies the chunkspan.

We chose to use a lookup table because:
- Using a log2 notation to define the chunk size would optimize for space,
  but it would force us to deal with awkward numbers.
- Writing down a static multiple of seconds would result in a relatively
  narrow range of representable chunk lengths.

The lookup table combines the advantages of the two approaches above
while avoiding their disadvantages.
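
For illustration, a rough sketch of how the two header bytes could be written and read back (assumed names and a simplified layout; the real code wraps the go-tsz stream and the full span lookup table):

package chunk

// Assumed format identifiers, mirroring the iota block in the PR
// (the exact names differ).
const (
	FormatStandardGoTsz uint8 = iota
	FormatStandardGoTszWithSpan
)

// writeHeader prepends the format identifier and the 1-byte span code
// to the compressed go-tsz payload.
func writeHeader(format, spanCode uint8, payload []byte) []byte {
	out := make([]byte, 0, len(payload)+2)
	out = append(out, format, spanCode)
	return append(out, payload...)
}

// readSpan recovers the chunkspan in seconds from a chunk written in the
// new format by looking the second byte up in the span table.
func readSpan(data []byte, spans [32]uint32) (uint32, bool) {
	if len(data) < 2 || data[0] != FormatStandardGoTszWithSpan {
		return 0, false
	}
	code := data[1]
	if int(code) >= len(spans) {
		return 0, false
	}
	return spans[code], true
}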
@Dieterbe Dieterbe deleted the save_chunk_length branch December 15, 2017 22:05