
Introduces a second chunk format to include length #418

Merged
1 commit merged on Dec 14, 2016

Conversation

replay
Contributor

@replay replay commented Dec 12, 2016

By default all new chunks will be written in the new format. This means
the second byte of the binary data is a uint8 that specifies how many
10-minute intervals long this chunk is.

We only specify the length with a precision of 10min so that less space
is needed to encode it.

If a configured chunk span (or aggmetric chunk span) is not divisible
by 10min we error out at startup; the same happens if it exceeds the maximum length of 2^8 * 10min.
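
For illustration, the startup check could look roughly like this (a sketch with made-up names, not the actual code in this PR):

package chunk

import "fmt"

const tenMin = 600 // 10 minutes, in seconds

// validateSpan mirrors the constraints described above: the configured
// chunk span must be divisible by 10min and must not exceed 2^8 * 10min.
func validateSpan(span uint32) error {
	if span%tenMin != 0 {
		return fmt.Errorf("chunk span %d is not divisible by 10min", span)
	}
	if span > (1<<8)*tenMin {
		return fmt.Errorf("chunk span %d exceeds the maximum of 2^8 * 10min", span)
	}
	return nil
}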

@replay replay force-pushed the save_chunk_length branch 2 times, most recently from 07e4b50 to e9d48f5 Compare December 12, 2016 06:45
@Dieterbe
Contributor

Dieterbe commented Dec 12, 2016

I think we should be able to cover chunkspans between:

  • 1 minute (possibly even shorter)
  • 24 hours (possibly even longer)

This proposal has a lower limit of 10min and an upper limit of just over 42 hours.
Here's why I think we should support such small chunks:

  • after about 60 to 120 points there are few additional compression gains (see http://www.vldb.org/pvldb/vol8/p1816-teller.pdf or https://raw.githubusercontent.com/dgryski/go-tsz/master/eval/eval-results.png), and the minimum supported data resolution is 1s, so 1min or 2min can be a reasonable chunk size.
  • we have to think about smaller installations that will be less highly available or harder/impossible to recover (e.g. a single instance with carbon input). Those would prefer to store data more frequently to minimize potential data loss. It would make sense to save data every minute, every 10s, or even as low as every second as points come in. You'd get less benefit from the compression, but I can see how such a setup makes sense for certain installations (e.g. MT would still have benefits over whisper).

We already determined that we don't want to store the length as log2 because that's too awkward to work with, but I think what would work is a lookup table. I think this covers pretty much all the values a user may want to configure for a chunkspan:

1s, 5s, 10s, 15s, 20s, 30s, 60s, 90s, 2min, 3min, 5min, 10min, 15min, 20min, 30min, 45min, 1hr, 1.5hr, 2hr, 2.5hr, 3hr, 4hr, 5hr, 6hr, 8hr, 9hr, 10hr, 12hr, 15hr, 18hr, 20hr, 24hr

This is only 32 values, so well within the range of a uint8, and in fact we can add many more to the table later if we want to.
We can simply include a static [32]uint32 in the source code somewhere (it doesn't have to be a slice, it can be an array), which only takes about 128 bytes, and this way we have a broad and flexible range of possible chunk lengths that can be encoded in 1 byte per chunk.
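
For illustration, a rough sketch of such a table in Go, with the spans expressed in seconds (names here are illustrative, not necessarily what the final code uses):

package chunk

// ChunkSpans maps a 1-byte code (the array index) to a chunkspan in
// seconds, covering the 32 values listed above from 1s up to 24hr.
var ChunkSpans = [32]uint32{
	1, 5, 10, 15, 20, 30, 60, 90,
	120, 180, 300, 600, 900, 1200, 1800, 2700,
	3600, 5400, 7200, 9000, 10800, 14400, 18000, 21600,
	28800, 32400, 36000, 43200, 54000, 64800, 72000, 86400,
}

When persisting a chunk, the configured span would be translated to its index and stored in the header byte; when reading, the byte indexes straight back into the array.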

what do you think @replay and @woodsaj ?

@replay
Contributor Author

replay commented Dec 12, 2016

I quite like the idea of a lookup table. That way we can approximate an exponential increase in the spacing between chunk sizes while avoiding the awkward numbers that log2 would give us, so we get the advantages of both.

@replay replay force-pushed the save_chunk_length branch 2 times, most recently from 61bc48a to fee6bc1 Compare December 13, 2016 11:13

type Size uint8

var ChunkSizes = map[Size]uint32{
Contributor

@Dieterbe Dieterbe Dec 13, 2016


instead of a map you can simply use a [32]uint32 (an array, not a slice, since it's static). It's simpler and will perform slightly better too.
So something like

var ChunkSizes = [32]uint32{
1,
5,
10,
...
}

This only works because the keys are just the numbers from 0 to 31.
For the reverse lookup you'd still need the map.

BTW see https://blog.golang.org/go-slices-usage-and-internals for some interesting implementation details of slices and arrays

Contributor Author


right, good catch

var RevChunkSizes = make(map[uint32]Size, len(ChunkSizes))

var initMutex sync.Mutex
var initialized = false
Contributor


Both this bool and the lock seem unneeded. Go will call the init function once, and it will process all init functions before calling main

Contributor Author


At the time of writing this part I looked up the docs, but they were not explicit about what happens if there are multiple threads importing this package: https://golang.org/doc/effective_go.html#init

If you're sure about that I'll remove the lock

Contributor


It's not easy to find, but on https://golang.org/ref/spec#Package_initialization it says

Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time. An init function may launch other goroutines, which can run concurrently with the initialization code. However, initialization always sequences the init functions: it will not invoke the next one until the previous one has returned.

Also, it doesn't really matter how many times a package is imported; each init function will only be called once.
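
For illustration, the simplified version could look roughly like this (a sketch using the names from this PR, with the lookup table truncated):

package chunk

type Size uint8

// ChunkSizes is the lookup table discussed above
// (truncated here; the real one has all 32 entries).
var ChunkSizes = [...]uint32{1, 5, 10, 15, 20, 30, 60, 90}

// RevChunkSizes maps a span in seconds back to its code.
var RevChunkSizes = make(map[uint32]Size, len(ChunkSizes))

// init runs exactly once, in a single goroutine, before main() is called,
// so no mutex or "initialized" flag is needed to build the reverse lookup.
func init() {
	for i, size := range ChunkSizes {
		RevChunkSizes[size] = Size(i)
	}
}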

Funny anecdote though: we used to have a raintank fork of grafana and did some symlinking tricks in our go packages. I once made the mistake of importing the same package code from other packages but using different import paths, in which case it ran the init twice, since each was considered a different package, and this led to some really confusing results where both packages had different state.

Contributor Author


Thx for the link.
Funny anecdote, but totally makes sense :)

@@ -10,4 +10,5 @@ type Format uint8
// identifier of message format
const (
FormatStandardGoTsz Format = iota
FormatWithLen
Contributor


Since this is an extension of FormatStandardGoTsz that uses the same go-tsz encoding but adds the extra len field, it would be better named something like FormatStandardGoTszWithLen.

Contributor


Actually, we pretty much use the term span everywhere to convey the timerange covered, in seconds.
For consistency, let's adopt the term span here as well, instead of len and size.

"sync"
)

type Size uint8
Contributor


Can we call this SpanCode or something? That way it's clear everywhere we use it that it's the code for the number of seconds, not the number of seconds itself.

chunkSize, ok := chunk.RevChunkSizes[cwr.size]
if !ok {
// it's probably better to panic than to persist the chunk with a wrong length
panic(fmt.Sprintf("Chunk size invalid: %d", cwr.size))
Contributor


agreed

@Dieterbe
Contributor

Looking good so far; here are some documentation changes we should make:

  • in docs/data-knobs.md there's a section that talks about chunkspan; let's mention all the valid settings there
  • in docs/consolidation.md at the bottom there's something about chunk sizing for aggregation; link to that section (look in the readme or some other doc files to see how to make the links, it's markdown syntax pointing to the .md files in github)
  • in the error messages about the spans in metrictank, link to the online documentation
  • we have 3 config files:
      • metrictank-sample.ini
      • scripts/config/metrictank-docker.ini
      • scripts/config/metrictank-package.ini

In all of these configs, in the chunkspan and agg-settings comments, we should link to the online docs about sizing (please make sure the comments are consistent across all 3; I use vimdiff to diff and merge and highly recommend it).
Once all 3 configs are updated, run ./scripts/config-to-doc.sh > docs/config.md

chunkSize, ok := chunk.RevChunkSpans[cwr.span]
if !ok {
// it's probably better to panic than to persist the chunk with a wrong length
panic(fmt.Sprintf("Chunk size invalid: %d", cwr.span))
Contributor


How will the system recover in this case? panic is a very big hammer, and one that has bitten us in the past.

Contributor


I can only think of 2 cases where this can happen:

  • a critical bug in MT that would corrupt our data if MT kept running
  • cosmic x-rays flipping bits in RAM or the cpu.

Neither should be recovered from automatically, IMHO (other than perhaps spinning up a new MT instance).

Contributor Author

@replay replay Dec 13, 2016


Yeah, the best recovery is probably to kill it, restart it, and replay the kafka logs, which is bad, but still better than corrupt data in cassandra, no?

Contributor


I don't pretend to grok how this fits into the larger scheme of things, and if it's a scenario where the best course of action is to shut down MT altogether then that's fine, I just want to be sure it's not a scenario where we end up throwing the baby out with the bathwater.

Contributor Author

@replay replay Dec 13, 2016


Actually, it's one of those things that should never happen, because at the time this span gets configured there is already a check like that: https://github.com/raintank/metrictank/pull/418/files#diff-e33a682d2077aab8454c0e304dc7fe3dR253
So if it does happen, it would certainly be a good thing for a human to look at it.

@replay replay force-pushed the save_chunk_length branch 6 times, most recently from 80fb36c to 2f2b589 Compare December 14, 2016 07:50
@Dieterbe
Contributor

hey @replay, I just pushed a bunch of small commits, with the messages explaining why.
let me know what you think

@Dieterbe
Contributor

I also just rebased the commits on top of master and repushed.

@replay
Contributor Author

replay commented Dec 14, 2016

@Dieterbe good catch with the off-by-one error!

Do you think it's better to just skip the whole explanation of why only a finite set of values is valid as chunk spans? I figured that if I'm a user and I hit that limitation, I'd think "that's annoying, why did they do that?", which is why I thought I should add a short explanation.

@Dieterbe
Contributor

Dieterbe commented Dec 14, 2016

Personally I think it's not relevant for the docs. Users should have absolutely no problem using the predefined chunksizes. If it's annoying to them, it's a sign we did something wrong, and in that case I'd rather have them reach out to us so we can help them be successful with metrictank.

Note: we mention some implementation details in the docs (like the fact that we use the gorilla compression) when it serves to showcase the pros of metrictank, but even there it's pretty abstract.

@replay
Contributor Author

replay commented Dec 14, 2016

Usually I don't really believe that there can be "too much information", especially not for a product that is probably going to be used by quite technical people.

But I'm also fine with just skipping it; if they really want to know, the docs are in the code ;)

By default all new chunks will be written in the new format. This means
the second byte of the chunk specifies the chunkspan.

We chose to use a lookup table because:
- Using a log2 notation to define the chunk size would optimize for space,
  but it would force us to deal with awkward numbers.
- Writing down a static multiple of seconds would result in a relatively
  narrow range of representable chunk lengths.

The lookup table combines the advantages of the two approaches above
while avoiding their disadvantages.
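
For illustration, a rough sketch of how the two header bytes could be written and read back (assumed names and a simplified layout; the real code wraps the go-tsz stream and the full span lookup table):

package chunk

// Assumed format identifiers, mirroring the iota block in the PR
// (the exact names differ).
const (
	FormatStandardGoTsz uint8 = iota
	FormatStandardGoTszWithSpan
)

// writeHeader prepends the format identifier and the 1-byte span code
// to the compressed go-tsz payload.
func writeHeader(format, spanCode uint8, payload []byte) []byte {
	out := make([]byte, 0, len(payload)+2)
	out = append(out, format, spanCode)
	return append(out, payload...)
}

// readSpan recovers the chunkspan in seconds from a chunk written in the
// new format by looking the second byte up in the span table.
func readSpan(data []byte, spans [32]uint32) (uint32, bool) {
	if len(data) < 2 || data[0] != FormatStandardGoTszWithSpan {
		return 0, false
	}
	code := data[1]
	if int(code) >= len(spans) {
		return 0, false
	}
	return spans[code], true
}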
@Dieterbe Dieterbe deleted the save_chunk_length branch December 15, 2017 22:05