This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

update configs and docs #449

Merged
merged 10 commits into master from configs
Feb 7, 2017

Conversation

Dieterbe
Contributor

@Dieterbe Dieterbe commented Jan 5, 2017

No description provided.

@Dieterbe
Contributor Author

Dieterbe commented Jan 5, 2017

@woodsaj @replay please have a good look at the commit "add script to make maintaining configs easier". this script embodies the approach I've been using and how I believe we should do it.

Some things left to do:

  • numchunks has always been described as "number of raw chunks to keep in ring buffer. should be at least 1 more than what's needed to satisfy aggregation rules". I don't remember what the 2nd sentence was supposed to mean. Can we remove it? Can we just use numchunks 1 everywhere, even with long aggregations?

  • docs/data-knobs.md currently focuses on numchunks and chunkspan settings and their tradeoffs. I propose we rework this document into a "memory server" document that talks about the two main approaches used: the ringbuffer and the chunk-cache. We should briefly explain both and the pros and cons of each (which is probably all pros for the chunk-cache and mostly cons for the ringbuffer). We can however still talk about the tradeoffs in tuning numchunks and chunkspan in another section, contrasting it with the chunk-cache. @replay what do you think, can you give this a pass?

  • we have an inconsistent chunkspan setting: most configs use 10min, even though the default is 2h (and that's also what's in the sample config). I want to make it consistent. Short chunks (e.g. 10min) are good for experimenting and benchmarking: you'll know soon if cassandra is a problem, and if you have a single instance and restart it, data loss is limited. (I think it will be common enough for newcomers to run a single instance, restart it, and be surprised when they lose data; with 2h chunkspans the data loss would be too crazy.)
    OTOH, longer chunks are a best practice for better compression and resource utilisation, and legit deployments should typically use >10min.
    We can't optimize for both, I guess, but I'm currently leaning towards defaulting to 10min (sketched below). Your thoughts @woodsaj @replay ?
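
For illustration, the two candidate chunkspan defaults would look roughly like this in an ini config (a sketch using the setting name discussed in this thread; the comments are mine, not taken from the actual sample config):

# duration of raw chunks
# option A: short chunks, handy for experimenting and benchmarking, limited data loss when a single instance restarts
chunkspan = 10min
# option B: long chunks, better compression and resource utilisation for long-running production deployments
# chunkspan = 2h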

@Dieterbe Dieterbe added this to the hosted-metrics-alpha milestone Jan 5, 2017
@replay
Contributor

replay commented Jan 5, 2017

  • I could write a short text in data-knobs.md to mention that query patterns are very important in determining cache efficiency; what do you think?
  • In the Basic Guideline, the recommendation of chunkspan = 20min & numchunks = 1 no longer matches what the text says (worked through after this list):

The standard recommendation is 120 points per chunk and keep at least as much in RAM as what your commonly query for (+1 extra chunk, see below)
E.g. if your most common interval is 10s and most of your dashboards query for 2h worth of data, then the recommendation is:

  • Looks good to me. The scripts/sync-configs.sh script looks useful; it's probably only going to be used by the 3 of us, but it's good to have.
  • I agree that many new users might be annoyed if they lose 2h of data when they restart metrictank, especially because a large share of new users would probably use it as a drop-in replacement for graphite/whisper: they would send data via the carbon input, and then there's no log to replay.
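
To spell out why the quoted recommendation (120 points per chunk, a 10s interval, 2h of commonly-queried data in RAM) conflicts with numchunks = 1, here is a rough worked example using only the numbers from the quote:

# 120 points per chunk at a 10s interval:
#   120 * 10s = 1200s = 20min          => chunkspan = 20min
# keep 2h of commonly-queried data in RAM, plus 1 extra chunk:
#   2h / 20min = 6 chunks, + 1 extra   => numchunks = 7, not 1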

@Dieterbe
Contributor Author

Dieterbe commented Jan 6, 2017

In the Basic Guideline, the recommendation of chunkspan = 20min & numchunks = 1 no longer matches what the text says

because as of now we should be recommending to use the chunk-cache instead of large ringbuffers. a lot of that page needs to be reworked to take into account the chunk-cache.

@Dieterbe
Contributor Author

Dieterbe commented Jan 6, 2017

I just realized we can't just use numchunks 1 everywhere, because that leaves no margin to save a chunk: on a boundary and shortly after, nodes will keep hitting cassandra looking for chunks that may not be there yet. This reminds me of another reason why we had numchunks 5 in the past: should a primary crash or be temporarily unable to do its job, then secondaries can keep serving data for up to 5*chunkspan until they start hitting cassandra repeatedly.
So while the ringbuffers are now less effective as a general-purpose in-memory cache (the chunk-cache should be better at that), they still serve a purpose in remaining HA when primaries have issues. For this reason I'm going to set it to 5 again everywhere.

@Dieterbe
Contributor Author

Dieterbe commented Jan 6, 2017

@woodsaj @replay I just pushed the changes which correspond to the above reasoning.
please review and let me know if any objections or comments. thanks.

@replay
Contributor

replay commented Jan 6, 2017

Those are two interesting reasons. I'm not sure I agree that more numchunks is the best solution for them, but I guess for now increasing numchunks takes some pressure off the two problems you described, assuming that most queries request a range whose oldest ts is not older than the oldest chunk in the ring buffer.

As a better solution I'd suggest this:

  1. If saving a chunk takes more than one second (the smallest granularity of timestamps) then, as you described, metrictanks that get queried for a range which includes the not-yet-saved chunk would keep hitting cassandra.
    I can see no reason why we couldn't set numchunks to 2, but persist each chunk at the time it's complete instead of when it gets evicted from the ring buffer. That way we would always have one chunk in the ring buffer -and- in cassandra, and at the time it gets evicted from the ring buffer it has already been in cassandra for (chunkspan - time it takes to save) seconds (see the rough timeline after this list).
  2. I don't think the purpose of the ring buffer should be to "cover up" HA problems; by doing that we reduce our own flexibility while removing the pressure to solve a real problem. The solution described at point #1 would help with this too.
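
A rough timeline of what point 1 proposes, with illustrative numbers (assuming chunkspan = 10min and a persist that finishes well within one chunkspan):

# t = 0:      chunk N completes, chunk N+1 starts filling, persist of chunk N begins
# t = save:   chunk N is in cassandra and still in the ring buffer (numchunks = 2)
# t = 10min:  chunk N+1 completes and chunk N is evicted, having already been in
#             cassandra for roughly (chunkspan - save) time
chunkspan = 10min
numchunks = 2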

@Dieterbe
Contributor Author

Dieterbe commented Jan 9, 2017

If saving a chunk takes more than one second

We've seen environments where it takes >=20minutes. Hence I added to the docs ".. Based on your deployment this could take anywhere between milliseconds or many minutes..."

persist each chunk at the time it's complete instead of when it gets evicted from the ring buffer. That way we would always have one chunk in the ring buffer -and- in cassandra, and at the time it gets evicted from the ring buffer it has already been in cassandra for (chunkspan - time it takes to save) seconds

this describes how it is now. This is why we need >1 numchunks to combat the first problem.

I don't think the purpose of the ring buffer should be to "cover up" HA problems,

I don't think we're covering anything up. In my view the ringbuffer is simply the mechanism by which we implement (this particular aspect of) HA. It's tunable through numchunks so that people can make a tradeoff that makes sense for them.

@woodsaj
Member

woodsaj commented Jan 10, 2017

As soon as chunks are complete we add them to the write queue. However, as all chunks complete at around the same time, the write queue can take a while to be processed. This is by design, so that we don't overwhelm cassandra.
E.g. with 1 million series and agg-settings=10min:6h:2:3mon,1h:6h:2:1y, every 6 hours you will have a burst of 9 million writes. At a write throughput of 10k/s it would take 15 minutes to write the chunks to cassandra.
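
Spelling that estimate out (the 9 million writes figure is taken from the example above):

# 9,000,000 chunk writes queued every 6 hours, drained at 10,000 writes/s:
#   9,000,000 / 10,000 = 900s = 15 minutes to work through the write queue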

If you need to manually fail over a primary, then numchunks should cover more than the amount of time it takes you to respond to the failure of the primary. That could be anywhere from 5 minutes to 8 hours, depending on the user's own response SLA for faults.

In our k8s deployments, where we have dedicated read/write nodes, we just use numchunks=2.

perhaps we should just recommend a numchunks >= 2?

done

echo "updating docs/config.md"
./scripts/config-to-doc.sh > docs/config.md
Member


This assumes that you are running the script from $GOPATH/src/github.com/raintank/metrictank, which won't always be true. What if a user is in scripts/ and runs ./sync-configs.sh?

We handle this in all other scripts with

# Find the directory we exist within
DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
cd ${DIR}

@Dieterbe
Contributor Author

Dieterbe commented Jan 10, 2017

perhaps we should just recommend a numchunks >= 2?

I think we should recommend something that will give people some time to respond to incidents.
currently this PR introduces a default/recommendation of numchunks 5, which for 10min chunks gives you a time window of 40 to 50 minutes. I think this is more reasonable than 10 to 20 minutes.
perhaps we should even pick 7 instead; then we can say they have an hour (worked out below).
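
Working that out (a sketch; the newest chunk in the ring buffer is the in-progress one, so only numchunks - 1 chunks are guaranteed to be full):

# with chunkspan = 10min:
#   numchunks = 5  =>  (5 - 1) * 10min = 40min guaranteed, up to 50min
#   numchunks = 7  =>  (7 - 1) * 10min = 60min guaranteed, up to 70min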

@woodsaj
Member

woodsaj commented Jan 10, 2017

I don't think we need to get too hung up on the carbon use case, as having numchunks > 2 is only important if users are replicating metrics to 2 or more MT instances, with 1 marked as primary and the others not. I doubt this will be a common deployment model and we should not be encouraging it. If HA is important, users should use kafka.

@Dieterbe
Contributor Author

It's not just carbon though? We've been in this situation ourselves a couple of times with our worldping infra (which uses kafka): we run a cluster, the primary dies, so chunks are not going to cassandra, and the time you have to manually promote a new primary is determined by your numchunks, because your nodes can provide gapless responses to render requests as long as they have enough data in the ringbuffer to merge with what's in cassandra.

Contributor

@replay replay left a comment


Sorry for all the spelling/English pickiness... I figured if I already read through it then I might as well point those out.

# 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
# 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
# When running a cluster of metrictank instances, all instances should have the same agg-settings.
# chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md
Contributor


The bookmark #valid-chunk-spans could be appended to the link; then we just need to remember to update all the links to it if we ever rename it. On the other hand, we'll have to do that anyway because there are already references to it.

retry-interval = 10m
# max number of concurrent connections to ES
max-conns = 20
# max numver of docs to keep in the BulkIndexer buffer
Contributor


v in number

max-conns = 20
# max numver of docs to keep in the BulkIndexer buffer
max-buffer-docs = 1000
# max delay befoer the BulkIndexer flushes its buffer
Contributor


befoer

## clustering transports ##
## basic clustering settings ##
[cluster]
# The primary node writes data to cassandra. There should only be 1 primary node per shardGroup.
Contributor


Some comments end with a . and some don't. I'm fine either way, but maybe consistency would make a better impression.


Note:
* the last (current) chunk is always a "work in progress", so depending on what time it is, it may be anywhere between empty and full.
* when metrictank starts up, it will not refill the ring buffer with data from Cassandra. They only fill based on data that comes in. But once data has been seen, the buffer
Contributor


metrictank is a name so I think it should be upper case.
there are two spaces before the But.

Contributor Author


we use metrictank uncapitalized in a bunch of places. but we also use Metrictank in a bunch of places. company-wise we used to treat no-caps as part of our branding (see raintank logo). we haven't really discussed this for metrictank yet.
Now that we're "GrafanaLabs" maybe we should start capitalizing everything ... ?
thoughts @bulletfactory ?


#### Warmup and becoming ready for promotion to primary

longer chunk sizes means a longer backfill of more older data (e.g. with kafka oldest offset),
Contributor


the l in longer should be uppercase because it's the beginning of a sentence

In principle, you need just 1 chunk for each series.
However:
* when the data stream moves into a new chunk, secondary nodes would drop the previous chunk and query Cassandra. But the primary needs some time to save the chunk to Cassandra. Based on your deployment this could take anywhere between milliseconds or many minutes. As you don't want to slam Cassandra with requests at each chunk clear, you should probably use a numchunks of 2, or a numchunks that lets you retain data in memory for however long it takes to flush data to cassandra.
* The ringbuffers are a great tool to let you deal with crashes or outages of your primary node. If your primary went down, or for whatever reason cannot save data to Cassandra, then you won't even feel it if the ringbuffers can "clear the gap" between in memory data and older data in cassandra. So we advise to think about how fast your organisation could resolve a potential primary outage, and then set your parameters such that `(numchunks-1) * chunkspan` is more then that.
Contributor


should be more than instead of more then.

### Configuration examples

E.g. if your most common data interval is 10s, then your chunks should be at least `120*10s=20min` long.
If you think your organisation will need up to 2 hours to resolve a primary failure, then you need at always at least 6 such chunks in memory,
Contributor


There's an "at" too many in "need at always at".


echo "first make sure metrictank-sample.ini is up to date. its values should match the defaults used by metrictank. and comments should match the descriptions provided by metrictank help menus"
echo "now we will run vimdiff to manually synchronize updates from sample config to other configs:"
echo "try to make every config as closely resembling the sample config as possible, while retaining the customisations that makes each config unique"
Contributor


I'm not a native English speaker, but wouldn't this feel a little more natural:

try to make every config resemble the sample config as closely as possible

and is customisations British spelling? my spell check says it should be customizations

@Dieterbe
Contributor Author

@woodsaj any thoughts re #449 (comment)? I want to make sure we're on the same page re numchunks (in particular, recommending numchunks of 7).

@woodsaj
Member

woodsaj commented Jan 23, 2017

It's not just carbon though?

Once PR #485 is merged it will just be carbon that is affected by numchunks, as the recommended topology when using Kafka will be to use dedicated write nodes. With this topology the cluster will self-heal after a failure without the operator needing to do anything. So numchunks only needs to give enough time for the write node to replay the kafka log. On modest hardware MT can do a few hundred thousand metrics/s, so replaying the backlog won't take long.

@Dieterbe
Contributor Author

Dieterbe commented Jan 30, 2017

But you may have a cassandra outage, or a networking problem between MT and cassandra. There's a wide variety of issues that can happen (not just MT itself failing), and that's where numchunks comes in: irrespective of which input plugin you use, you need a timeframe to address these sorts of incidents, and it's nice that you can put a number on how long you have (and make it configurable).

Can we agree that there's a valid use case here, and that it makes sense to recommend a sensible numchunks that lets you cover at least an hour's worth of whatever issue may appear (e.g. numchunks 7 for chunkspan 10min)? I hope we can agree, so that this PR can be merged (I will address the minor points you guys brought up, but first want us to agree on the larger picture described in the doc changes).

@woodsaj
Member

woodsaj commented Jan 30, 2017

Just set numchunks to 7.

But for the record: numchunks has no bearing on fault tolerance when you are using kafka.
  • if MT crashes, you just replay from kafka
  • if cassandra dies, the chunks will sit in the write queue until it comes back; chunks can be bumped out of the ring buffer and still remain in the write queue. If MT dies before the write queue is flushed to Cassandra, then the data will be replayed from kafka.

Dieterbe and others added 8 commits February 1, 2017 14:46
* numchunks = 1 everywhere, refer to chunk-cache as better method
* make sure all configs have the correct chunk-cache, stats and other recent updates.
* standardize on default raw chunkspan 10min and numchunks 5
* improve descriptions
reorganize things better:
* a memory-server doc that describes ringbuffer and chunk cache, and
then goes into specifics of configuring chunkspan and numchunks.
Move the huge list of considerations closer to the setting they apply
to.
* move compression tips elsewhere
this leaves 60min of data for all series.
+ make the description of the ringbuffer and chunk cache more nuanced.
@Dieterbe
Contributor Author

Dieterbe commented Feb 1, 2017

I think it's very important that we agree on what the docs say; we should all stand behind the recommendations that we make. I think the misunderstanding between me and aj is sufficiently cleared up, and I gave the docs another pass (see f0794cf). I think this represents the tradeoff around numchunks and how it complements the chunk-cache much better. I also changed the default to numchunks 7; if it turns out to be too wasteful for people, they can lower it.
so @replay and @woodsaj, if you guys don't mind, could you check out that commit or just give https://github.com/raintank/metrictank/blob/configs/docs/memory-server.md a read-through? thanks :)

@Dieterbe Dieterbe changed the title WIP: update configs and docs update configs and docs Feb 7, 2017
@Dieterbe
Contributor Author

Dieterbe commented Feb 7, 2017

@woodsaj per the above comment, can I get a signoff please? Thanks :)

@Dieterbe Dieterbe merged commit e726ea5 into master Feb 7, 2017
@Dieterbe Dieterbe deleted the configs branch September 18, 2018 09:00