update configs and docs #449
Conversation
@woodsaj @replay please have a good look at the commit "add script to make maintaining configs easier". this script embodies the approach I've been using and how I believe we should do it. Some things left to do:
because as of now we should be recommending the chunk-cache instead of large ringbuffers. A lot of that page needs to be reworked to take the chunk-cache into account.
I just realized we can't just use numchunks 1 everywhere, because that leaves no margin to save a chunk: on a boundary and shortly after, nodes will keep hitting cassandra looking for chunks that may not be there yet. This reminds me of another reason why we had numchunks 5 in the past: should a primary crash or be temporarily unable to do its job, then secondaries can keep serving data up to 5*chunkspan until they start hitting cassandra repeatedly.
Those are two interesting reasons. I'm not sure I agree that more numchunks is the right fix, though. As a better solution I'd suggest this:
We've seen environments where it takes >=20 minutes. Hence I added to the docs "... Based on your deployment this could take anywhere between milliseconds and many minutes..."
this describes how it is now. This is why we need >1 numchunks to combat the first problem.
I don't think we're covering anything up. In my view the ringbuffer is simply the mechanism by which we implement (this particular aspect of) HA. It's tunable through numchunks so that people can make a tradeoff that makes sense for them.
As soon as chunks are complete we add them to the write queue. However, as all chunks complete at around the same time, the write queue can take a while to be processed. This is by design, so that we don't overwhelm cassandra. If you need to manually fail over a primary, then numchunks should cover more than the amount of time it takes you to respond to the failure of the primary. That could be anywhere from 5 minutes to 8 hours depending on the user's own response SLA for faults. In our k8s deployments, where we have dedicated read/write nodes, we just use numchunks=2. Perhaps we should just recommend numchunks >= 2?
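The rule of thumb discussed here ("numchunks should cover more than your response time") can be sketched as a quick calculation. This is just an illustration; the `min_numchunks` helper and its arguments are hypothetical, not part of metrictank. It assumes `(numchunks-1) * chunkspan` must cover the response window, since the current chunk is always a work in progress:

```shell
#!/bin/bash
# min_numchunks WINDOW_SECONDS CHUNKSPAN_SECONDS
# prints the smallest numchunks such that (numchunks-1)*chunkspan >= window
min_numchunks() {
  local window=$1 chunkspan=$2
  # ceil(window/chunkspan) completed chunks, plus 1 for the in-progress chunk
  echo $(( (window + chunkspan - 1) / chunkspan + 1 ))
}

min_numchunks $((8*3600)) 3600   # 8h response SLA, 1h chunks  -> 9
min_numchunks 3600        600    # 1h response SLA, 10min chunks -> 7
min_numchunks 300         3600   # 5min SLA, 1h chunks          -> 2
```

Under these assumptions, an hour of slack with 10-minute chunks works out to numchunks 7, and a fast-failover setup with 1-hour chunks gets by with numchunks 2.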
done
echo "updating docs/config.md"
./scripts/config-to-doc.sh > docs/config.md
This assumes that you are running the script from $GOPATH/src/github.com/raintank/metrictank, which won't always be true. What if a user is in scripts/ and runs ./sync-configs.sh?
We handle this in all other scripts with
# Find the directory we exist within
DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
cd "${DIR}"
I think we should recommend something that will give people some time to respond to incidents.
I don't think we need to get too hung up on the carbon use case, as having numchunks >2 is only important if users are replicating metrics to 2 or more MT instances with 1 marked as primary and the others not. I doubt this will be a common deployment model and we should not be encouraging it. If HA is important, users should use kafka.
It's not just carbon though? We've been in the situation ourselves a couple of times with our worldping infra (which uses kafka): we run a cluster, the primary dies, so chunks are not going to cassandra, and the time you have to manually promote a new primary is based on your numchunks, because your nodes can provide gapless responses to render requests as long as they have enough data in the ringbuffer to merge with what's in cassandra.
Sorry for all the spelling/English pickiness... I figured if I already read through it then I might as well point those out.
docker/docker-cluster/metrictank.ini
Outdated
# 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
# 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
# When running a cluster of metrictank instances, all instances should have the same agg-settings.
# chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md
The bookmark `#valid-chunk-spans` could be appended to the link; then we just need to remember to update all the links to it if we ever rename it. On the other hand, we'll have to do that anyway because there already are references to it.
docker/docker-cluster/metrictank.ini
Outdated
retry-interval = 10m
# max number of concurrent connections to ES
max-conns = 20
# max numver of docs to keep in the BulkIndexer buffer
Typo: `numver` should be `number`.
docker/docker-cluster/metrictank.ini
Outdated
max-conns = 20
# max numver of docs to keep in the BulkIndexer buffer
max-buffer-docs = 1000
# max delay befoer the BulkIndexer flushes its buffer
Typo: `befoer` should be `before`.
docker/docker-cluster/metrictank.ini
Outdated
## clustering transports ##
## basic clustering settings ##
[cluster]
# The primary node writes data to cassandra. There should only be 1 primary node per shardGroup.
Some comments end with a `.` and some don't. I'm fine either way, but maybe consistency would make a better impression.
docs/memory-server.md
Outdated
Note:
* the last (current) chunk is always a "work in progress", so depending on what time it is, it may be anywhere between empty and full.
* when metrictank starts up, it will not refill the ring buffer with data from Cassandra. They only fill based on data that comes in.  But once data has been seen, the buffer
`metrictank` is a name, so I think it should be upper case. Also, there are two spaces before the `But`.
We use `metrictank` uncapitalized in a bunch of places, but we also use `Metrictank` in a bunch of places. Company-wise we used to treat no-caps as part of our branding (see the `raintank` logo); we haven't really discussed this for metrictank yet. Now that we're "GrafanaLabs", maybe we should start capitalizing everything...?
Thoughts, @bulletfactory?
docs/memory-server.md
Outdated
#### Warmup and becoming ready for promotion to primary
longer chunk sizes means a longer backfill of more older data (e.g. with kafka oldest offset), |
The `l` in `longer` should be uppercase because it's the beginning of a sentence.
docs/memory-server.md
Outdated
In principle, you need just 1 chunk for each series.
However:
* when the data stream moves into a new chunk, secondary nodes would drop the previous chunk and query Cassandra. But the primary needs some time to save the chunk to Cassandra. Based on your deployment this could take anywhere between milliseconds and many minutes. As you don't want to slam Cassandra with requests at each chunk clear, you should probably use a numchunks of 2, or a numchunks that lets you retain data in memory for however long it takes to flush data to cassandra.
* The ringbuffers are a great tool to let you deal with crashes or outages of your primary node. If your primary went down, or for whatever reason cannot save data to Cassandra, then you won't even feel it if the ringbuffers can "clear the gap" between in memory data and older data in cassandra. So we advise to think about how fast your organisation could resolve a potential primary outage, and then set your parameters such that `(numchunks-1) * chunkspan` is more then that.
Should be `more than` instead of `more then`.
docs/memory-server.md
Outdated
### Configuration examples

E.g. if your most common data interval is 10s, then your chunks should be at least `120*10s=20min` long.
If you think your organisation will need up to 2 hours to resolve a primary failure, then you need at always at least 6 such chunks in memory,
There's an `at` too many in `need at always at`.
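To sanity-check the arithmetic in the doc example above ("120*10s=20min" chunks, 2 hours to resolve a failure, at least 6 chunks), here is a small sketch. The variable names are illustrative only, not metrictank settings:

```shell
#!/bin/bash
interval=10                                 # seconds between points
points_per_chunk=120                        # rough minimum points per chunk for good compression
failure_window=$((2 * 3600))                # 2 hours to resolve a primary failure

chunkspan=$((interval * points_per_chunk))  # 1200s, i.e. 20min, as in the doc
chunks_needed=$((failure_window / chunkspan))
echo "chunkspan=${chunkspan}s chunks_needed=${chunks_needed}"  # chunkspan=1200s chunks_needed=6
```

This reproduces the doc's numbers: a 20-minute chunkspan, and 6 chunks to span the 2-hour failure window (plus, as discussed elsewhere in the thread, one extra for the in-progress chunk).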
scripts/sync-configs.sh
Outdated
echo "first make sure metrictank-sample.ini is up to date. its values should match the defaults used by metrictank. and comments should match the descriptions provided by metrictank help menus"
echo "now we will run vimdiff to manually synchronize updates from sample config to other configs:"
echo "try to make every config as closely resembling the sample config as possible, while retaining the customisations that makes each config unique"
Not a native English speaker, but wouldn't this feel a little more natural: "try to make every config resemble the sample config as closely as possible"? And is `customisations` British spelling? My spell check says it should be `customizations`.
@woodsaj any thoughts re #449 (comment)? I want to make sure we're on the same page re numchunks (in particular, recommending a numchunks of 7).
Once PR #485 is merged, it will just be carbon that is affected by numchunks, as the recommended topology when using Kafka will be to use dedicated write nodes. With this topology the cluster will self-heal after a failure without the operator needing to do anything. So numchunks only needs to give enough time for the write node to replay the kafka log. On modest hardware MT can do a few hundred thousand metrics/s, so replaying the backlog won't take long.
But you may have a cassandra outage, or a networking problem between MT and cassandra. There's a wide variety of issues that can happen (not just MT itself failing), and that's where numchunks comes in: irrespective of which input plugin you use, you need a timeframe in which to address these sorts of incidents, and it's nice that you can stick a time on how long you have (and make it configurable). Can we agree that there's a valid use case here, and that it makes sense to recommend a sensible numchunks that lets you cover at least an hour's worth of whatever issue may appear (e.g. numchunks 7 for chunkspan 10min)? I hope we can agree, so that this PR can be merged (I will address the minor points you guys brought up, but first want us to agree on the larger picture described in the doc changes).
Just set numchunks to 7. But for the record:
* numchunks = 1 everywhere, refer to chunk-cache as better method
* make sure all configs have the correct chunk-cache, stats and other recent updates.
* standardize on default raw chunkspan 10min and numchunks 5
* improve descriptions

reorganize things better:
* a memory-server doc that describes ringbuffer and chunk cache, and then goes into specifics of configuring chunkspan and numchunks. Move the huge list of considerations closer to the setting they apply to.
* move compression tips elsewhere

this leaves 60min of data for all series.
* make the description of the ringbuffer and chunk cache more nuanced.
I think it's very important that we agree on what the docs say; we should all stand behind the recommendations that we make. I think the misunderstanding between me and aj is sufficiently cleared up. I gave the docs another pass, see f0794cf; I think this represents the tradeoff around numchunks and how it complements the chunk-cache much better. I also changed the default to numchunks 7. If it turns out to be too wasteful for people, they can lower it.
@woodsaj per the above comment, can I get a sign-off please? Thanks :)