Happy 2018!
breaking changes
tag index
MT now has an experimental built-in tag index (compatible with graphite; we also aim to integrate with prometheus). This comes with an internal schema update. see #729, #749, #750, #755, #759, #762, #774, #779
for this we introduced two new config flags:
tag-support: whether to enable tag queries
match-cache-size: internal, can be mostly ignored.
note: tags in MetricDefinitions and MetricData are now validated for correctness irrespective of the tag-support configuration. invalid incoming metrics are no longer ingested; for each of them you will see "in: Invalid metric" debug log messages and the input.*.metric_invalid metrics being incremented.
metric names are now extended with tags in the memory index and in all query api output (but not in the persisted index). this can potentially break dashboards.
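to illustrate what "extended with tags" can look like in query output, here is a minimal sketch. it assumes graphite's "name;key=value" convention and a sorted tag order; neither is spelled out in these notes, and nameWithTags is a hypothetical helper, not a metrictank function.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// nameWithTags appends "key=value" tags to a metric name, separated by ";".
// Assumption: graphite-style formatting with tags sorted for stable output.
func nameWithTags(name string, tags []string) string {
	if len(tags) == 0 {
		return name
	}
	// copy before sorting so the caller's slice is left untouched
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	return name + ";" + strings.Join(sorted, ";")
}

func main() {
	fmt.Println(nameWithTags("some.metric", []string{"host=a", "dc=us"}))
}
```

a dashboard that matches series by exact name would no longer match such an extended name, which is the kind of breakage to look out for.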
cassandra
omit read requests when they are too old or when the read queue is full (instead of the previous blocking behavior) #685
config:
cassandra-read-queue-size now defaults to 200000 instead of 100. update your value, otherwise you may see read requests dropped too eagerly
cassandra-omit-read-timeout (new) setting defaults to 60s
new metrics:
store.cassandra.omitted_old_reads
store.cassandra.read_queue_full
remove cassandra index pruning until we figure out a better mechanism. disk usage may grow more if you have heavy churn #765 #800 #816
swim (memberlist) settings
we now allow tweaking many swim settings via a new swim config section. the bind-addr property has also moved from the cluster section to the swim section.
see #760, in particular ae3c5da and 81eea5f
also relevant is the new gossip-to-the-dead-time setting, which can help with recovering from split brains.
non-breaking changes
stats, logging, instrumentation, profiling
- add opentracing instrumentation using jaeger #709, #713, #715, #758, 3057525, 736d2db, #732, ea9f56f
- fix the cache bug oppressed stats #721
- consistently report any recoverable runtime faults via metrics and logs. metrictank.stats.$environment.$instance.recovered_errors.*.*.* 760bd06
- better mutex and block pprof endpoints #737
- report accurate mean instead of an approximation #744
input plugins
- carbon : like graphite, strip leading "." if any from metric key #694 . this fixes an index crash bug #668
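a minimal sketch of the key normalization described above. it assumes that all leading dots are stripped (the note only says leading "." if any), and stripLeadingDots is a hypothetical helper name, not the actual plugin code.

```go
package main

import (
	"fmt"
	"strings"
)

// stripLeadingDots removes any leading "." characters from a carbon metric key.
// Assumption: every leading dot is removed, not only the first one.
func stripLeadingDots(key string) string {
	return strings.TrimLeft(key, ".")
}

func main() {
	fmt.Println(stripLeadingDots(".servers.web1.cpu"))
}
```

keys without a leading dot pass through unchanged, and dots inside the key are not touched.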
index
- find improvements #655
- stop index pruning from locking the index during the entire operation, which was slowing down requests. #787
- fix CassandraIdx.Init error handling 31990fb
tools
- Whisper importer aggregate conversion and various improvements and fixes #712 , #720 , #743, #752, #793, #814
- mt-replicator-via-tsdb : use the kafkaMdm input plugin to consume from kafka + various clustering changes related to it #723
- mt-index-cat tags valid filter #817
- mt-kafka-mdm-sniff-out-of-order improvements #754
- mt-index-cat functions for showing age and rounding of durations #763
- deprecate old mt-index tools and remove mt-replicator because it's not reliable. #783
http API
- proper statuscode when render failed #718
- add maxSeries #742
- refactor aggregation function api #771
- Support for groupByTags and aliasByTags #780
- remove from adjustment for clustered requests, for more consistent output #767 and to fix this bug: "include old metricdefs up to 24h" prevents from new higher res data to become visible #380
- properly propagate request cancellations through the cluster and cancel work in progress. #728
- graphite-compatible msgpack support #789
- proxy /functions to graphite #815
dashboard
- make the dashboard multi-instance capable + separate plots for partition lag vs persist partition lag #722
- update dashboard 8f356d7
storage & chunk cache
- make cassandra schema optional via cassandra-create-keyspace flag, useful when provisioning clusters. e9ad4d8, e9ad4d8, ac401e7, fa88502, 8311277, d16ca0f
- out of order chunks in chunkCache leading to all kinds of trouble #733
- fix a config bug that was causing reorderbuffer not to activate #756
- apply cassandra-timeout setting to insert queries #778
- Clear cache api #555
- make reorderbuffer and garbage collection work better together. #781 , fixes crash bug #776
- update gocql a04083f #778
clustering
- various cluster readiness / priority / initialisation fixes + new cluster dashboard #717
- update memberlist to post v0.1.0 7f40597
- chaos testing #760
meta
- rebrand: raintank -> GrafanaLabs and repository move #738
- contribution guidelines #740
- circleci build optimizations and tweaks. f2df73a, e994819, #757, e833430
- use dep instead of govendor 28c02a7 (via #760)
- builds: go1.8 -> 1.9 #712
- update dependencies leveldb #785, globalconf #786
- developer documentation #805
- reorganize the scripts, add various code quality and linting steps to CI, upgrade to circleci 2.0 #803