breaking changes

tag index

MT now has an experimental built-in tag index (compatible with graphite; we also aim to integrate with prometheus). this comes with an internal schema update. see #729, #749, #750, #755, #759, #762, #774, #779
for this we introduced two new config flags:

tag-support : whether to enable tag queries
match-cache-size : internal, can be mostly ignored.

note : tags in MetricDefinitions and MetricData are now validated for correctness irrespective of the tag-support configuration. invalid incoming metrics are no longer ingested. for each of them you will see an "in: Invalid metric" debug log message and an increment of the input.*.metric_invalid metrics.

metric names are now extended with tags in the memory index and in all query api output (but not in the persisted index). this may break dashboards.
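
As a rough illustration of what a tag-extended name looks like, the sketch below assumes graphite's tagged-series format (the name followed by ;key=value pairs). The helper name tagged_name is hypothetical and not part of metrictank, and sorting the tags is an assumption made here so that the same tag set always yields the same name.

```python
def tagged_name(name, tags):
    """Append tags to a metric name, graphite style: name;k1=v1;k2=v2.

    `tags` is a list of "key=value" strings. Sorting is an assumption of
    this sketch, not a confirmed detail of metrictank's implementation.
    """
    if not tags:
        return name
    return name + ";" + ";".join(sorted(tags))


print(tagged_name("cpu.load", ["host=web1", "dc=us-east"]))
```

A dashboard that matches series by exact name (for example in an alias or a regex) would see the longer name once tags are present, which is the kind of breakage to look out for.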

cassandra

omit read requests when they are too old and when the read queue is full

(instead of the previous blocking behavior) #685
config:

cassandra-read-queue-size now defaults to 200000 instead of 100. if you set this value explicitly, update it, otherwise you may see read requests dropped too eagerly
cassandra-omit-read-timeout (new) setting defaults to 60s

new metrics:

store.cassandra.omitted_old_reads
store.cassandra.read_queue_full
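
The difference between the old blocking behavior and the new omit behavior can be sketched as follows. This is a simplified, single-threaded model and not metrictank's actual code: the ReadQueue class and its method names are hypothetical, and the two counters merely stand in for the new metrics listed above.

```python
import queue
import time


class ReadQueue:
    """Bounded queue that drops work instead of blocking the caller."""

    def __init__(self, size, omit_timeout):
        self.q = queue.Queue(maxsize=size)
        self.omit_timeout = omit_timeout  # seconds, like cassandra-omit-read-timeout
        self.read_queue_full = 0    # stands in for store.cassandra.read_queue_full
        self.omitted_old_reads = 0  # stands in for store.cassandra.omitted_old_reads

    def submit(self, request, now=None):
        """Enqueue a request; return False (and count it) if the queue is full."""
        now = time.monotonic() if now is None else now
        try:
            self.q.put_nowait((now, request))
            return True
        except queue.Full:
            self.read_queue_full += 1
            return False

    def next_request(self, now=None):
        """Return the next request that is still fresh, or None if none is left."""
        while True:
            current = time.monotonic() if now is None else now
            try:
                enqueued_at, request = self.q.get_nowait()
            except queue.Empty:
                return None
            if current - enqueued_at > self.omit_timeout:
                # The request waited longer than the timeout: skip it.
                self.omitted_old_reads += 1
                continue
            return request
```

With a small queue size every burst of reads overflows the queue, which is why a value tuned for the old blocking behavior is likely too low now.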

remove cassandra index pruning

until we figure out a better mechanism. disk usage may grow more if you have heavy churn #765 #800 #816

swim (memberlist) settings

we now allow tweaking many swim settings via a new swim config section. the bind-addr property has moved from the cluster section to the swim section.
see #760, in particular ae3c5da and 81eea5f
also relevant is the new gossip-to-the-dead-time setting, which can help with recovering from split brains.

non-breaking changes

stats, logging, instrumentation, profiling

add opentracing instrumentation using jaeger #709, #713, #715, #758, 3057525, 736d2db, #732, ea9f56f
fix the cache bug oppressed stats #721
consistently report any recoverable runtime faults via metrics and logs. metrictank.stats.$environment.$instance.recovered_errors.*.*.* 760bd06
better mutex and block pprof endpoints #737
report accurate mean instead of an approximation #744

input plugins

carbon : like graphite, strip leading "." if any from metric key #694 . this fixes an index crash bug #668
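
A minimal sketch of the key cleanup, assuming all leading dots are removed (the note above only says a leading "." is stripped). The function name clean_key is hypothetical.

```python
def clean_key(key):
    """Drop leading dots so the key never starts with an empty node."""
    return key.lstrip(".")


print(clean_key(".servers.web1.cpu"))
```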

index

find improvements #655
stop index pruning from locking the index during the entire operation, which was slowing down requests. #787
fix CassandraIdx.Init error handling 31990fb

tools

Whisper importer aggregate conversion and various improvements and fixes #712 , #720 , #743, #752, #793, #814
mt-replicator-via-tsdb : use the kafkaMdm input plugin to consume from kafka + various clustering changes related to it #723
mt-index-cat tags valid filter #817
mt-kafka-mdm-sniff-out-of-order improvements #754
mt-index-cat functions for showing age and rounding of durations #763
deprecate old mt-index tools and remove mt-replicator because it's not reliable. #783

http API

proper statuscode when render failed #718
add maxSeries #742
refactor aggregation function api #771
Support for groupByTags and aliasByTags #780
remove the "from" adjustment for clustered requests, for more consistent output #767. this also fixes bug #380 ("include old metricdefs up to 24h" prevents new higher res data from becoming visible)
properly propagate request cancellations through the cluster and cancel work in progress. #728
graphite-compatible msgpack support #789
proxy /functions to graphite #815

dashboard

make the dashboard multi-instance capable + separate plots for partition lag vs persist partition lag #722
update dashboard 8f356d7

storage & chunk cache

make cassandra schema optional via cassandra-create-keyspace flag, useful when provisioning clusters. e9ad4d8, ac401e7, fa88502, 8311277, d16ca0f
fix out of order chunks in chunkCache leading to all kinds of trouble #733
fix a config bug that was causing reorderbuffer not to activate #756
apply cassandra-timeout setting to insert queries #778
Clear cache api #555
make reorderbuffer and garbage collection work better together. #781 , fixes crash bug #776
update gocql a04083f #778

clustering

various cluster readiness / priority / initialisation fixes + new cluster dashboard #717
update memberlist to post v0.1.0 7f40597
chaos testing #760

Happy 2018!