This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

distribute metrics across tables of varying compaction windows #484

Merged
merged 4 commits on Feb 1, 2017

Conversation

Contributor
@replay replay commented Jan 19, 2017

fixes #382

table_name := fmt.Sprintf("metric_%d", window)

// table isn't known to exist, ensure it exists
err := c.session.Query(fmt.Sprintf(table_schema, keyspace, table_name, window)).Exec()
Member

The window size and the table suffix need to be different.

The general rule of thumb with Cassandra is to try to limit tables to tens of SSTables. I think setting the window appropriately to get no more than 20 SSTables is a good fit for us.

window_size := (window/20) + 1

Once the data is compacted into our "window_size" sstable, it is never compacted again (this means tombstoned data is not deleted). So by having up to 20 SSTables, we only incur a 5-10% overhead before data past its TTL is deleted, which is nice.
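For illustration, a minimal sketch of that rule of thumb (hypothetical helper name; in the PR the TTL is first rounded down to a power of two, and the factor 20 later becomes a configurable window factor):

    // derive a compaction window (in hours) from a TTL (in hours) so that a
    // table ends up holding on the order of 20 SSTables; integer division
    // rounds down, and the +1 keeps the window at least 1 hour
    func windowSizeHours(ttlHours uint32) uint32 {
            return ttlHours/20 + 1
    }

For example windowSizeHours(512) == 26, which matches the metric_512 row in the table discussed further down.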

Contributor Author
@replay replay Jan 19, 2017

Actually we can only guess how large the time range of most queries is; it doesn't necessarily need to be tied to the TTL. So I was also considering adding an option like compaction-window-factor, and then I'd just:

var window int = 1
for tmp := ttl / (24 * 60 * 60); tmp > 1; tmp = tmp / 2 {
    window *= 2
}
window = window / compactionWindowFactor

Because in our case one ratio between TTL and compaction window might make sense, but if somebody mostly wants to archive forever while only ever querying the last day, then our assumption might be suboptimal for them.

Contributor Author

In your comment you mean this, right? window_size := (ttl/20) +1

Contributor Author

@woodsaj what do you think about adding the compaction window factor option?

Contributor

Why does it have to be a loop? Can't you make it a formula? That's easier to reason about than stepping through it in your head.
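(A closed-form version, essentially what a later revision of this PR uses: take the largest power of two that does not exceed the TTL in hours.)

    import "math"

    // largest power of 2 <= ttlHours; assumes ttlHours >= 1
    func roundDownToPowerOfTwo(ttlHours uint32) uint32 {
            return uint32(math.Exp2(math.Floor(math.Log2(float64(ttlHours)))))
    }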

Member

in the table in your comment there are multiple tables of the window size 1 day. Wouldn't it be better to put them together and use the window size as suffix of the table name?

I don't think so. We wouldn't want 31-day TTLs and 2-day TTLs in the same sstable, as it impacts delete performance.

However, for these smaller TTLs we should probably use a window size of less than a day.

Contributor Author
@replay replay Jan 20, 2017

Just to be sure I understand: by delete performance you mean that data which has been deleted would be kept around unnecessarily long, right?
Ok, I can change everything to use hours instead of days.

Member

With TWCS Cassandra only deletes data when all data in an SSTable has reached its TTL.

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html has a really great write-up of TWCS, I suggest you read it.

Contributor

So the proposed compaction-window-factor is an operator-tunable knob that lets the operator trade off between ~20 SSTables and "many more" SSTables, if their queries are mostly for small ranges. The assumption here is that it's inefficient or suboptimal to read small subsets of data out of a single large sstable.
But is this assumption really true?
I just spent some time trying to get a better understanding of the impact of such a setup.
The recommendation to have fewer, larger sstables seems almost universal, though I still haven't gotten to the bottom of it.
I found the "memtable" and "sstable" sections of
http://manuel.kiessling.net/2016/07/11/how-cassandras-inner-workings-relate-to-performance/ quite interesting. In particular:

As said, the structure of an SSTable is optimized for efficient reads from disk – entries in one SSTable are sorted by partition key, and an index of all partition keys is loaded into memory when an SSTable is opened. Looking up a row therefore is only one disk seek, with further sequential reads for retrieving the actual row data – thus, no expensive random I/O needs to be performed.

So the larger the sstable, the larger the index it has to load into memory (are these cached by Cassandra?), and it can find the row for the given partition (metric key) very fast.
However, since we use wide rows it still has to seek to the right ranges of columns that fall within the requested time range. I'm not sure exactly how that works. We use ts as the clustering key.

Anyway, this needs some more looking into, but I would be cautious about introducing a config option if none of us anticipates using it.

Contributor Author

That's a really interesting discussion. For me the reason I decided to add the option was simple:

I'm pretty sure we will never need to change it to anything other than the default 20 in our use case, but since this is available online, other people with different use cases might use it. Can we be sure that there is not a single possible use case in which it makes sense to tune that value? I don't think so. So I added it because it's not much additional overhead anyway.

@@ -542,6 +543,7 @@ type requestContext struct {
To uint32 // may be different than user request, see below
Key string // key to query
AggKey string // aggkey to query (if needed)
Ttl uint32

mdata/ifaces.go Outdated
@@ -13,4 +13,5 @@ type Metric interface {
Add(ts uint32, val float64)
Get(from, to uint32) (uint32, []chunk.Iter)
GetAggregated(consolidator consolidation.Consolidator, aggSpan, from, to uint32) (uint32, []chunk.Iter)
GetTtl() uint32

* | 512 - 1023 | metrics_512 | 26 |
* | 1024 - 2047 | metrics_1024 | 52 |
* |--------------------|--------------------|-------------------|
*
Contributor

this table can be better by:

  • putting the unit in the TTL range and window size columns (I think both are in hours). In fact, for the larger values, can we also put the number in days (1024 hours is ~43 days, 2047 is ~85)?
  • adding a column with the range of the number of sstables, e.g. for TTL range 64 - 127 this would contain 16 - 55
  • going up to about 5 years (I suspect it will be common to have TTLs of a few years, at least much more than 2047 hours; that should only be about 5 new rows to add to the table)

Contributor
@Dieterbe Dieterbe Jan 23, 2017

Note also that all TTLs (see the ttl option and the ttl within agg-settings) can currently be any non-zero number of seconds.
Either we clarify this table to show what happens when your ttl is, for example, 3.5 hours (we simply round down before looking at the range), or we could just add a restriction that it has to be a whole number of hours.

Contributor Author

Instead of talking about what happens if the TTL is 3.5 hours, I could simply make ttlUnits return a float64 instead of a uint32. Right now the returned value is converted into a float64 right after it is returned anyway, so then no clarification should be necessary.
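A minimal sketch of that change (assuming ttlUnits simply converts a TTL in seconds to hours, which is the unit the table naming is based on):

    // ttlUnits converts a TTL in seconds to hours; returning float64 avoids
    // silently rounding TTLs like 3.5 hours down to a whole number of hours
    func ttlUnits(ttlSeconds uint32) float64 {
            return float64(ttlSeconds) / (60 * 60)
    }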

// calculate the window size by finding the largest power of 2 that's smaller than ttl
window := uint32(math.Exp2(math.Floor(math.Log2(float64(ttlUnits(ttl))))))
tableName := fmt.Sprintf(nameFormat, window)
result := tables[ttl]
Contributor

No need to look it up from the table. Just add the entry (overwriting any pre-existing one)?

Contributor Author
@replay replay Jan 23, 2017

Yeah, I just did that because the alternative would be pretty clunky, given that it's an anonymous struct:

                tables[ttl] = struct {
                        table  string
                        window uint32
                }{
                        table:  tableName,
                        window: window/uint32(windowFactor) + 1,
                }

I guess I could also just define a type for that.
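A sketch of what a named type could look like (the name ttlTable is illustrative; the actual definition landed in 96f3bf6):

    type ttlTable struct {
            table  string
            window uint32
    }

    tables[ttl] = ttlTable{
            table:  tableName,
            window: window/uint32(windowFactor) + 1,
    }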

Contributor Author

Yeah I guess that's nicer indeed: 96f3bf6

*
*/

// calculate the window size by finding the largest power of 2 that's smaller than ttl
Contributor
@Dieterbe Dieterbe Jan 23, 2017

Here you're talking about "window size" but below you're calling it "window".

tableName := fmt.Sprintf(nameFormat, window)
result := tables[ttl]
result.table = tableName
result.window = window/uint32(windowFactor) + 1
Contributor

This is confusing. Higher up, window meant the TTL rounded down to a power of 2 (e.g. 1024); here you're using it as the window size. You're using "window" twice with different meanings.
My suggestion: use roundedTTL instead of window above, and use 'window' only to talk about the compaction window.
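A sketch of the snippet above with that naming applied (same logic, only the variable names change):

    // roundedTTL: the TTL in hours, rounded down to a power of two
    roundedTTL := uint32(math.Exp2(math.Floor(math.Log2(float64(ttlUnits(ttl))))))
    tableName := fmt.Sprintf(nameFormat, roundedTTL)
    result := tables[ttl]
    result.table = tableName
    // window: the compaction window in hours, derived from roundedTTL
    result.window = roundedTTL/uint32(windowFactor) + 1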

metrictank.go Outdated
@@ -237,6 +238,8 @@ func main() {
set := strings.Split(*aggSettings, ",")
finalSettings := make([]mdata.AggSetting, 0)
highestChunkSpan := chunkSpan
ttls := make([]uint32, 1)
ttls[0] = ttl
Contributor

both of these lines can be replaced by ttls := []uint32{ttl}

Contributor
@Dieterbe Dieterbe left a comment

see comments

@replay replay changed the title [WIP] distribute metrics across tables of varying compaction windows distribute metrics across tables of varying compaction windows Jan 25, 2017
Contributor Author
@replay replay commented Jan 25, 2017

I think this is ready for review again once you have some time

@@ -466,7 +467,7 @@ func (s *Server) getSeriesCachedStore(ctx *requestContext, until uint32) []chunk
// the request cannot completely be served from cache, it will require cassandra involvement
if !cacheRes.Complete {
if cacheRes.From != cacheRes.Until {
storeIterGens, err := s.BackendStore.Search(key, cacheRes.From, cacheRes.Until)
storeIterGens, err := s.BackendStore.Search(key, ctx.TTL, cacheRes.From, cacheRes.Until)
Contributor

When the TTL was assigned to the context, it was that of the raw metric. But we may be requesting a rolled-up series, which may have a different TTL.

Contributor Author

Right, good catch. I think that means I'll need to modify models.Req and add the TTL there. Then that TTL attribute can get set in alignRequests() when the aggregation is selected.
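For illustration only (not the actual diff), the idea is roughly:

    // models.Req (sketch): carry the TTL of whichever archive alignRequests
    // selects, so the store can later pick the matching table
    type Req struct {
            // ... existing fields ...
            TTL uint32 // TTL in seconds of the archive chosen for this request
    }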

Contributor Author

I don't like introducing new dependencies. But I think before doing that I should wait for #509 to get merged, because the modifications it makes in alignRequests() would be useful here.


func main() {
flag.Usage = func() {
fmt.Fprintln(os.Stderr, "mt-create-metrics-by-ttl [flags] ttl [ttl...]")
Contributor

split not create

tables := store.GetTableNames()

// create directory/link structure that we need to define the future table names
err = os.Mkdir(fmt.Sprintf("%s/%s", tmpDir, *cassandraKeyspace), 0700)

Contributor Author

nice

}
}

fmt.Println(fmt.Sprintf("The following tables have been created: %s", strings.Join(tables, ", ")))
Contributor

can just call fmt.Printf

fmt.Println("- Stop metrictanks to stop writes")
fmt.Println(fmt.Sprintf("- Use `nodetool snapshot --table metric %s` on all cluster nodes to create snapshot of metric table", *cassandraKeyspace))
fmt.Println(" https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html")
fmt.Println(fmt.Sprintf("- Move snapshot files to directory %s", snapshotDir))
Contributor
@Dieterbe Dieterbe Jan 30, 2017

We're asking people to log into all cassandra nodes and create snapshots, and then copy those snapshots over the network to the machine we ran this program on? That seems very inefficient, and this machine may not even have enough space. Am I missing something? Can't we call sstableloader from the machines that hold the snapshots? That would probably keep the data local to the node too. If you move the snapshots to another node and then load them there, that node will probably have to send the data to the other cluster nodes (including the one it came from) anyway, since the partitioning will be the same, I think.

Contributor Author

No, the instructions are actually for a single-node Cassandra setup. In a cluster this directory structure would be created on (or copied to) every node, and then these steps are done on each of the nodes. I guess the output should mention that, but in the end it will always require quite some manual interaction by the user.

Contributor

The instructions here say to run nodetool snapshot on all cluster nodes and move the files to the directory on the system where this tool was run (at least that's how it comes across).

To clarify it for users, perhaps the help message of this tool should say "this tool, and all of its instructions, should be run on all cluster nodes", and then not instruct people here to go run commands on other nodes.

Contributor Author

I updated it. Actually not everything the tool does needs to be run on each node, because the table schema only needs to be created once. But the message is now better at explaining what needs to be done on each node.

@@ -204,12 +205,20 @@ func main() {

Contributor

3 should be 4 now

Contributor
@Dieterbe

metrictank has a configurable window factor, but mt-store-cat and mt-split-metrics-by-ttl have it hardcoded to 20? I'm fine with keeping it configurable, but then it should also be configurable in the tools.

* generated with: https://gist.github.com/replay/69ad7cfd523edfa552cd12851fa74c58
*
* +-----------------------------+---------------+-------------+----------+
* | hours / days | table_name | window_size | sstables |
Contributor

window_size should mention what unit it is in.

* +-----------------------------+---------------+-------------+----------+
* | hours / days | table_name | window_size | sstables |
* +-----------------------------+---------------+-------------+----------+
* | 1 / <1 | metrics_1 | 1 | 1 - 1 |
Contributor
@Dieterbe Dieterbe Jan 30, 2017

I still find this confusing because we don't say anything about various values in between. I suggest we use a range syntax like so:

ttl              table_name
------------------------------
hours < 1        metrics_0
1 <= hours < 2   metrics_1
2 <= hours < 4   metrics_2
4 <= hours < 8   metrics_4

(the number of days can't be specified this way; it's ok for that to be an approximation. Maybe just turn that into a "days (approx)" column)

Contributor Author
@replay replay Jan 30, 2017

I'm not sure what you mean by the values in between. I don't think 1 <= hours < 2 is easier to understand than 1 - 2 if the column is named ttl hours. Do you mean the values between the limits of the different rows? Actually I think an approximation is good enough in this case, otherwise it just gets unnecessarily complicated. How about this:

mst@mst-nb1:~$ python ./generate_window_size_comment_table.py 
+-------------------+---------------+-------------+----------+
| TTL hours (floor) |    table_name | window_size | sstables |
+-------------------+---------------+-------------+----------+
|                 1 |     metrics_1 |           1 |    1 - 1 |
|             2 - 3 |     metrics_2 |           1 |    2 - 3 |
|             4 - 7 |     metrics_4 |           1 |    4 - 7 |
|            8 - 15 |     metrics_8 |           1 |   8 - 15 |
|           16 - 31 |    metrics_16 |           1 |  16 - 31 |
|           32 - 63 |    metrics_32 |           2 |  16 - 31 |
|          64 - 127 |    metrics_64 |           4 |  16 - 31 |
|         128 - 255 |   metrics_128 |           7 |  18 - 36 |
|         256 - 511 |   metrics_256 |          13 |  19 - 39 |
|        512 - 1023 |   metrics_512 |          26 |  19 - 39 |
|       1024 - 2047 |  metrics_1024 |          52 |  19 - 39 |
|       2048 - 4095 |  metrics_2048 |         103 |  19 - 39 |
|       4096 - 8191 |  metrics_4096 |         205 |  19 - 39 |
|      8192 - 16383 |  metrics_8192 |         410 |  19 - 39 |
|     16384 - 32767 | metrics_16384 |         820 |  19 - 39 |
|     32768 - 65535 | metrics_32768 |        1639 |  19 - 39 |
+-------------------+---------------+-------------+----------+

I removed the days again because, if the days are also floored, it gets confusing: sometimes there are overlaps between the rows and sometimes there aren't. I'm sure everybody knows a day has 24h.

Contributor
@Dieterbe Dieterbe Jan 30, 2017

Do you mean the values between the limits of the different rows?

Yes. I think your suggested table is basically the same as my suggestion, but I still see two slight advantages to my proposal:

  • seeing "floor" in the header and figuring out how that relates to the values shown (take your ttl, floor it, only then look at the ranges in the table) is not immediately obvious. You could clarify that with a comment instructing people how to use the table, but I think my version is clearer more quickly.
  • my values are simply all the powers of 2, and they relate directly to the table names in the 2nd column. Values like 31 and 1023 look slightly weird.

Actually I think an approximation is good enough in this case, otherwise it just gets unnecessarily complicated

It seems trivial to me to do the right thing here, no need for approximations. Your version isn't an approximation either (which is good), so I don't understand what you're getting at here.

Contributor Author
@replay replay Jan 30, 2017

Ok, never mind. I just thought the x <= hours < y notation is confusing.
How about this:

+------------------------+---------------+---------------------+----------+
|              TTL hours |    table_name | window_size (hours) | sstables |
+------------------------+---------------+---------------------+----------+
|         1 <= hours < 2 |     metrics_1 |                   1 |    1 - 3 |
|         2 <= hours < 4 |     metrics_2 |                   1 |    2 - 5 |
|         4 <= hours < 8 |     metrics_4 |                   1 |    4 - 9 |
|        8 <= hours < 16 |     metrics_8 |                   1 |   8 - 17 |
|       16 <= hours < 32 |    metrics_16 |                   1 |  16 - 33 |
|       32 <= hours < 64 |    metrics_32 |                   2 |  16 - 33 |
|      64 <= hours < 128 |    metrics_64 |                   4 |  16 - 33 |
|     128 <= hours < 256 |   metrics_128 |                   7 |  19 - 38 |
|     256 <= hours < 512 |   metrics_256 |                  13 |  20 - 41 |
|    512 <= hours < 1024 |   metrics_512 |                  26 |  20 - 41 |
|   1024 <= hours < 2048 |  metrics_1024 |                  52 |  20 - 41 |
|   2048 <= hours < 4096 |  metrics_2048 |                 103 |  20 - 41 |
|   4096 <= hours < 8192 |  metrics_4096 |                 205 |  20 - 41 |
|  8192 <= hours < 16384 |  metrics_8192 |                 410 |  20 - 41 |
| 16384 <= hours < 32768 | metrics_16384 |                 820 |  20 - 41 |
| 32768 <= hours < 65536 | metrics_32768 |                1639 |  20 - 41 |
+------------------------+---------------+---------------------+----------+

Contributor

Looks 👍 but I think the sstable counts are still off.
For example 65535 / 1639 = 39.9847, so how could it be 41 sstables? It should be 40, I think.

Contributor Author
@replay replay Jan 30, 2017

If the time span during which a datapoint needs to be kept can be 39.9847 tables long, and the ts of the datapoint is in the middle of a table, then the end of life of that datapoint will be in the middle of the 41st table. So that comes from the fact that the ts doesn't need to be aligned with the table wrapping times.
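Put as a formula (my phrasing, not code from the PR): because writes are not aligned to window boundaries, the worst case is one extra window on top of the ceiling of ttl/window.

    import "math"

    // maximum number of sstables that can hold still-live data for a given
    // ttl and compaction window (both in hours) when writes are not aligned
    // to window boundaries
    func maxSSTables(ttlHours, windowHours float64) int {
            return int(math.Ceil(ttlHours/windowHours)) + 1
    }

    // e.g. maxSSTables(65535, 1639) == 41, even though 65535/1639 ≈ 39.98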

Contributor

That makes a lot of sense. But I see that your latest push has the upper limit at (the incorrect) 40 now? :/ And also the table is in the old format again?

Contributor Author

Ah right, I think I only updated the script that generates the table and didn't copy-paste the output yet. Updating now.

@@ -301,7 +304,7 @@ func main() {
/***********************************
Initialize our backendStore
***********************************/
store, err := mdata.NewCassandraStore(*cassandraAddrs, *cassandraKeyspace, *cassandraConsistency, *cassandraCaPath, *cassandraUsername, *cassandraPassword, *cassandraHostSelectionPolicy, *cassandraTimeout, *cassandraReadConcurrency, *cassandraWriteConcurrency, *cassandraReadQueueSize, *cassandraWriteQueueSize, *cassandraRetries, *cqlProtocolVersion, *cassandraSSL, *cassandraAuth, *cassandraHostVerification)
store, err := mdata.NewCassandraStore(*cassandraAddrs, *cassandraKeyspace, *cassandraConsistency, *cassandraCaPath, *cassandraUsername, *cassandraPassword, *cassandraHostSelectionPolicy, *cassandraTimeout, *cassandraReadConcurrency, *cassandraWriteConcurrency, *cassandraReadQueueSize, *cassandraWriteQueueSize, *cassandraRetries, *cqlProtocolVersion, *cassandraWindowFactor, *cassandraSSL, *cassandraAuth, *cassandraHostVerification, ttls)
Contributor
@Dieterbe Dieterbe Feb 1, 2017

The ttl values we're passing to Cassandra here are a number of seconds.
Cassandra expects a number of hours.

I think we should do one of these two:
A) document that CassandraStore needs the ttl in hours, and adjust the invocation here
B) make CassandraStore accept the TTL as a number of seconds, and adjust the other invocations.

I prefer B because a number of hours is awkward (at the very least surprising) compared to all the other internal APIs, which deal with numbers of seconds.

Contributor Author
@replay replay Feb 1, 2017

I'm trying to understand why you're saying that Cassandra expects the number to be in hours.

When GetTTLTables() generates the table map, it converts those TTLs from seconds to hours.
In the INSERT statement, according to the docs, the TTL should always be in seconds, no matter what the compaction_window_unit is.
So where does Cassandra expect hours?

Contributor

Sorry, I meant that NewCassandraStore takes the ttl argument as a number of hours, which is what I think we should change to seconds.

Contributor Author

I don't think that's the case.
The ttls variable in NewCassandraStore only gets used once, as a parameter to GetTTLTables(), which returns a map of the type ttlTables where the tables are indexed by TTL in seconds.
GetTTLTables() only temporarily converts the TTL via ttlUnits() to calculate the right windowSize, because compaction_window_unit is HOURS, but apart from that it's all in seconds.

Contributor Author

Oh, I get where the confusion comes from... mt-split-metrics-by-ttl wrongly treats the TTL as hours, I'll fix that.

Contributor Author

This should clear up the mess: da4a73e

if err != nil {
panic(fmt.Sprintf("Failed to create directory: %s", err))
}
namedTableLinks := make([]string, 0)
Contributor

Can make a correctly sized slice and then just use [i] syntax to set elements instead of append ;)
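A sketch of the suggested change (the element expression here is illustrative, not the actual code):

    // size the slice up front and assign by index instead of appending
    namedTableLinks := make([]string, len(tables))
    for i, table := range tables {
            namedTableLinks[i] = fmt.Sprintf("%s/%s/%s", tmpDir, *cassandraKeyspace, table)
    }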

Contributor Author

right! thx

fmt.Printf("The following tables have been created: %s\n", strings.Join(tables, ", "))
fmt.Println("Now continue with the following steps on each cassandra node. In the case")
fmt.Printf("of a cluster setup you may want to copy the directory structure in %s\n", tmpDir)
fmt.Println("to each of the nodes.")
Contributor
@Dieterbe Dieterbe Feb 1, 2017

We tell people to do the following steps and also to copy the dir structure, but we're not clear on what should happen first (I think: first copy the dir structure, then do the next steps?).

Contributor

Also, we say "do this on each cassandra node" and then "in the case of a cluster". Isn't having multiple nodes and a cluster the same thing?
So I think we could simply say:

On each of your cassandra nodes:

  • recreate this directory structure (by copying it, or running this tool)
  • execute these steps

Contributor Author
@replay replay Feb 1, 2017

Ok, how about this:

mst@ubuntu:~/go/src/github.com/raintank/metrictank/cmd/mt-split-metrics-by-ttl$ ./mt-split-metrics-by-ttl 36000
The following tables have been created: metric_8
Now continue with the following steps on your cassandra node(s):
- Recreate (or copy) this directory structure on each cassandra node: 
  /tmp/metrictank859670692
- Stop metrictanks to stop writes
- Use `nodetool snapshot --table metric metrictank` on all cluster nodes to create
  a snapshot of the metric table
  https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html
- Move snapshot files to directory /tmp/metrictank859670692/snapshot
- Restore by executing:
  sstableloader -d localhost /tmp/metrictank859670692/metrictank/metric_8

Contributor

Do we need the "Move snapshot files to directory /tmp/metrictank859670692/snapshot" step? Can't we just tell them to cd into that dir before running nodetool snapshot?

Also, to stop writes you could stop metrictank or set it to secondary status, see https://github.com/raintank/metrictank/blob/master/docs/http-api.md#set-cluster-status

Contributor Author
@replay replay Feb 1, 2017

nodetool snapshot won't store the snapshot in the cwd, it puts the files in the data directory of Cassandra.
Good point about making it secondary.

Contributor Author

Latest version:

The following tables have been created: metric_8
Now continue with the following steps on your cassandra node(s):
- Recreate (or copy) this directory structure on each cassandra node:
  /tmp/metrictank185098310
- Stop writes by stopping the metrictanks or set their cluster status to secondary
- Use `nodetool snapshot --table metric metrictank` on all cluster nodes to create
  a snapshot of the metric table
  https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html
- Move snapshot files to directory /tmp/metrictank185098310/snapshot
- Load the data by executing:
  sstableloader -d localhost /tmp/metrictank185098310/metrictank/metric_8

fmt.Println(" a snapshot of the metric table")
fmt.Println(" https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html")
fmt.Println(fmt.Sprintf("- Move snapshot files to directory %s", snapshotDir))
fmt.Println("- Restore by executing:")
Contributor

"restore" isn't the most fitting word here. Perhaps "Execute this to assure all new ttl tables will have the data for metrics with the right ttl (and also data for other series which won't be used, but that's ok)"

Contributor Author
@replay replay Feb 1, 2017

How about "Load the data" instead of "restore"? That's short and reflects what's happening, if they want to know more details there is already a link to the docs

Contributor

k

@@ -611,7 +611,10 @@ func TestAlignRequests(t *testing.T) {
reqRaw("a", 0, 30, 800, 10, consolidation.Avg),
reqRaw("b", 0, 30, 800, 60, consolidation.Avg),
},
[]mdata.AggSetting{},
mdata.AggSettings{
35 * 24 * 60 * 60,
Contributor

Right now all these 35d TTL settings are basically ignored, right? As in, if we changed them, the tests would still succeed? Is this the case for all tests where this change is made?
I guess this puts us on track if we later want to make alignRequests smarter and take the TTL into account. Is that the purpose of these changes, or do you have something else in mind? I'm having some trouble understanding what the goal is here.

Contributor Author

The reason I changed the tests is that alignRequests now expects the second parameter to be of type AggSettings, and that type requires a number to be there as the default TTL. You're right that in the case of these tests that number does not matter at all, so I've just put the default value there.

@@ -15,7 +15,7 @@ type AggMetrics struct {
Metrics map[string]*AggMetric
chunkSpan uint32
numChunks uint32
aggSettings []AggSetting // for now we apply the same settings to all AggMetrics. later we may want to have different settings.
aggSettings AggSettings // for now we apply the same settings to all AggMetrics. later we may want to have different settings.
chunkMaxStale uint32
metricMaxStale uint32
ttl uint32
Contributor

I believe this field is not needed anymore.

Contributor Author

right! missed that

Ready bool // ready for reads?
}

type AggSettings struct {
TTL uint32 // default TTL for raw data
Contributor

This field should be RawTTL or something, because it easily gets confusing otherwise. Btw, why does it say "default TTL"? This value is whatever the user asked for, not any default.

Contributor Author

Ah, I meant default in the sense that it isn't a rollup or any aggregation, but just the normal raw data.

Contributor Author

Will update to RawTTL
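Sketch of the rename (other fields of AggSettings omitted):

    type AggSettings struct {
            RawTTL uint32 // TTL for raw (non-rollup) data, in seconds
            // ...
    }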

{oneYear, 20, "metric_%d", "metric_8192", 410},
{oneDay, 50, "metric_%d", "metric_16", 1},
{oneMonth, 50, "metric_%d", "metric_512", 11},
{oneYear, 50, "metric_%d", "metric_8192", 164},
Contributor

Can you make this a bit more extensive and test some edge cases (like a TTL of 0s, 10s, 1min, maxuint32, and on and around a boundary, for example 1024 hours minus a second, 1024 hours minus 2 seconds, 1024 hours, 1024 hours and a second)?
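For illustration, boundary rows in the existing {ttl, windowFactor, nameFormat, table, window} layout could look like this (hypothetical rows, with expected values computed from the window formula discussed above, not copied from the actual test):

    {3600, 20, "metric_%d", "metric_1", 1},              // exactly 1 hour
    {1024*3600 - 1, 20, "metric_%d", "metric_512", 26},  // just under 1024 hours
    {1024 * 3600, 20, "metric_%d", "metric_1024", 52},   // exactly 1024 hours
    {1024*3600 + 1, 20, "metric_%d", "metric_1024", 52}, // just over 1024 hours
    {4294967295, 20, "metric_%d", "metric_1048576", 52429}, // maxuint32 seconds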

Contributor Author

Done, added some more edge cases

Contributor Author
@replay replay Feb 1, 2017

I should update the table in the comments to add a case for TTLs that are 0-1 hours.
Done: 158736f#diff-f63a36d3200e1e4aa79054431d8b5c52R110

key ascii,
ts int,
data blob,
PRIMARY KEY (key, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '1' }
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS', 'compaction_window_size': '%d' }
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}`
Member
@woodsaj woodsaj Feb 1, 2017

Let's change this to the correct format for Cassandra 3.x:

AND compression = { 'class' : 'LZ4Compressor' }

Contributor Author

Done

key ascii,
ts int,
data blob,
PRIMARY KEY (key, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '1' }
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS', 'compaction_window_size': '%d' }
Member

Let's update to use the Cassandra 3.x format for the class:

AND compaction = {'class': 'TimeWindowCompactionStrategy',

Contributor Author

Updated it. I tested how it looks when the table gets created, and Cassandra resolves the class path when it creates the table, so they end up looking pretty much the same:

cqlsh:metrictank> CREATE TABLE metrictank.metric_5122 (
              ...     key ascii,
              ...     ts int,
              ...     data blob,
              ...     PRIMARY KEY (key, ts)
              ... ) WITH CLUSTERING ORDER BY (ts DESC)
              ...     AND bloom_filter_fp_chance = 0.01
              ...     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
              ...     AND comment = ''
              ...     AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '26', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
              ...     AND compression = {'chunk_length_in_kb': '64', 'class': 'LZ4Compressor'}
              ...     AND crc_check_chance = 1.0
              ...     AND dclocal_read_repair_chance = 0.1
              ...     AND default_time_to_live = 0
              ...     AND gc_grace_seconds = 864000
              ...     AND max_index_interval = 2048
              ...     AND memtable_flush_period_in_ms = 0
              ...     AND min_index_interval = 128
              ...     AND read_repair_chance = 0.0
              ...     AND speculative_retry = '99PERCENTILE';
cqlsh:metrictank> describe table metric_5122

CREATE TABLE metrictank.metric_5122 (
    key ascii,
    ts int,
    data blob,
    PRIMARY KEY (key, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '26', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '1' }
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}`
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS', 'compaction_window_size': '%d' }
AND compression = { 'sstable_compression' : 'LZ4Compressor' }`
Member

This should be

 AND compression = { 'class' : 'LZ4Compressor' }

key ascii,
ts int,
data blob,
PRIMARY KEY (key, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '1' }
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}`
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS', 'compaction_window_size': '%d' }
Member

{'class': 'TimeWindowCompactionStrategy'

Contributor

Why is this better? Some sort of protection against namespace refactors?

Member

The full class path should only be used for custom classes; the built-in ones should use the short name. I don't really know why, it's just how it is presented in the latest docs.

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfigCompress.html?hl=compression

We want tables to have varying compaction window sizes depending on a
metric's TTL. At the same time we want to be able to group those with
similar TTLs. This is some kind of middle way.

issue #382
Development

Successfully merging this pull request may close these issues.

use separate cassandra tables for raw and rollup data
3 participants