This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

use separate cassandra tables for raw and rollup data #382

Closed
woodsaj opened this issue Nov 12, 2016 · 9 comments · Fixed by #484

Comments

@woodsaj
Member

woodsaj commented Nov 12, 2016

Currently we are using a single cassandra table for all data.
To improve performance we should split the table up.

why?

  1. The TimeWindowCompactionStrategy (TWCS) used on the table creates a single SSTable per configured time window. Only one window size can be set per table, and we currently use 1 day. If we were to keep 1 year of data we would end up with 365 SSTables, which is not ideal. One SSTable per day is good for raw data, but rollups should use 1-week or 1-month windows.

  2. If all data written used the same TTL duration, we would benefit from being able to drop entire SSTables once all chunks had expired, without needing to compact the SSTable to clean out tombstoned data. However, each aggregation type (raw, 10min, 2hour, etc.) has a different TTL, so an SSTable would mix raw values that have likely all expired with aggregate values that have not.

  3. We are currently placing 4 weeks of data per row. This is good for querying short time ranges, but for larger time ranges it means querying across many partitions and nodes.
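The table-per-TTL split being proposed could be sketched roughly as follows. This is a hypothetical Go snippet: the table name pattern, schema, and the `createTableCQL` helper are illustrative inventions; only the TWCS compaction options follow Cassandra's documented syntax.

```go
package main

import "fmt"

// createTableCQL builds a CREATE TABLE statement for one TTL class.
// One table per window size means each table's TWCS window (and TTL)
// is uniform, so whole SSTables can expire and be dropped together.
// Schema and naming here are hypothetical, not the project's real DDL.
func createTableCQL(windowDays int) string {
	return fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS metric_%d (key text, ts int, data blob, PRIMARY KEY (key, ts)) "+
			"WITH compaction = {'class': 'TimeWindowCompactionStrategy', "+
			"'compaction_window_unit': 'DAYS', 'compaction_window_size': '%d'}",
		windowDays, windowDays)
}

func main() {
	// e.g. 1-day windows for raw data, 7- and 30-day windows for rollups
	for _, w := range []int{1, 7, 30} {
		fmt.Println(createTableCQL(w))
	}
}
```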

@Dieterbe
Contributor

Dieterbe commented Jan 1, 2017

I think this should be an "up next" priority for our upcoming hosted metrics milestone. We should get this rolled out before we start deploying customers to the new platforms, otherwise we have to worry about data migrations for setups that will be online and (likely) in-use.

thoughts @woodsaj @replay @nopzor1200 ?

@replay
Contributor

replay commented Jan 18, 2017

Isn't this actually two separate issues, where one is splitting the table up and the other is changing the row key suffix from t0/Month_sec to something else? As far as I can see, these two can be handled separately, right? I'll focus on the first one for now (metric table splitting).

@woodsaj
Member Author

woodsaj commented Jan 18, 2017

yes, these are separate and i am not certain that we even need to change the row key.

Start with splitting the tables up by TTL.

@replay
Contributor

replay commented Jan 18, 2017

I'm trying to figure out what's the best way to decide which metric goes into which table, and what tables we need. I can think of two scenarios:

  1. the simple one
    We create a small number of tables with different hard-coded sizes. For example there could be one with 1 day, one with 10 days, one with 100 days, etc. Then we'd look at the TTL of each metric and put it into the table which makes sense according to the configured TTL.
  2. the complicated one
    We look at the configured agg-settings, take all the TTL values of the different settings, calculate the optimal window size for each of them, and create those tables. I'm assuming that most configurations don't have more than about 3 agg-settings configured, so the number of tables wouldn't get too crazy. I'm worried that at some point in the future we might allow different agg-settings per metric/metric pattern, which would then create a larger number of tables. Furthermore, it would also get complicated if we wanted to align the partition keys with the window size.

I'm kind of tending towards #1 because I'm worried #2 might get too complicated. So let's assume #1, and to keep it simple let's start with only one additional table that has a larger window than the already existing 1-day table.
I think in the end the choice of the new table's window size comes down to a trade-off between disk space usage, the number of SSTables that need to be read, and the distribution of data across partitions.
The upper extreme would be to set the window size to the TTL, because that way we can be sure that we'll never need to read through more than 2 SSTables, but it would waste a lot of disk space. The lower extreme is what we have now with the 1-day window size, which is good disk-space-wise, but we might need to read through many SSTables. Do we have any idea what the common query time ranges are? Is the number of queries that look for more than 1 month of data negligibly small?
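The trade-off described above can be modeled with a tiny back-of-the-envelope function. This is a rough sketch, not a Cassandra guarantee; the function name is made up, and the `+1` is an assumption accounting for the query range rarely aligning with window boundaries.

```go
package main

import "fmt"

// maxSSTablesRead estimates the worst-case number of TWCS SSTables a
// query must touch: one per compaction window the query range spans,
// plus one for window-boundary misalignment. A rough model of the
// window-size trade-off, not an exact Cassandra behavior.
func maxSSTablesRead(queryRangeDays, windowDays int) int {
	return (queryRangeDays+windowDays-1)/windowDays + 1
}

func main() {
	// A 1-year query against today's 1-day window vs. a window equal to the TTL.
	fmt.Println(maxSSTablesRead(365, 1))   // many small SSTables
	fmt.Println(maxSSTablesRead(365, 365)) // at most 2
}
```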

@Dieterbe
Contributor

I had imagined we'd do 2; it seemed straightforward to me to ensure the right tables exist based on the config, rather than pre-generating a whole bunch that may not be used.
It would be nice, though, if we did it so that if you make gentle tweaks to your agg-settings (e.g. TTL from 31 days to 38), it could reuse the tables; that's probably (?) a good idea.

As far as specific cassandra parameters and recommendations, I don't have much/any experience there. I would be up for a hangout with you and @woodsaj so we can all learn and discuss the tradeoffs. I suspect mostly it would be woodsaj filling us in on what he knows.

@replay
Contributor

replay commented Jan 19, 2017

I assume we could go for some middle way too. Let's say we take all the TTLs, then for each of them we calculate the largest power of 2 that is smaller than the TTL, and we create tables with a window size of that number of days. That way we get some kind of grouping into steps.
So for example, if the default TTL is 14 days and there are two agg-settings with TTLs of 90 and 365 days, we'd end up with 3 tables with window sizes of 8, 64 and 256 days, which means for max-range queries we'd always need to scan either 2 or 3 SSTables. Obviously the wasted disk space would be slightly higher than it is now, where everything is at a 1-day window size.
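A minimal sketch of that rounding rule, assuming "largest power of two not exceeding the TTL" (the function name `windowDays` is made up for illustration):

```go
package main

import "fmt"

// windowDays returns the largest power of two (in days) that does not
// exceed the given TTL. Metrics would then be stored in the table whose
// compaction window matches this value, so TTLs of similar magnitude
// share a table. A sketch of the grouping idea, not the exact implementation.
func windowDays(ttlDays int) int {
	w := 1
	for w*2 <= ttlDays {
		w *= 2
	}
	return w
}

func main() {
	for _, ttl := range []int{14, 90, 365} {
		fmt.Printf("ttl=%d days -> window=%d days\n", ttl, windowDays(ttl))
	}
}
```

With the TTLs from the example above this yields windows of 8, 64 and 256 days.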

@replay
Contributor

replay commented Jan 19, 2017

That's just to illustrate my idea. Is that what you imagined too, @Dieterbe? c7d91dd

replay added a commit that referenced this issue Jan 19, 2017
We want tables to have varying compaction window sizes depending on a
metric's TTL. At the same time we want to be able to group those with
similar TTLs. This is some kind of middle way.

issue #382
@Dieterbe
Contributor

Something along those lines, but isn't that too few tables? I'm not sure what the ideal number of SSTables is (did that come up in the TWCS talk?) but 2 or 3 seems too few.

Also, I don't think we need to dynamically figure out tables at runtime; we can just pre-compute them at startup based on the ttl setting and the TTLs from agg-settings, so the locking isn't needed either.

@replay
Contributor

replay commented Jan 19, 2017

good point that i could get rid of the lock, will do that

Dieterbe pushed a commit that referenced this issue Feb 1, 2017
Create metric tables according to TTL

We want tables to have varying compaction window sizes depending on a
metric's TTL. At the same time we want to be able to group those with
similar TTLs. This is some kind of middle way.

fix #382