use separate cassandra tables for raw and rollup data #382
Comments
I think this should be an "up next" priority for our upcoming hosted metrics milestone. We should get this rolled out before we start deploying customers to the new platforms; otherwise we have to worry about data migrations for setups that will be online and (likely) in use. Thoughts @woodsaj @replay @nopzor1200?
Isn't this actually two separate issues, where one is splitting the table up and the other is changing the row key suffix?
Yes, these are separate, and I am not certain that we even need to change the row key. Start with splitting the tables up by TTL.
I'm trying to figure out what's the best way to decide which metric goes into which table and what tables we need. I can think of two scenarios:
I'm kind of tending to
I had imagined we'd do 2; it seemed straightforward to me to ensure the right tables exist based on the config, rather than pre-generating a whole bunch that may not be used. As far as specific Cassandra parameters and recommendations go, I don't have much/any experience there. I would be up for a hangout with you and @woodsaj so we can all learn and discuss the tradeoffs. I suspect mostly it would be woodsaj filling us in on what he knows.
I assume we could go for some middle way too. Like, let's say we take all the
We want tables to have varying compaction window sizes depending on a metric's TTL. At the same time we want to be able to group those with similar TTLs. This is some kind of middle way. issue #382
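To make that grouping concrete, here is a minimal Go sketch of one possible rule; the `tableForTTL` name, the `metric_<hours>` naming scheme, and the window factor of 20 are illustrative assumptions, not the project's actual code. Rounding each TTL up to the next power of two in hours lets metrics with similar TTLs share a table, while the window factor keeps the number of sstables per table roughly constant:

```go
package main

import (
	"fmt"
	"time"
)

// tableForTTL is a hypothetical grouping rule: round a TTL up to the next
// power of two in hours so that metrics with similar TTLs share one table,
// then derive the TWCS compaction window from the rounded TTL via a
// window factor.
func tableForTTL(ttl time.Duration, windowFactor int) (table string, windowHours int) {
	hours := int(ttl / time.Hour)
	rounded := 1
	for rounded < hours {
		rounded *= 2
	}
	windowHours = rounded / windowFactor
	if windowHours < 1 {
		windowHours = 1
	}
	return fmt.Sprintf("metric_%d", rounded), windowHours
}

func main() {
	for _, ttl := range []time.Duration{24 * time.Hour, 30 * 24 * time.Hour, 365 * 24 * time.Hour} {
		table, window := tableForTTL(ttl, 20)
		fmt.Printf("ttl=%v -> table=%s window=%dh\n", ttl, table, window)
	}
}
```

With a factor of 20, a table holds on the order of 20 sstables whatever its TTL, instead of the 365 that a fixed 1-day window would produce for a year of data.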
Something along those lines, but isn't that too few tables? I'm not sure what the ideal number of sstables is (did that come up in the TWCS talk?), but 2 or 3 seems too few. Also, I don't think we need to dynamically figure out tables at runtime; we can just pre-compute them at startup based on the ttl setting and the TTLs from agg-settings, so the locking isn't needed either.
Good point that I could get rid of the lock, will do that.
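A sketch of that pre-computation at startup, reusing the hypothetical `tableForTTL` helper from the sketch above (the parameter names are assumptions): because the table set is built once before any writes happen, no lock is needed at write time.

```go
// tablesForConfig computes the full table set once at startup from the raw
// ttl setting and the TTLs in agg-settings; TTLs that round to the same
// table simply deduplicate in the map. No runtime locking is required
// because the set never changes after startup.
func tablesForConfig(rawTTL time.Duration, aggTTLs []time.Duration) map[string]int {
	tables := make(map[string]int) // table name -> TWCS window in hours
	for _, ttl := range append([]time.Duration{rawTTL}, aggTTLs...) {
		name, window := tableForTTL(ttl, 20)
		tables[name] = window
	}
	return tables
}
```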
Create metric tables according to TTL
We want tables to have varying compaction window sizes depending on a metric's TTL. At the same time we want to be able to group those with similar TTLs. This is some kind of middle way. fix #382
Currently we are using a single Cassandra table for all data.
To improve performance we should split the table up.
why?
The TimeWindowCompactionStrategy (TWCS) used on the table creates a single SSTable per configured time window. Only one window size can be set per table, and we currently use 1 day. If we were to keep 1 year of data we would end up with 365 SSTables, which is not ideal. One SSTable per day is good for raw data, but rollups should use 1-week or 1-month windows.
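For illustration, a minimal gocql sketch of creating one table per window size; the keyspace name `metrictank`, the schema, and the `createTable` helper are assumptions rather than the actual schema:

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

// createTable creates one table with its own TWCS compaction window, so a
// 1-day window can be used for raw data and a larger one for rollups.
func createTable(session *gocql.Session, table string, windowHours int) error {
	stmt := fmt.Sprintf(`CREATE TABLE IF NOT EXISTS metrictank.%s (
		key ascii,
		ts int,
		data blob,
		PRIMARY KEY (key, ts))
		WITH CLUSTERING ORDER BY (ts DESC)
		AND compaction = {
			'class': 'TimeWindowCompactionStrategy',
			'compaction_window_unit': 'HOURS',
			'compaction_window_size': '%d'}`, table, windowHours)
	return session.Query(stmt).Exec()
}

func main() {
	cluster := gocql.NewCluster("localhost")
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	if err := createTable(session, "metric_raw", 24); err != nil { // 1-day windows
		log.Fatal(err)
	}
	if err := createTable(session, "metric_rollup", 168); err != nil { // 1-week windows
		log.Fatal(err)
	}
}
```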
If all data written used the same TTL duration, we would benefit from being able to drop entire SSTables once all chunks had expired, without needing to compact the SSTable to clean out tombstoned data. However, each aggregation type (raw, 10min, 2hour, etc.) has a different TTL, so an SSTable would have raw values that have likely all expired mixed with aggregate values that have not.
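A sketch of the corresponding write path, assuming the schema and keyspace from the sketch above (`insertChunk` is a hypothetical helper): each write goes to the table matching its TTL class and carries that TTL per row, so whole sstables expire together:

```go
// insertChunk routes a write to the table whose TTL class matches the chunk
// and applies the same TTL to the row. All rows in a given TWCS sstable then
// expire on the same schedule, so the sstable can be dropped whole instead
// of compacted to clean out tombstones.
func insertChunk(session *gocql.Session, table, key string, ts int, data []byte, ttlSec int) error {
	stmt := fmt.Sprintf(
		"INSERT INTO metrictank.%s (key, ts, data) VALUES (?, ?, ?) USING TTL ?", table)
	return session.Query(stmt, key, ts, data, ttlSec).Exec()
}
```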
We are currently placing 4 weeks of data per row. This is good for querying short time ranges, but for larger time ranges it means querying across many partitions and nodes.
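To illustrate the trade-off, a small Go sketch of the 4-weeks-per-row partitioning (the exact row key format is an assumption): the partition key appends the 4-week period number, so a read spanning N periods fans out to N partitions.

```go
package main

import "fmt"

const monthSec = 60 * 60 * 24 * 28 // one row holds 4 weeks of data

// rowKey derives the partition key for a chunk from the metric id and the
// 4-week period its start timestamp falls in.
func rowKey(id string, t0 uint32) string {
	return fmt.Sprintf("%s_%d", id, t0/monthSec)
}

// rowKeysForRange shows the read-side cost: a query over [from, to] has to
// fan out to one partition (and potentially one node) per 4-week period.
func rowKeysForRange(id string, from, to uint32) []string {
	var keys []string
	for p := from / monthSec; p <= to/monthSec; p++ {
		keys = append(keys, fmt.Sprintf("%s_%d", id, p))
	}
	return keys
}

func main() {
	// a 1-year query spans 14 partitions at 4 weeks per row
	fmt.Println(len(rowKeysForRange("some.metric.id", 0, 365*24*60*60)))
}
```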