poor performance of querying metric_idx #453
Comments
All that is needed is to change the schema to
|
It is not possible to alter the primary key of an existing table, but we don't really need a migration tool, as the change does not affect how we read or write to the table; it only affects Cassandra internals of how the table is stored. Existing deployments that are experiencing performance problems can be "migrated" by exporting the current data, dropping the table, re-creating the table with the new schema, and importing the data. The below script can be used.
|
Can you elaborate on how you observed the problem and how you diagnosed it? Is the clustering key just the first low-hanging fruit, or are you confident it is the problem (and if so, why)? Thanks |
MT crashes at startup due to a Cassandra timeout while loading the index. When testing the queries directly on Cassandra with

So removing the clustering key improves performance, but we could improve it further by using the partition as the partitionKey and the id as the clustering key. We then also wouldn't need the secondary index. This would mean that the partition would need to be known for all queries. The only code change I can see that would be needed is to change the DELETE query to include the partition, i.e.

That might be the better option, and it can also be handled without a migration tool, as the updated delete command would still work on the current version of the schema.

@Dieterbe do any of your mt-X tools make queries to the C* index without knowledge of the "partition" they are querying? |
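To make the proposed access pattern concrete, here is a minimal in-memory sketch in Python. It is not the metrictank code and not a Cassandra client: `MetricIdx`, its method names, and the CQL shapes mentioned in the comments are assumptions chosen for illustration. It only models a table keyed by (partition, id), where a whole partition can be loaded without a secondary index and a delete has to name both the partition and the id.

```python
from collections import defaultdict


class MetricIdx:
    """Toy stand-in for a table whose primary key is (partition, id).

    Hypothetical helper for illustration only; the real table lives in Cassandra.
    """

    def __init__(self):
        # partition key -> {clustering key (id) -> metric definition}
        self._rows = defaultdict(dict)

    def add(self, partition, id_, definition):
        self._rows[partition][id_] = definition

    def load_partition(self, partition):
        # Assumed CQL shape: SELECT ... WHERE partition = ?
        # The partition key alone locates the rows, so no secondary index is involved.
        return dict(self._rows.get(partition, {}))

    def delete(self, partition, id_):
        # Assumed CQL shape: DELETE ... WHERE partition = ? AND id = ?
        # Returns True if a row was removed, False if nothing matched.
        return self._rows.get(partition, {}).pop(id_, None) is not None


idx = MetricIdx()
idx.add(0, "a", {"name": "cpu.idle"})
idx.add(0, "b", {"name": "cpu.user"})
idx.add(1, "c", {"name": "mem.free"})

print(sorted(idx.load_partition(0)))  # ids stored under partition 0
print(idx.delete(0, "a"))             # delete needs both partition and id
```

The point of the sketch is the shape of the key: every lookup starts from the partition, which is why all callers would need to know the partition they are querying.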
So if I understand it correctly, can you please confirm I got this right:
(note: I think we can just remove the
FWIW I found this very useful:
yes, see |
yes
The consequence of using the Kafka partition as the Cassandra partitionKey is that all metrics in that Kafka partition will be saved on the same Cassandra node, because Cassandra partitions do not span nodes. With our query pattern of either querying all metricDefs in a partition or deleting one metricDef by id (and partition), this new schema will perform much better. When loading all metricDefs for a partition, the Cassandra node just needs to stream the local sstables to the client, and there is no need for coordination between nodes. The primary purpose of splitting data up between Cassandra nodes is to ensure that queries get evenly distributed between the nodes. But for best performance, each query should ideally be able to be handled by a single node. The queries used in mt-index-cat will still work if the Kafka partition is used as the
we need to delete by id.
the choice of partition and clustering keys does not affect executing a select without any where conditions |
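The single-node argument above can be illustrated with a toy placement model in Python. This is only a sketch under stated assumptions: real Cassandra uses its own partitioner, token ranges, and replication, while here `owner` simply hashes the partition key with MD5 and takes it modulo a hypothetical node count. The ids and the 8 Kafka partitions are made-up sample data.

```python
import hashlib

NODES = 6  # hypothetical cluster size


def owner(partition_key: str, nodes: int = NODES) -> int:
    """Map a partition key to a node index (toy stand-in for a partitioner)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def nodes_touched(partition_keys) -> set:
    """Set of nodes that must be contacted to read the given partition keys."""
    return {owner(key) for key in partition_keys}


# Made-up metric definitions: (id, kafka partition)
defs = [(f"1.{i:032x}", i % 8) for i in range(1000)]
wanted = 3  # load every metricDef belonging to Kafka partition 3

# Layout A: the metric id is the partition key, so each row may live on a different node.
by_id = nodes_touched(id_ for id_, part in defs if part == wanted)

# Layout B: the Kafka partition is the partition key, so all rows share one key.
by_partition = nodes_touched(str(part) for _, part in defs if part == wanted)

print(len(by_id), len(by_partition))
```

In this model, loading one Kafka partition under layout B always contacts exactly one node, while layout A spreads the same read across several nodes that then have to be coordinated.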
ok sounds good to me :) |
The new metric_idx table is performing terribly.
We need to drop the clustering key as it is redundant.
As our partitionKey is unique (metricDef.Id), there is no value in having a clusteringKey.