
test(perf): add mv write test that increases latency of regular reads #8724

Merged
1 commit merged into scylladb:master on Sep 24, 2024

Conversation

@temichus (Contributor) commented Sep 17, 2024

The idea is to test the hardest case: modifying a column that is a regular column in the base table but is one of the primary-key columns in the materialized view.

Test steps

1. Create a 3-node cluster with 2 tables.
2. Run a special prepare command for table 1, and use table 2 for the latency perf test (prepare_write_cmd).
3. Start a read workload on table 2 and measure its latency for 10 minutes (stress_cmd_r).
4. Run a special rewrite workload on table 1 and measure table 2's latency while the changes to table 1 are being applied (stress_cmd_no_mv).
5. Create the MV and wait for it to sync, measuring table 2's latency while the MV is syncing.
6. Run the special rewrite workload on table 1 again and measure table 2's latency while the changes are being applied (stress_cmd_mv).
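The flow above can be sketched in outline; every helper name below is a hypothetical placeholder standing in for SCT's real stress-thread and latency machinery:

```python
# Outline of the test flow; all helpers are hypothetical placeholders,
# not real SCT APIs.
steps = []

def step(name):
    steps.append(name)

def run_mv_read_latency_test():
    step("prepare_tables")        # steps 1-2: two tables, special prepare for table 1
    step("start_read_workload")   # step 3: background reads of table 2, latency measured
    step("rewrite_no_mv")         # step 4: rewrite table 1 before any MV exists
    step("create_mv_and_wait")    # step 5: MV on table 1, latency measured while it syncs
    step("rewrite_with_mv")       # step 6: rewrite again; now every write also updates the MV

run_mv_read_latency_test()
print(steps)
```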

fixes: https://github.com/scylladb/qa-tasks/issues/1706

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

node1.run_cqlsh("CREATE TABLE IF NOT EXISTS scylla_bench.test (pk bigint,ck bigint,v blob,PRIMARY KEY(pk, ck)) WITH compression = { }")
node1.run_cqlsh("CREATE MATERIALIZED VIEW IF NOT EXISTS scylla_bench.view_test AS SELECT * FROM scylla_bench.test where v IS NOT NULL AND ck IS NOT NULL AND pk IS NOT NULL PRIMARY KEY (v, pk, ck)")
start_time = time.time()
while True:
Contributor:

We have sdcm.utils.nemesis_utils.indexes.wait_for_view_to_be_built for waiting for an index/view to be built.
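For reference, waiting for an MV build can also be done by polling the system.built_views table. A rough sketch only: `session` is assumed to be a CQL driver session, and this is not the sdcm.utils.nemesis_utils.indexes.wait_for_view_to_be_built helper itself:

```python
import time

def wait_for_view_built(session, keyspace, view, timeout=3600, poll_interval=10):
    """Poll system.built_views until the view shows up or the timeout expires.

    Sketch under assumptions: `session` behaves like a cassandra-driver
    session (execute() returning a result set with .one()).
    """
    query = ("SELECT view_name FROM system.built_views "
             "WHERE keyspace_name = %s AND view_name = %s")
    deadline = time.time() + timeout
    while time.time() < deadline:
        if session.execute(query, (keyspace, view)).one():
            return
        time.sleep(poll_interval)
    raise TimeoutError(f"view {keyspace}.{view} not built within {timeout}s")
```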

n_loaders: 4
n_monitor_nodes: 1

instance_type_loader: 'c5.2xlarge'
Contributor:

let's use something newer to avoid problems with capacity. E.g. c6i

Contributor Author:

I just used perf-regression-latency-250gb-with-nemesis.yaml as a base. I can change it and test on the suggested instances.


instance_type_loader: 'c5.2xlarge'
instance_type_monitor: 't3.large'
instance_type_db: 'i3.2xlarge'
Contributor:

still testing on i3?

Contributor Author:

I just used perf-regression-latency-250gb-with-nemesis.yaml as a base. I can change it and test on the suggested instances.

@@ -0,0 +1,31 @@
test_duration: 680
prepare_write_cmd: ["cassandra-stress write no-warmup cl=ALL n=100000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=100 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=1..2000000",
"scylla-bench -workload=sequential -mode=write -replication-factor=2 -partition-count=10 -partition-offset=0 -clustering-row-count=1000000 -clustering-row-size=uniform:100..5120 -concurrency=200 -rows-per-request=10 -timeout=30s -connection-count 200 -consistency-level=all",
Contributor:

looks like 'large partition' workload, is this 'special'?

Contributor Author:

It just prepares a separate table with 'large partitions'.

The 'special workload' is a combination of a table with large partitions + an MV + small writes to the table that cause a lot of MV updates.
stress_cmd_no_mv and stress_cmd_mv are the same command, but the difference in latency while executing them is huge: https://github.com/scylladb/qa-tasks/issues/1706#issuecomment-2351624595
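On why those small writes are so expensive once the MV exists: since `v` is the view's partition key, changing it for an existing base row generates a view-row delete plus an insert. A toy model of that bookkeeping (an illustration, not Scylla's actual implementation):

```python
# Toy model of MV maintenance for this workload: the view's primary key is
# (v, pk, ck), where v is a regular column of the base table. Changing v
# for an existing (pk, ck) therefore costs a view delete plus an insert.

def view_updates(old_row, new_row):
    """Return the (op, view_key) operations a base-table write generates.

    Rows are dicts with keys pk, ck, v; view primary key is (v, pk, ck).
    """
    ops = []
    if old_row is not None and old_row["v"] != new_row["v"]:
        # old view row keyed by the previous value of v must be removed
        ops.append(("delete", (old_row["v"], old_row["pk"], old_row["ck"])))
    ops.append(("insert", (new_row["v"], new_row["pk"], new_row["ck"])))
    return ops

# Overwriting v for an existing row costs two view operations:
print(view_updates({"pk": 1, "ck": 1, "v": "a"}, {"pk": 1, "ck": 1, "v": "b"}))
```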


def test_read_mv_latency(self):
# next 3 lines, is a workaround to have it working inside `latency_calculator_decorator`
self.cluster = self.db_cluster # pylint: disable=attribute-defined-outside-init
Contributor:

I'm not sure it's needed anymore; this was recently fixed.

Contributor Author (@temichus, Sep 17, 2024):

This hack still exists in PerformanceRegressionUpgradeTest.run_workload_and_upgrade, where latency_calculator_decorator is used, but I will double-check.

@temichus temichus force-pushed the mv-overload-2 branch 4 times, most recently from 7687975 to 5bafe7e Compare September 18, 2024 15:10
@temichus temichus added the backport/none Backport is not required label Sep 19, 2024
@temichus temichus marked this pull request as ready for review September 19, 2024 10:33
stress_cmd_mv: "scylla-bench -workload=uniform -mode=write -replication-factor=2 -partition-count=30 -clustering-row-count=1000000 -clustering-row-size=uniform:100..5120 -concurrency=200 -max-rate=4000 -rows-per-request=1 -timeout=30s -connection-count 200 -consistency-level=one -iterations=0 -duration=15m"

n_db_nodes: 3
n_loaders: 4
Contributor:

do we need that number of loaders? did you check how they are utilized?

Contributor Author (@temichus, Sep 22, 2024):

You are right, we probably don't need 4 (I did not check, I just copied it from the latency-125gb test). Is it important? The test is just a reproducer of this problem and should not be used in regular runs.

Contributor Author (@temichus, Sep 23, 2024):

I managed to reproduce the problem on 2 nodes.

"scylla-bench -workload=sequential -mode=write -replication-factor=2 -partition-count=10 -partition-offset=10 -clustering-row-count=1000000 -clustering-row-size=uniform:100..5120 -concurrency=200 -rows-per-request=10 -timeout=30s -connection-count 200 -consistency-level=all",
"scylla-bench -workload=sequential -mode=write -replication-factor=2 -partition-count=10 -partition-offset=20 -clustering-row-count=1000000 -clustering-row-size=uniform:100..5120 -concurrency=200 -rows-per-request=10 -timeout=30s -connection-count 200 -consistency-level=all"]

stress_cmd_r: "cassandra-stress read cl=ALL duration=600m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate 'threads=10 throttle=100/s' -col 'size=FIXED(128) n=FIXED(8)' -pop 'dist=gauss(1..100000,50000,50000)' "
Contributor:

The latency decorator verifies HDR reports from this command; does 100/s throttling make any sense?
The cluster is not loaded much.

Contributor Author:

The cluster is actually overloaded during this test (but not by this command). Latency for this command during the do_rewrite_workload_with_mv step is 224412.04, or the I/O load even failed.

@@ -891,3 +892,51 @@ def test_latency_write_with_upgrade(self):
def test_latency_mixed_with_upgrade(self):
self._prepare_latency_with_upgrade()
self.run_workload_and_upgrade(stress_cmd=self.params.get('stress_cmd_m'))


class PerformanceRegressionMaterializedViewLatencyTest(PerformanceRegressionTest):
Contributor:

It would be good to add a docstring describing the purpose of this test, e.g. measuring the impact of creating an MV on read latency, and describing the specifics of the "special" workload.
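A docstring along these lines would capture that (suggested wording only, with a stub base class so the sketch is self-contained; it is not the docstring that was merged):

```python
class PerformanceRegressionTest:  # stub base class so this sketch runs standalone
    pass

class PerformanceRegressionMaterializedViewLatencyTest(PerformanceRegressionTest):
    """Measure how creating a materialized view impacts regular read latency.

    The 'special' workload combines a base table with large partitions, an MV
    whose partition key is a regular base-table column, and small writes to
    that column, so every write triggers MV updates. Read latency of a second,
    independent table is measured before the MV exists, while the MV is
    syncing, and while the rewrite workload runs against the MV-backed table.
    """
```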

Contributor Author:

Added a docstring.

test_name: "performance_regression_test.PerformanceRegressionMaterializedViewLatencyTest",
test_config: """["test-cases/performance/perf-regression-latency-mv-read-concurrency.yaml"]""",
sub_tests: ["test_read_mv_latency"],
email_recipients: 'wojciech.mitros@scylladb.com,artsiom.mishuta@scylladb.com,piodul@scylladb.com'
Contributor:

Only those 3 names? Better to use a group.

Contributor Author:

The test is just a reproducer of this problem and should not be used in regular runs.

@temichus temichus force-pushed the mv-overload-2 branch 2 times, most recently from 27aed59 to c7c0c71 Compare September 23, 2024 10:33
Comment on lines 437 to 488
dataset_size_match = re.search(r'(\d{3}gb)', config_files)
if dataset_size_match is None:
dataset_size = 'unknown size'
else:
dataset_size = dataset_size_match.group()
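For what it's worth, the `\d{3}gb` pattern and its fallback behave like this (the file names below are made up for illustration):

```python
import re

def dataset_size(config_files):
    # Matches exactly three digits followed by 'gb' (e.g. '250gb');
    # falls back to a placeholder when no such size appears.
    m = re.search(r'(\d{3}gb)', config_files)
    return m.group() if m else 'unknown size'

print(dataset_size('perf-regression-latency-250gb-with-nemesis.yaml'))  # 250gb
print(dataset_size('perf-regression-latency-mv-read-concurrency.yaml'))  # unknown size
```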
Contributor:

This changed recently; you need to rebase.

Contributor Author:

rebased

soyacz
soyacz previously approved these changes Sep 23, 2024
@@ -481,8 +481,11 @@ def check_regression(self, test_id, data, is_gce=False, node_benchmarks=None, em
kernel_callstack_events_summary = {Severity.DEBUG.name: len(kernel_callstack_events)}

config_files = ' '.join(doc["_source"]["setup_details"]["config_files"])
search_size = re.search(r'(\d.*(?#t|g)b)', config_files)
dataset_size = search_size.group() if search_size else 'unknown size'
dataset_size_match = re.search(r'(\d.*(?#t|g)b)', config_files)
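As an aside, `(?#...)` is a regex comment group, so `(\d.*(?#t|g)b)` is equivalent to `(\d.*b)` and does not actually require a `t` or `g` before the final `b` (that would need `(?:t|g)`). A quick check:

```python
import re

# (?#t|g) is a comment group and contributes nothing to the match,
# so the pattern is effectively r'(\d.*b)'.
pattern = r'(\d.*(?#t|g)b)'

print(re.search(pattern, '250gb').group())  # 250gb
print(re.search(pattern, '250xb').group())  # 250xb -- 't|g' is not enforced
```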
Contributor:

this change is now redundant.

Contributor Author (@temichus, Sep 24, 2024):

you are right, removed

the idea is to test the hardest case - modifying a column that is a regular column in the base table,
but in the materialized view is one of the primary key columns.
@soyacz (Contributor) left a comment:

LGTM

@aleksbykov (Contributor) left a comment:

LGTM, I have several comments.

Comment on lines +952 to +953
node1.run_cqlsh(
"CREATE TABLE IF NOT EXISTS scylla_bench.test (pk bigint,ck bigint,v blob,PRIMARY KEY(pk, ck)) WITH compression = { }")
Contributor:

Do you need this command? This table should already have been created during prepare.

self.run_fstrim_on_all_db_nodes()

self.create_test_stats(sub_type="read", append_sub_test_to_name=False, test_index="mv-overloading-latency-read")
self.run_stress_thread(stress_cmd=self.params.get('stress_cmd_r'), stress_num=1,
Contributor:

You don't need the result of this command, do you? It is just a background workload?

@temichus temichus merged commit 081e6a0 into scylladb:master Sep 24, 2024
7 checks passed
Labels
backport/none Backport is not required promoted-to-master