
c-s fails to prepare test table during disrupt_truncate nemesis, but the test continues and starts the disruption #8722

dimakr opened this issue Sep 16, 2024 · 1 comment


dimakr commented Sep 16, 2024

At the beginning of the disrupt_truncate nemesis, the test keyspace/table is prepared with the following c-s command:

< t:2024-09-14 12:46:50,787 f:stress_thread.py l:325  c:sdcm.stress_thread   p:INFO  > cassandra-stress write no-warmup n=400000 cl=QUORUM -mode native cql3  user=cassandra password=cassandra -schema keyspace=ks_truncate 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -log interval=5 -transport 'truststore=/etc/scylla/ssl_conf/truststore.jks truststore-password=cassandra' -node 10.0.0.5,10.0.0.6,10.0.0.7,10.0.0.8,10.0.0.14 -errors skip-unsupported-columns

The command fails with the error:

WARN  [cluster1-nio-worker-5] 2024-09-14 12:46:55,827 RequestHandler.java:303 - Query '[0 bound values] CREATE KEYSPACE IF NOT EXISTS "ks_truncate" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'replication_factor' : '3'} AND durable_writes = true;' generated server side warning(s): Tables in this keyspace will be replicated using Tablets and will not support CDC, LWT and counters features. To use CDC, LWT or counters, drop this keyspace and re-create it without tablets by adding AND TABLETS = {'enabled': false} to the CREATE KEYSPACE statement.
WARN  [cluster1-worker-1] 2024-09-14 12:46:56,840 ReplicationStategy.java:204 - Error while computing token map for keyspace ks_truncate with datacenter eastus_nemesis_dc: could not achieve replication factor 3 (found 1 replicas only), check your keyspace replication settings.
WARN  [cluster1-worker-2] 2024-09-14 12:46:57,282 ReplicationStategy.java:204 - Error while computing token map for keyspace ks_truncate with datacenter eastus_nemesis_dc: could not achieve replication factor 3 (found 1 replicas only), check your keyspace replication settings.
java.lang.RuntimeException: Encountered exception creating schema
	at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:105)
	at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:74)
	at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:230)
	at org.apache.cassandra.stress.StressAction.run(StressAction.java:58)
	at org.apache.cassandra.stress.Stress.run(Stress.java:143)
	at org.apache.cassandra.stress.Stress.main(Stress.java:62)
Caused by: com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: Datacenter eastus_nemesis_dc doesn't have enough token-owning nodes for replication_factor=3
	at com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:38)
	at com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:27)
	at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:310)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
	at org.apache.cassandra.stress.util.JavaDriverClient.execute(JavaDriverClient.java:215)
	at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:94)
        ... 5 more         

Even though the c-s command failed, the nemesis continues and starts the truncate disruption, which fails with:

Command: '/usr/bin/cqlsh --no-color -u cassandra -p \'cassandra\'  --request-timeout=600 --connect-timeout=60 --ssl -e "TRUNCATE ks_truncate.standard1 USING TIMEOUT 600s" 10.0.0.8'
Exit code: 2
Stdout:
Stderr:
Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
<stdin>:1:InvalidRequest: Error from server: code=2200 [Invalid query] message="unconfigured table standard1"
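The `unconfigured table` error follows directly from the failed preparation step: `ks_truncate.standard1` was never created. One way to avoid issuing the truncate against a missing table would be a schema pre-check. A minimal sketch, assuming a connected Python driver `session` object (the `table_exists` helper is illustrative, not part of the actual SCT code base):

```python
# Hypothetical guard: verify the target table exists before disrupting it.
# `session` is assumed to be a connected cassandra/scylla Python driver
# session; the query shape follows the standard system_schema.tables view.
def table_exists(session, keyspace, table):
    rows = session.execute(
        "SELECT table_name FROM system_schema.tables "
        "WHERE keyspace_name = %s AND table_name = %s",
        (keyspace, table),
    )
    return rows.one() is not None
```

With such a check, the nemesis could skip or fail the disruption cleanly when the preparation step did not produce the table.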

Installation details

Cluster size: 4 nodes (Standard_L16s_v3)

Scylla Nodes used in this run:

  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-7 (null | 10.0.0.5) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-6 (null | 10.0.0.7) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-5 (null | 10.0.0.14) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-4 (null | 10.0.0.8) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-3 (null | 10.0.0.7) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-2 (null | 10.0.0.6) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-ce64f53c-eastus-1 (null | 10.0.0.5) (shards: 14)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/SCYLLA-IMAGES/providers/Microsoft.Compute/images/scylla-6.2.0-dev-x86_64-2024-09-13T02-56-40 (azure: undefined_region)

Test: longevity-1tb-5days-azure-test
Test id: ce64f53c-084b-4445-8b62-784fa80adf1c
Test name: scylla-master/tier1/longevity-1tb-5days-azure-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor ce64f53c-084b-4445-8b62-784fa80adf1c
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs ce64f53c-084b-4445-8b62-784fa80adf1c

Logs:

Jenkins job URL
Argus

@roydahan roydahan added the Bug Something isn't working right label Nov 10, 2024
@roydahan commented:

Should have a quick fix to catch it, raise an error, and exit the nemesis.
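The proposed quick fix could be sketched as follows. This is illustrative only: the `(exit_code, stderr)` result shape and the `StressFailure` exception are assumptions, not the actual SCT API, which would inspect its stress-thread results instead.

```python
# Sketch of the proposed fail-fast behaviour. StressFailure and the
# (exit_code, stderr) result tuple are hypothetical stand-ins for
# however the real nemesis code surfaces a c-s result.
class StressFailure(Exception):
    """Raised when the c-s preparation step fails."""

def prepare_or_abort(run_stress):
    exit_code, stderr = run_stress()
    if exit_code != 0:
        # Abort the nemesis instead of proceeding to the truncate disruption.
        raise StressFailure(f"c-s table preparation failed: {stderr}")
```

Raising here would stop the nemesis before it attempts to truncate a table that was never created.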
