[client_encrypt] configparser.DuplicateOptionError: While reading from '.../cqlshrc' [line 8]: option 'usercert' in section 'ssl' already exists #8820

Open
1 of 2 tasks
fruch opened this issue Sep 24, 2024 · 2 comments
Assignees
Labels
Bug Something isn't working right

Comments

@fruch
Contributor

fruch commented Sep 24, 2024

Packages

Scylla version: 2024.3.0~dev-20240923.a0b3f6b1b802 with build-id 6a1ee58fde1e0fd72d5c62c9cf5f309b7cb46d89
Kernel Version: 5.4.0-1021-aws-fips

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.
2024-09-23 23:18:19.725: (DisruptionEvent Severity.ERROR) period_type=end event_id=350a8bc2-4199-4900-b308-cc41bc0db3c9 duration=10s: nemesis_name=LoadAndStream target_node=Node longevity-fips-master-db-node-c4eedbb7-5 [3.250.106.155 | 10.4.0.221] errors=Encountered a bad command exit code!
Command: '/usr/bin/cqlsh --no-color   --request-timeout=120 --connect-timeout=60 --ssl -e "describe keyspaces" 10.4.0.199'
Exit code: 1
Stdout:
Stderr:
File "/home/ubuntu/.shiv/cqlsh_b671ce4b4d8e3bb095c115aa85a975aed5a635d9bbccdc1ca6dd34550e925fd7/site-packages/cqlsh/__main__.py", line 7, in main
cqlsh_main(*read_options(sys.argv[1:], os.environ))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/.shiv/cqlsh_b671ce4b4d8e3bb095c115aa85a975aed5a635d9bbccdc1ca6dd34550e925fd7/site-packages/cqlsh/cqlsh.py", line 2433, in read_options
configs.read(CONFIG_FILE)
File "/opt/scylladb/python3/lib64/python3.12/configparser.py", line 684, in read
self._read(fp, filename)
File "/opt/scylladb/python3/lib64/python3.12/configparser.py", line 1075, in _read
raise DuplicateOptionError(sectname, optname,
configparser.DuplicateOptionError: While reading from '/home/ubuntu/.cassandra/cqlshrc' [line  8]: option 'usercert' in section 'ssl' already exists
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5222, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1660, in disrupt_load_and_stream
self._prepare_test_table(ks='keyspace1', table='standard1')
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1999, in _prepare_test_table
test_keyspaces = self.cluster.get_test_keyspaces()
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 4467, in get_test_keyspaces
keyspaces = db_node.run_cqlsh("describe keyspaces").stdout.split()
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2886, in run_cqlsh
cqlsh_out = self.remoter.run(cmd, timeout=timeout + 120,  # we give 30 seconds to cqlsh timeout mechanism to work
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 653, in run
result = _run()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 644, in _run
return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 577, in _run_execute
result = connection.run(**command_kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 620, in run
return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 655, in _complete_run
raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: '/usr/bin/cqlsh --no-color   --request-timeout=120 --connect-timeout=60 --ssl -e "describe keyspaces" 10.4.0.199'
Exit code: 1
Stdout:
Stderr:
File "/home/ubuntu/.shiv/cqlsh_b671ce4b4d8e3bb095c115aa85a975aed5a635d9bbccdc1ca6dd34550e925fd7/site-packages/cqlsh/__main__.py", line 7, in main
cqlsh_main(*read_options(sys.argv[1:], os.environ))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/.shiv/cqlsh_b671ce4b4d8e3bb095c115aa85a975aed5a635d9bbccdc1ca6dd34550e925fd7/site-packages/cqlsh/cqlsh.py", line 2433, in read_options
configs.read(CONFIG_FILE)
File "/opt/scylladb/python3/lib64/python3.12/configparser.py", line 684, in read
self._read(fp, filename)
File "/opt/scylladb/python3/lib64/python3.12/configparser.py", line 1075, in _read
raise DuplicateOptionError(sectname, optname,
configparser.DuplicateOptionError: While reading from '/home/ubuntu/.cassandra/cqlshrc' [line  8]: option 'usercert' in section 'ssl' already exists
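For context, the error class itself can be reproduced without a cluster. The following is a minimal sketch, assuming only the standard-library behaviour that `configparser.ConfigParser` defaults to `strict=True` and rejects a repeated option within one section. The `SAMPLE` content and the `try_parse` helper are hypothetical illustrations, not the actual cqlshrc from the failed node.

```python
import configparser

# Hypothetical cqlshrc-like content; the paths are placeholders,
# not the real file from the failed run.
SAMPLE = """\
[ssl]
certfile = /path/to/ca.pem
usercert = /path/to/client.crt
usercert = /path/to/client.crt
"""


def try_parse(text, strict=True):
    """Parse `text`; describe the duplicate if the parser rejects it."""
    parser = configparser.ConfigParser(strict=strict)
    try:
        parser.read_string(text, source="cqlshrc")
    except configparser.DuplicateOptionError as exc:
        # The exception carries the section, option and line number.
        return f"{exc.section}/{exc.option} duplicated at line {exc.lineno}"
    return dict(parser["ssl"])


# Strict mode (the default) raises on the second 'usercert'.
print(try_parse(SAMPLE))
# Non-strict mode keeps the last value instead of raising.
print(try_parse(SAMPLE, strict=False))
```

This suggests the file on the node contained two `usercert` lines in `[ssl]` when cqlsh read it, although it does not say how the second line got there.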

Impact

cqlsh commands fail to run.

How frequently does it reproduce?

So far, seen in only one run.

Installation details

Cluster size: 6 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • longevity-fips-master-db-node-c4eedbb7-6 (54.247.10.49 | 10.4.0.86) (shards: 14)
  • longevity-fips-master-db-node-c4eedbb7-5 (3.250.106.155 | 10.4.0.221) (shards: 14)
  • longevity-fips-master-db-node-c4eedbb7-4 (54.247.53.37 | 10.4.1.62) (shards: 14)
  • longevity-fips-master-db-node-c4eedbb7-3 (54.217.63.242 | 10.4.3.64) (shards: 14)
  • longevity-fips-master-db-node-c4eedbb7-2 (46.51.147.42 | 10.4.3.12) (shards: 14)
  • longevity-fips-master-db-node-c4eedbb7-1 (54.246.30.230 | 10.4.0.199) (shards: 14)

OS / Image: ami-0a7a54101ef149ada (aws: undefined_region)

Test: longevity-100gb-4h-fips-test
Test id: c4eedbb7-863f-4d91-b328-58db4e011959
Test name: scylla-enterprise/SCT_Enterprise_Features/FIPS/longevity-100gb-4h-fips-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor c4eedbb7-863f-4d91-b328-58db4e011959
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs c4eedbb7-863f-4d91-b328-58db4e011959

Logs:

Jenkins job URL
Argus

@fruch fruch added the Bug Something isn't working right label Sep 24, 2024
@fruch fruch removed their assignment Sep 26, 2024
@dimakr
Contributor

dimakr commented Sep 27, 2024

Tried to reproduce this a few times, but no luck.
The latest run (#130) of the longevity-100gb-4h-fips-test test in CI is also OK.

Also, I checked the DB system logs and there are no log entries at all between 23:15 and 23:18. The same gap is visible in the screenshots: the request-related graphs (total requests, server requests) show the request count on the instances dropping to 0 during that timeframe.
Image

Maybe there were some network connectivity problems at that time that caused cqlshrc to get corrupted on the DB nodes.

@fruch
Contributor Author

fruch commented Sep 29, 2024

Let's keep track of this one. I don't see how a network issue would cause the failure we are seeing.

But it has happened only once so far, so let's see if it pops up again.
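If it does reappear, a quick scan of the node's cqlshrc for repeated options could show exactly which lines collide before the file gets rewritten. The following is a rough sketch, not an existing SCT or cqlsh utility: `find_duplicate_options` and the `SAMPLE` text are hypothetical, and the parsing is a simplification of what `configparser` does (indented continuation lines are skipped, and option names are lower-cased as the default parser does).

```python
import re
from collections import defaultdict

SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")
OPTION_RE = re.compile(r"^(?P<key>[^=:#;\s][^=:]*?)\s*[=:]")

# Hypothetical file content for illustration only.
SAMPLE = """\
[authentication]
username = cassandra
[ssl]
certfile = ./ca.pem
usercert = ./client.crt
userkey = ./client.key
validate = true
usercert = ./client.crt
"""


def find_duplicate_options(text):
    """Return {(section, option): [line numbers]} for repeated options."""
    seen = defaultdict(list)
    section = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blanks, comments, and indented continuation lines.
        if not line.strip() or line[0].isspace() or line[0] in "#;":
            continue
        match = SECTION_RE.match(line)
        if match:
            section = match.group("name")
            continue
        match = OPTION_RE.match(line)
        if match and section is not None:
            seen[(section, match.group("key").lower())].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


print(find_duplicate_options(SAMPLE))
```

Unlike a strict `configparser` read, this reports every colliding line instead of stopping at the first duplicate, which may help tell a double write apart from a partially overwritten file.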
