Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(is_port_used): make it more robust #8046

Merged
merged 2 commits into from
Jul 22, 2024
Merged

Conversation

soyacz
Copy link
Contributor

@soyacz soyacz commented Jul 17, 2024

Reverted previous fix for wait_db_down - it was working for checking db down properly, but still is_port_used was not working right for docker backend.

So used this new working approach in is_port_used instead.

fixes: #8031

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@soyacz soyacz added backport/6.0 backport/2024.2 Need backport to 2024.2 labels Jul 17, 2024
@soyacz soyacz requested a review from fruch July 17, 2024 16:54
@soyacz soyacz added test-integration Enable running the integration tests suite test-provision-docker labels Jul 17, 2024
Previous fix for verifying port down was not sufficient as still docker
case was failing sometimes - despite port was not used `is_port_used`
method was showing otherwise.

Reverted it and replaced logic for `is_port_used` method to try
establish connection with python.

fixes: scylladb#8031
@fruch
Copy link
Contributor

fruch commented Jul 17, 2024

Still has the concern of generating warning in scylla log

@soyacz
Copy link
Contributor Author

soyacz commented Jul 18, 2024

Still has the concern of generating warning in scylla log

I didn't notice warnings related to this. In ScyllaKillMonkey warnings relate to lost connections between nodes.
I also tried running is_port_used in a loop and no effect on db whatsoever.

@soyacz
Copy link
Contributor Author

soyacz commented Jul 18, 2024

in docker artifact test, we can now see that node startup takes ~8.5s while without this fix was ~4s (and was premature - as cql port was not up yet). extract from log (see, it waits properly with Waiting for DB services to be up: attempt 1 ended with: False message):

< t:2024-07-18 06:17:06,600 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <172.17.0.2>: Command "sudo sh -c "supervisorctl start scylla || supervisorctl start scylla-server"" finished with status 0
< t:2024-07-18 06:17:06,601 f:wait.py         l:52   c:sdcm.wait            p:DEBUG > wait_for: Retrying artifacts-docker-jenkins-db-node-a9b6d5ac-0: Waiting for DB services to be up: attempt 1 ended with: False
< t:2024-07-18 06:17:11,604 f:remote_base.py  l:521  c:RemoteLibSSH2CmdRunner p:DEBUG > <172.17.0.2>: Running command "sudo sh -c "supervisorctl start scylla-housekeeping || supervisorctl start scylla-housekeeping-server""...
< t:2024-07-18 06:17:12,841 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <172.17.0.2>: scylla-housekeeping: started
< t:2024-07-18 06:17:13,342 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <172.17.0.2>: Command "sudo sh -c "supervisorctl start scylla-housekeeping || supervisorctl start scylla-housekeeping-server"" finished with status 0
< t:2024-07-18 06:17:13,342 f:decorators.py   l:121  c:sdcm.utils.decorators p:DEBUG > END: start_scylla <DockerNode> (ran 8.532388s)

@soyacz soyacz requested a review from vponomaryov July 18, 2024 07:25
@fruch fruch changed the title Fix wait db up fix(is_port_used): make it more robust Jul 18, 2024
Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fruch fruch added test-provision-aws Run provision test on AWS test-provision-gce Run provision test on GCE test-provision-azure labels Jul 18, 2024
Copy link
Contributor

@vponomaryov vponomaryov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fruch fruch merged commit 5938cb6 into scylladb:master Jul 22, 2024
13 of 15 checks passed
@fruch fruch added the backport/6.1 Need backport to 6.1 label Jul 22, 2024
@scylladbbot scylladbbot added backport/6.1-done Commit backported to 6.1 backport/2024.2-done Commit backported to 2024.2 and removed backport/6.1 Need backport to 6.1 backport/2024.2 Need backport to 2024.2 labels Jul 22, 2024
@soyacz soyacz added the backport/2024.1 Need backport to 2024.1 label Jul 29, 2024
@soyacz soyacz added the backport/2024.1-done Commit backported to 2024.1 label Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/6.0 backport/6.1-done Commit backported to 6.1 backport/2024.1-done Commit backported to 2024.1 backport/2024.1 Need backport to 2024.1 backport/2024.2-done Commit backported to 2024.2 promoted-to-master test-integration Enable running the integration tests suite test-provision-aws Run provision test on AWS test-provision-azure test-provision-docker test-provision-gce Run provision test on GCE
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wait_db_up in docker doesn't work properly sometimes
4 participants