roachprod: Put occasionally fails with scp error #91771

Closed
renatolabs opened this issue Nov 11, 2022 · 6 comments
Labels
A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-testeng TestEng Team

Comments

@renatolabs
Collaborator

renatolabs commented Nov 11, 2022

We've been seeing an increased rate of scp-related errors recently when running roachtests. This seems especially pronounced when roachtest runs multiple tests in parallel (and the test runner copies large binaries to multiple clusters at the same time).

According to @yuzefovich, the issue can happen with very high probability when running the following command:

./bin/roachtest run -u yahor tpch_concurrency\$\$ --cockroach=artifacts/cockroach-short --workload=artifacts/workload --count=25 -p=5 --cpu-quota=1024

This has also been observed in roachtest failures, although at a lower rate (e.g., #91204).

Slack discussion (internal)

Updates:

  • The issue seems to happen mostly when scp'ing from macOS to the roachprod cluster. It happens much less frequently when running a test from a gceworker.

Jira issue: CRDB-21419

renatolabs added the C-bug, A-testing, and T-testeng labels on Nov 11, 2022
@blathers-crl

blathers-crl bot commented Nov 11, 2022

cc @cockroachdb/test-eng

@smg260
Contributor

smg260 commented Nov 14, 2022

I didn't encounter any scp errors running the tpch_concurrency roachtest above from an M1 or a gceworker. I even tried a basic roachtest that just ran the following over and over:

c.Put(ctx, t.Cockroach(), "./cockroach", c.Range(1, 4-1))
c.Put(ctx, t.DeprecatedWorkload(), "./workload", c.Node(4))

There have, however, been scp flakes in a few other roachtests recently.

@yuzefovich could there be some interfering configuration on your machine?

@yuzefovich
Member

Hm, I'm not sure. I don't think I have much custom configuration set up. I'll DM you my bash config, but otherwise I didn't make any changes.

@yuzefovich
Member

Today I'm in the SF office, and I'm no longer hitting problems uploading binaries from my Mac, so the cause is likely my ISP (Comcast). I tried a test with 20 concurrent runs, and all uploads were successful.

@srosenberg
Member

@smg260 Are we still seeing these or has "ssh retries" eliminated these flakes?

@smg260
Contributor

smg260 commented Jan 12, 2023

This can be closed. Problem was isolated to the ISP/home network. In any case, SCP retries are implemented too.

smg260 closed this as completed on Jan 12, 2023