Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: deflake TestTrace #99062

Merged

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Mar 20, 2023

This has been seen to flake in CI:

=== RUN   TestTrace/ShowTraceForVectorized/TracingOff/node-1
    trace_test.go:386: expected span: "session recording", got: "pendingLeaseRequest: requesting lease"
    trace_test.go:397: remaining span: "session recording"
    trace_test.go:397: remaining span: "sql query"
    trace_test.go:397: remaining span: "sql txn"
    trace_test.go:397: remaining span: "txn coordinator send"
            --- FAIL: TestTrace/ShowTraceForVectorized/TracingOff/node-1 (0.02s)

There was already an exception for this span, but with a storage. prefix. This patch removes the prefix, and makes it match on substrings.

This flake has possibly been made worse with the introduction of a metamorphic setting to only use expiration-based leases in ecc931b.

Resolves #98971.

Epic: none
Release note: None

@erikgrinaker erikgrinaker requested review from a team March 20, 2023 20:45
@erikgrinaker erikgrinaker self-assigned this Mar 20, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker erikgrinaker force-pushed the test-trace-pending-lease-request branch from 8bc7255 to 6aa3eb6 Compare March 20, 2023 20:46
Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)


-- commits line 16 at r1:
This suggests that the flake was possible before but it was made more likely recently, right?


pkg/sql/trace_test.go line 385 at r1 (raw file):

									t.Errorf("extra span: %s", op)
									remainingErr = true
								} else if op != test.expSpans[r] {

I wonder whether we should change this comparison from exact equality to strings.Contains and perhaps also remove storage.Store prefix from one of other always optional spans.

@aadityasondhi
Copy link
Collaborator

Would this fix #98971?

cc @abarganier

@erikgrinaker
Copy link
Contributor Author

Yes.

This has been seen to flake in CI:

```
=== RUN   TestTrace/ShowTraceForVectorized/TracingOff/node-1
    trace_test.go:386: expected span: "session recording", got: "pendingLeaseRequest: requesting lease"
    trace_test.go:397: remaining span: "session recording"
    trace_test.go:397: remaining span: "sql query"
    trace_test.go:397: remaining span: "sql txn"
    trace_test.go:397: remaining span: "txn coordinator send"
            --- FAIL: TestTrace/ShowTraceForVectorized/TracingOff/node-1 (0.02s)
```

There was already an exception for this span, but with a `storage.`
prefix. This patch removes the prefix, and makes it match on substrings.

This flake has possibly been made worse with the introduction of a
metamorphic setting to only use expiration-based leases in ecc931b.

Epic: none
Release note: None
@erikgrinaker erikgrinaker force-pushed the test-trace-pending-lease-request branch from 6aa3eb6 to c7641af Compare March 20, 2023 21:06
Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)


-- commits line 16 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This suggests that the flake was possible before but it was made more likely recently, right?

No idea, but presumably the ignore rule is there for a reason.


pkg/sql/trace_test.go line 385 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I wonder whether we should change this comparison from exact equality to strings.Contains and perhaps also remove storage.Store prefix from one of other always optional spans.

Yeah, may as well, done. Let's see what CI says.

@erikgrinaker
Copy link
Contributor Author

I'm calling it a night -- feel free to bors this for me on green CI.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

bors r+

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker)

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: - looking into CI failures.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker)

@abarganier
Copy link
Contributor

TestTenantUpgradeInterlock is failing on this PR.

I see this was merged 7 hours ago - a fix to increase the timeout with a specific mention of TestTenantUpgradeInterlock in the PR description.

I'm going to check to see if this SHA is included on this branch. If not, I'll push a rebase.

@yuzefovich
Copy link
Member

No need to push the rebase to this branch - bors automatically picks up the newest changes and rebases on top of them the batch that it attempts to merge. Thus, although the CI is red on this PR (because this branch doesn't contain the fix), but the fix has already been merged, then the CI run when borsing should succeed.

@abarganier
Copy link
Contributor

@yuzefovich I see, thanks for clarifying. I didn't realize a bors would bypass CI failures - TIL!

@yuzefovich
Copy link
Member

Just to clarify a bit further - the way bors works is as follows:

  • pick one or more "borsed" PRs into a batch
  • checkout the tip of master and cherry-pick all PRs from the batch onto that branch
  • if there is a merge conflict, then the batch fails
  • if there are no merge conflicts, then bors kicks off the Bazel Essential CI run on staging branch
  • whether that CI run succeeds or fails determines whether the batch of PRs is merged to master.

With such setup it doesn't matter whether CI runs on the PRs on their own were successful or not (they are not marked as "required"). (The only required are currently two things: 1) there is at least one GitHub approval on the PR, and 2) the PR is "borsed".)

@craig
Copy link
Contributor

craig bot commented Mar 21, 2023

Build succeeded:

@craig craig bot merged commit 3f960e1 into cockroachdb:master Mar 21, 2023
@erikgrinaker erikgrinaker deleted the test-trace-pending-lease-request branch March 21, 2023 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: TestTrace failed
6 participants