Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanconfig: investigate slowdown of combined-statements API with span configs #74919

Closed
Tracked by #75639
irfansharif opened this issue Jan 17, 2022 · 4 comments
Closed
Tracked by #75639
Labels
A-zone-configs C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Jan 17, 2022

Describe the problem

In #73876 I'm observing a few tests that make use of the combined statements API experience an order of magnitude slowdowns when span configs are enabled.

  • TestTenantCannotSeeNonTenantStats
  • TestStatusAPICombinedTransactions
  • TestStatusAPICombinedStatements

For now these tests opt out of the span configs infra by disabling the reconciliation job. This issue tracks investigation of the slowdown, and to remove the testing knob.

To Reproduce

With and without #73876, takes ~2s vs. ~20s after.

dev test pkg/server -f=TestStatusAPICombinedStatements -v --show-logs

Jira issue: CRDB-12320

@irfansharif irfansharif added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-zone-configs labels Jan 17, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 18, 2022
It's unclear why these tests experience a massive slowdown.
Investigating it is tracked as part of cockroachdb#74919.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 18, 2022
It's unclear why these tests experience a massive slowdown.
Investigating it is tracked as part of cockroachdb#74919.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 18, 2022
This test got a lot slower under the span configs infrastructure,
something we're investigating as part of cockroachdb#74919. Until then, prevent
this test from spoiling builds.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 18, 2022
This test got a lot slower under the span configs infrastructure,
something we're investigating as part of cockroachdb#74919. Until then, prevent
this test from spoiling builds.

Release note: None
@irfansharif
Copy link
Contributor Author

PS: I'm not actively investigating this. Letting it sit on the SQL Observability board to remind ourselves to address it during stability at the latest.

@irfansharif
Copy link
Contributor Author

$ dev test pkg/ccl/serverccl/statusccl -f=TestTenantCannotSeeNonTenantStats --stress
[..]
--- FAIL: TestTenantCannotSeeNonTenantStats (3.49s)
    test_log_scope.go:79: test logs captured to: /private/var/tmp/_bazel_irfansharif/99e666e4e674209ecdb66b46371278df/sandbox/darwin-sandbox/679/execroot/cockroach/_tmp/aa757d476f4d7d265feadcfd904a657a/logTestTenantCannotSeeNonTenantStats1556801609
    test_log_scope.go:80: use -show-logs to present logs inline
    --- FAIL: TestTenantCannotSeeNonTenantStats/overlap (0.00s)
        tenant_status_test.go:270:
                Error Trace:    tenant_status_test.go:270
                Error:          Should not be: serverpb.StatementsResponse_ExtendedCollectedTransactionStatistics{StatsData:roachpb.CollectedTransactionStatistics{StatementFingerprintIDs:[]roachpb.StmtFingerprintID{0x4d78e1ad463db1da}, App:"$ internal-expire-sessions", Stats:roachpb.TransactionStatistics{Count:1, MaxRetries:0, NumRows:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, ServiceLat:roachpb.NumericStat{Mean:9.223372036854776e+09, SquaredDiffs:0}, RetryLat:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, CommitLat:roachpb.NumericStat{Mean:1.3e-05, SquaredDiffs:0}, BytesRead:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, RowsRead:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, ExecStats:roachpb.ExecStats{Count:0, NetworkBytes:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, MaxMemUsage:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, ContentionTime:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, NetworkMessages:roachpb.NumericStat{Mean:0, SquaredDiffs:0}, MaxDiskUsage:roachpb.NumericStat{Mean:0, SquaredDiffs:0}}, RowsWritten:roachpb.NumericStat{Mean:0, SquaredDiffs:0}}, AggregatedTs:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), TransactionFingerprintID:0xe21b5ce1c03c0605, AggregationInterval:0}, NodeID:1}
                Test:           TestTenantCannotSeeNonTenantStats/overlap
                Messages:       expected tenant to have no visibility to non-tenant's transaction stats, but found:%!(EXTRA serverpb.StatementsResponse_ExtendedCollectedTransactionStatistics={{[5582459872449769946] $ internal-expire-sessions {1 0 {0 0} {9.223372036854776e+09 0} {0 0} {1.3e-05 0} {0 0} {0 0} {0 {0 0} {0 0} {0 0} {0 0} {0 0}} {0 0}} 0001-01-01 00:00:00 +0000 UTC 16292718201605457413 0s} 1})
    tenant_status_test.go:286: -- test log scope end --
FAIL

Even outside span configs, I think some of these tests need de-flaking.

@irfansharif
Copy link
Contributor Author

For TestStatusAPICombined{Transactions,Statements} I ran the following with and without span configs (i.e. toggling ManagerDisableJobCreation).

$ dev test pkg/server -f=TestStatusAPICombined --stress --timeout 100s

They both seem to have the same number of total runs under stress. So perhaps something changed since this issue was filed? TestTenantCannotSeeNonTenantStats though I couldn't get past the real error above. List of things to do to close this issue:

  • Investigate stress failure above;
  • Get rid of ManagerDisableJobCreation in all these tests, I don't think we need it anymore.

@irfansharif
Copy link
Contributor Author

I'm just going to optimistically close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-zone-configs C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

3 participants