Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: TestDistSQLFlowsVirtualTables failed #75208

Closed
cockroach-teamcity opened this issue Jan 20, 2022 · 17 comments · Fixed by #75638
Closed

sql: TestDistSQLFlowsVirtualTables failed #75208

cockroach-teamcity opened this issue Jan 20, 2022 · 17 comments · Fixed by #75638
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. skipped-test T-kv KV Team T-sql-queries SQL Queries Team

Comments

@cockroach-teamcity
Copy link
Member

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ fb72381cab272a4f03871032cc2195e97c11260a:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3497067877
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3497067877
--- FAIL: TestDistSQLFlowsVirtualTables (11.02s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:688: unexpected output from cluster_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.53s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Jan 20, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jan 20, 2022
@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ e8a0b75e227bf2b07207383f1ed7173da8321538:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3697727178
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3697727178
--- FAIL: TestDistSQLFlowsVirtualTables (11.54s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:688: unexpected output from cluster_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.48s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@rytaft
Copy link
Collaborator

rytaft commented Jan 21, 2022

These are the last few lines of the log:

I220120 12:52:39.890028 316074 util/log/event_log.go:32 ⋮ [n1,client=‹127.0.0.1:60474›,hostssl,user=root] 190 ={"Timestamp":1642683159801575847,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING \"sql.distsql.max_running_flows\" = $1","Tag":"SET CLUSTER SETTING","User":"root","PlaceholderValues":["‹0›"],"SettingName":"sql.distsql.max_running_flows","Value":"‹0›"}
I220120 12:52:39.990773 316074 util/log/event_log.go:32 ⋮ [n1,client=‹127.0.0.1:60474›,hostssl,user=root] 191 ={"Timestamp":1642683159901505665,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING \"sql.distsql.flow_stream_timeout\" = $1","Tag":"SET CLUSTER SETTING","User":"root","PlaceholderValues":["‹'00:00:01'›"],"SettingName":"sql.distsql.flow_stream_timeout","Value":"‹00:00:01›"}
E220120 12:52:41.250379 317444 sql/flowinfra/flow_registry.go:324 ⋮ [n1,client=‹127.0.0.1:60474›,hostssl,user=root] 192  flow id:‹929e8e06-60bc-42ba-b968-077d1207abe4› : 2 inbound streams timed out after 1s; propagated error throughout flow
I220120 12:52:41.411669 317002 util/log/event_log.go:32 ⋮ [n1,client=‹127.0.0.1:60490›,hostssl,user=root] 193 ={"Timestamp":1642683161333710922,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING \"sql.distsql.max_running_flows\" = $1","Tag":"SET CLUSTER SETTING","User":"root","PlaceholderValues":["‹1›"],"SettingName":"sql.distsql.max_running_flows","Value":"‹1›"}
I220120 12:52:41.507941 316074 util/log/event_log.go:32 ⋮ [n1,client=‹127.0.0.1:60474›,hostssl,user=root] 194 ={"Timestamp":1642683161420521584,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING \"sql.distsql.flow_stream_timeout\" = $1","Tag":"SET CLUSTER SETTING","User":"root","PlaceholderValues":["‹'00:00:01'›"],"SettingName":"sql.distsql.flow_stream_timeout","Value":"‹00:00:01›"}
I220120 12:52:42.216627 313589 testutils/testcluster/testcluster.go:106 â‹® [-] 195  TestCluster quiescing nodes
W220120 12:52:42.230498 315880 sql/sqlliveness/slinstance/slinstance.go:224 â‹® [n3] 196  exiting heartbeat loop
W220120 12:52:42.233984 315462 sql/sqlliveness/slinstance/slinstance.go:224 â‹® [n2] 197  exiting heartbeat loop
W220120 12:52:42.237740 316007 jobs/registry.go:765 â‹® [n3] 198  canceling all adopted jobs due to stopper quiescing
I220120 12:52:42.239530 314093 kv/kvclient/kvcoord/transport_race.go:107 â‹® [-] 199  transport race promotion: ran 48 iterations on up to 1211 requests
W220120 12:52:42.239862 315544 jobs/registry.go:765 â‹® [n2] 200  canceling all adopted jobs due to stopper quiescing
W220120 12:52:42.243520 317685 kv/kvserver/intentresolver/intent_resolver.go:793 ⋮ [-] 201  failed to gc transaction record: could not GC completed transaction anchored at ‹/Table/SystemConfigSpan/Start›: ‹node unavailable; try another peer›
W220120 12:52:42.259703 313785 sql/sqlliveness/slinstance/slinstance.go:224 â‹® [n1] 202  exiting heartbeat loop
W220120 12:52:42.268864 314286 jobs/registry.go:765 â‹® [n1] 203  canceling all adopted jobs due to stopper quiescing
I220120 12:52:42.280746 314410 jobs/registry.go:1093 â‹® [n1] 204  AUTO SPAN CONFIG RECONCILIATION job 729457468389621761: stepping through state succeeded with error: <nil>
W220120 12:52:42.282767 314410 kv/txn.go:712 ⋮ [n1] 205  failure aborting transaction: ‹node unavailable; try another peer›; abort caused by: context canceled

Looks like the test timed out and span config reconciliation was ongoing. @irfansharif any chance #73876 could be related here?

@irfansharif
Copy link
Contributor

It could be, but those logs are also expected for end of test logging. The reconciliation job is an always running one that ends only when the server stopper is quiesced/stopped -- something that I believe is happening in that snippet.

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ 58ceac139a7e83052171121b28026a7366f16f7e:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3306148812
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3306148812
--- FAIL: TestDistSQLFlowsVirtualTables (11.65s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:688: unexpected output from cluster_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.58s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ 4cb2efda85aa56f1c42129fef21c2a46b8465291:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables2708899069
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables2708899069
--- FAIL: TestDistSQLFlowsVirtualTables (10.35s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:702: unexpected output from node_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.43s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ 6264a82d276489bc11ad87bc98fd237319501c9f:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1831888122
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1831888122
--- FAIL: TestDistSQLFlowsVirtualTables (10.85s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:702: unexpected output from node_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.44s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ 22f45e597c8f698a556f2ae32ff68e4c0327e97a:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3927001344
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3927001344
--- FAIL: TestDistSQLFlowsVirtualTables (11.14s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:688: unexpected output from cluster_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.50s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ 824ee6a735f0cd2e693c2eafcaf4e5239596b31f:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1928909938
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1928909938
--- FAIL: TestDistSQLFlowsVirtualTables (11.05s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:702: unexpected output from node_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.48s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@blathers-crl blathers-crl bot added the T-kv KV Team label Jan 25, 2022
@cucaroach
Copy link
Contributor

@irfansharif is this another test where we can utilize some retry magic?

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ 81c447ddb47e33d66f59acbcb1f76c37f5a148f6:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3907735073
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables3907735073
--- FAIL: TestDistSQLFlowsVirtualTables (10.28s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:702: unexpected output from node_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.46s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ e27baa02ee9abc50d4445876e88c07181be77b55:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1650312602
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1650312602
--- FAIL: TestDistSQLFlowsVirtualTables (12.47s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:688: unexpected output from cluster_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.81s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ ae101ea32d99a9142a319b0a1f6850ee76d55cd9:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1354185304
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables1354185304
--- FAIL: TestDistSQLFlowsVirtualTables (10.04s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:702: unexpected output from node_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.38s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ 819cc5e7a0642ebbb2872541cd227c79c5025d75:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables4104142260
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables4104142260
--- FAIL: TestDistSQLFlowsVirtualTables (9.90s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:688: unexpected output from cluster_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.40s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

lidorcarmel added a commit to lidorcarmel/cockroach that referenced this issue Jan 27, 2022
See cockroachdb#75208

Failed a bunch of time this week.

Release note: None
lidorcarmel added a commit that referenced this issue Jan 27, 2022
Refs: #75208

Reason: flaky test under race

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
@cockroach-teamcity
Copy link
Member Author

sql.TestDistSQLFlowsVirtualTables failed with artifacts on master @ fa93c68218af8094f24f421d233ee9d344b7303c:

=== RUN   TestDistSQLFlowsVirtualTables
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables2055223954
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/2 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1/1 and waiting for test to unblock...
    crdb_internal_test.go:550: stalling on scan at /Table/56/1 and waiting for test to unblock...
=== CONT  TestDistSQLFlowsVirtualTables
    crdb_internal_test.go:721: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestDistSQLFlowsVirtualTables2055223954
--- FAIL: TestDistSQLFlowsVirtualTables (10.33s)
=== RUN   TestDistSQLFlowsVirtualTables/MaxRunningFlows=0
    crdb_internal_test.go:641: waiting for remote flows to be scheduled or run
    crdb_internal_test.go:659: checking the virtual tables
    crdb_internal_test.go:702: unexpected output from node_distsql_flows on node 3
    --- FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.41s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue Jan 27, 2022
75619: physicalplan: multi-stage for regr_sxx, regr_sxy, regr_syy r=yuzefovich a=mneverov

physicalplan: add support for multi-stage execution of regr_sxx, regr_sxy, regr_syy.

See #58347.

Release note (performance improvement): regr_sxx, regr_sxy, regr_syy aggregate
functions are now evaluated more efficiently in a distributed setting.

75626: sql: skip TestDistSQLFlowsVirtualTables r=lidorcarmel a=lidorcarmel

Refs: #75208

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None

75629: batcheval: ignore `MVCCStats.LastUpdateNanos` in `AddSSTable` assertions r=dt a=erikgrinaker

Release note: None

Co-authored-by: Max Neverov <neverov.max@gmail.com>
Co-authored-by: Lidor Carmel <lidor@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
@irfansharif
Copy link
Contributor

Finding this hard to repro, despite what it seems are very frequent failures. Even tried under race which seems like the dominant (only?) failure mode. Will let it run for a while longer before giving up and trying speculative fixes.

@irfansharif
Copy link
Contributor

No luck. Ran dev test pkg/sql -f=TestDistSQLFlowsVirtualTables -v --stress --race --show-logs --stress-args '-p 4' for an hour on my GCE worker.

@irfansharif
Copy link
Contributor

Here's my uneducated guess. This test only fails under race for the MaxRunningFlows=0 variant, and I think we expect the query we fire in the test to basically fail ~ soon:

// Spin up a separate goroutine that will run the query. If
// maxRunningFlows is 0, the query eventually will error out because
// the remote flows don't connect in time; if maxRunningFlows is 1,
// the query will succeed once we close 'unblock' channel.
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
g := ctxgroup.WithContext(ctx)
g.GoCtx(func(ctx context.Context) error {
conn := tc.ServerConn(gatewayNodeID)
atomic.StoreInt64(&queryRunningAtomic, 1)
_, err := conn.ExecContext(ctx, query)
atomic.StoreInt64(&queryRunningAtomic, 0)
return err
})

At first we do assert that it's queued up at the right places, looking at the DistSQLServer directly:

numQueued := s.NumRemoteFlowsInQueue()
if numQueued != 1-maxRunningFlows {
return errors.Errorf("%d flows are found in the queue of node %d, %d expected", numQueued, idx+1, 1-maxRunningFlows)
}
numRunning := s.NumRemoteRunningFlows()
if numRunning != maxRunningFlows {
return errors.Errorf("%d flows are found in the queue of node %d, %d expected", numRunning, idx+1, maxRunningFlows)
}

Further below we check that it appears as such from remote nodes when looking it up through the vtable. These are what get tripped:

expRunning, expQueued := 0, 2
if maxRunningFlows == 1 {
expRunning, expQueued = expQueued, expRunning
}
if getNum(db, clusterScope, runningStatus) != expRunning || getNum(db, clusterScope, queuedStatus) != expQueued {
t.Fatalf("unexpected output from cluster_distsql_flows on node %d", nodeID+1)
}

expRunning, expQueued = 0, 1
if maxRunningFlows == 1 {
expRunning, expQueued = expQueued, expRunning
}
if getNum(db, nodeScope, runningStatus) != expRunning || getNum(db, nodeScope, queuedStatus) != expQueued {
t.Fatalf("unexpected output from node_distsql_flows on node %d", nodeID+1)
}

I think it's actually pretty interesting that in the failures above, we've seeing both of these assertions get tripped (especially the second one, always on n3). If we're seeing test runs where the first assertion doesn't trip, but the second one does, I think we should take that as evidence that the query in the test can error out concurrently with these assertions being made. I think we should relax these assertions accordingly -- there's nothing preventing the query from erroring out/getting de-queued.

craig bot pushed a commit that referenced this issue Jan 28, 2022
75638: sql: de-flake TestDistSQLFlowsVirtualTables r=irfansharif a=irfansharif

Fixes #75208 (speculatively). Wasn't able to repro the original failure,
so here's a blind stab at fixing it. See [this](github.com//issues/75208#issuecomment-1023800242) for the theory; I think
it's possible for the query this test fires off to error out + get
dequeued concurrently with when the assertions are made about it being
enqueued. We should relax our assertions accordingly.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
craig bot pushed a commit that referenced this issue Jan 28, 2022
75638: sql: de-flake TestDistSQLFlowsVirtualTables r=irfansharif a=irfansharif

Fixes #75208 (speculatively). Wasn't able to repro the original failure,
so here's a blind stab at fixing it. See [this](github.com//issues/75208#issuecomment-1023800242) for the theory; I think
it's possible for the query this test fires off to error out + get
dequeued concurrently with when the assertions are made about it being
enqueued. We should relax our assertions accordingly.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@craig craig bot closed this as completed in eb2a5cf Jan 28, 2022
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. skipped-test T-kv KV Team T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants