Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/sql/sqlitelogictest/tests/local/local_test: TestSqlLiteLogic_testindexorderby_nosort1000slt_good_0_test failed #89635

Closed
cockroach-teamcity opened this issue Oct 10, 2022 · 12 comments · Fixed by #90095
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Oct 10, 2022

pkg/sql/sqlitelogictest/tests/local/local_test.TestSqlLiteLogic_testindexorderby_nosort1000slt_good_0_test failed with artifacts on release-22.2.0 @ 776e9f889dec6fec9b4011e82c523c648b72254e:

Slow failing tests:
TestSqlLiteLogic_testindexorderby_nosort1000slt_good_0_test - 2066.35s

Slow passing tests:
TestSqlLiteLogic_testindexorderby1000slt_good_0_test - 4201.78s
TestSqlLiteLogic_testindexdelete100slt_good_2_test - 297.14s
TestSqlLiteLogic_testindexin1000slt_good_0_test - 230.81s
TestSqlLiteLogic_testindexbetween100slt_good_3_test - 141.49s
TestSqlLiteLogic_testindexcommute10slt_good_20_test - 56.23s
TestSqlLiteLogic_testindexorderby10slt_good_21_test - 55.76s
TestSqlLiteLogic_testindexcommute100slt_good_7_test - 54.20s
TestSqlLiteLogic_testindexcommute10slt_good_4_test - 49.25s
TestSqlLiteLogic_testindexorderby_nosort10slt_good_22_test - 21.47s
TestSqlLiteLogic_testindexorderby_nosort10slt_good_37_test - 20.52s
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-20345

@cockroach-teamcity cockroach-teamcity added branch-release-22.2.0 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 10, 2022
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Oct 10, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Oct 10, 2022
@msirek
Copy link
Contributor

msirek commented Oct 11, 2022

I ran the test via this command, but it timed out after 1 hour, so need to increase the timeout:

make testlogic PKG=../sqllogictest/test TESTFLAGS="-bigtest -test.v" TESTS="TestSqlLiteLogic/local/testindexorderby_nosort1000slt_good_0_test"

@msirek
Copy link
Contributor

msirek commented Oct 11, 2022

I found a memory budget exceeded error in the logs:

          logic.go:2762: 
               pq: streamer budget: memory budget exceeded: 35136 bytes requested, 57344 currently allocated, 76801 bytes in budget
      [07:41:17] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/3668/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/sql/sqlitelogictest/tests/local-vec-off/local-vec-off_test_/local-vec-off_test.runfiles/com_github_cockroachdb_cockroach/external/com_github_cockroachdb_sqllogictest/test/index/orderby_nosort/1000/slt_good_1.test with config local-vec-off: 5297 tests, 1 failures
          logic.go:3806: 
              /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/3668/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/sql/sqlitelogictest/tests/local-vec-off/local-vec-off_test_/local-vec-off_test.runfiles/com_github_cockroachdb_cockroach/external/com_github_cockroachdb_sqllogictest/test/index/orderby_nosort/1000/slt_good_1.test:23651: error while processing
          logic.go:3806: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/3668/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/sql/sqlitelogictest/tests/local-vec-off/local-vec-off_test_/local-vec-off_test.runfiles/com_github_cockroachdb_cockroach/external/com_github_cockroachdb_sqllogictest/test/index/orderby_nosort/1000/slt_good_1.test:23651: too many errors encountered, skipping the rest of the input
          panic.go:522: -- test log scope end --
      test logs left over in: /artifacts/tmp/_tmp/601ec2e9eb9f63c0f7391fb9b965834f/logTestSqlLiteLogic_testindexorderby_nosort1000slt_good_1_test1800528041
      --- FAIL: TestSqlLiteLogic_testindexorderby_nosort1000slt_good_1_test (55.91s)

I believe the test could be run with this command:

make testlogic PKG=../sqllogictest/test TESTFLAGS="-bigtest -test.v" TESTS="TestSqlLiteLogic/local-vec-off/TestSqlLiteLogic_testindexorderby_nosort1000slt_good_1_test"

I tried running it once, but with the wrong config, and it timed out after an hour.

@cucaroach
Copy link
Contributor

Do we need to have the force production sizes logictest mode beef up memory for these tests?

@cockroach-teamcity
Copy link
Member Author

pkg/sql/sqlitelogictest/tests/local/local_test.TestSqlLiteLogic_testindexorderby_nosort1000slt_good_0_test failed with artifacts on release-22.2.0 @ 79756fd7476c970f763661a63824beaedc339bd3:

Slow failing tests:
TestSqlLiteLogic_testindexorderby_nosort1000slt_good_0_test - 1690.64s

Slow passing tests:
TestSqlLiteLogic_testindexorderby1000slt_good_0_test - 4755.97s
TestSqlLiteLogic_testindexdelete100slt_good_2_test - 226.87s
TestSqlLiteLogic_testindexin1000slt_good_0_test - 181.48s
TestSqlLiteLogic_testindexbetween100slt_good_3_test - 111.02s
TestSqlLiteLogic_testindexorderby10slt_good_21_test - 55.52s
TestSqlLiteLogic_testindexcommute10slt_good_20_test - 43.52s
TestSqlLiteLogic_testindexcommute100slt_good_7_test - 42.15s
TestSqlLiteLogic_testindexcommute10slt_good_4_test - 38.48s
TestSqlLiteLogic_testindexorderby_nosort10slt_good_22_test - 25.07s
TestSqlLiteLogic_testindexorderby_nosort10slt_good_37_test - 24.29s
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cucaroach
Copy link
Contributor

Finally got a failure on 3rd try running this:

./dev  test pkg/sql/sqlitelogictest/tests/fakedist-vec-off -f TestSqlLiteLogic_testindexorderby1000slt_good_0_test -v --ignore-cache -- --test_arg -bigtest --test_arg -flex-types --config=crdb_test_off

@cucaroach
Copy link
Contributor

cucaroach commented Oct 17, 2022

So the big question here is why is this failure only happening on 22.2.0 branch? On master and release-22.2 this test passes even when workmem is 18kb so there must be something in the streamer that wasn't backported that fixes this?

@cucaroach
Copy link
Contributor

Okay false alarm, this fails on 22.2 and master as well I was just tricked by how flakey it is.

cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 17, 2022
Below ~70k these tests would flake like so:

```
pq: streamer budget: memory budget exceeded: 38112 bytes requested, 57344 currently allocated, 76801 bytes in budget
```

Fixes: cockroachdb#89635

Release note: None
@michae2
Copy link
Collaborator

michae2 commented Oct 18, 2022

Something is still bothering me here. Shouldn't the streamer be able to make progress while keeping memory usage below whatever budget is set? (With special behavior for individual rows that exceed the budget) I feel like there is an underlying streamer bug here of some kind.

@cucaroach
Copy link
Contributor

Something is still bothering me here. Shouldn't the streamer be able to make progress while keeping memory usage below whatever budget is set? (With special behavior for individual rows that exceed the budget) I feel like there is an underlying streamer bug here of some kind.

There is something weird going on because even if I force the workmem limit to something low, it doesn't fail every time, and crdb_test is off. Not sure how to debug, I'll keep banging on it.

@mgartner
Copy link
Collaborator

Do we think this is a release blocker?

@cucaroach
Copy link
Contributor

I looked at it enough to convince myself that the problem is the row based join reader thats throwing hundreds of spans at the streamer. From Enqueue doc comments:

// It is the caller's responsibility to ensure that the memory footprint of reqs
// (i.e. roachpb.Spans inside of the requests) is reasonable. Enqueue will
// return an error if that footprint exceeds the Streamer's limitBytes. The
// exception is made only when a single request is enqueued in order to allow
// the caller to proceed when the key to lookup is arbitrarily large. As a rule
// of thumb though, the footprint of reqs should be on the order of MBs, and not
// tens of MBs.

I think its working by design and we should just bump up the limit but also feel like Yahor would know better.

craig bot pushed a commit that referenced this issue Oct 18, 2022
89851: roachtest: avoid using sudo to install python packages r=e-mbrown a=rafiss

see also #88675

sudo can make it confusing which veresion of a package is being installed and used. This fixes the usage of virtualenvs, and avoid sudo.

This also removes per-version blocklists for python tools' tests.

Release note: None

90095: sql: bump workmem size to avoid sqllite flakes on index/1000 tests r=cucaroach a=cucaroach

Below ~70k these tests would flake like so:

```
pq: streamer budget: memory budget exceeded: 38112 bytes requested, 57344 currently allocated, 76801 bytes in budget
```

Fixes: #89635

Release note: None


Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Tommy Reilly <treilly@cockroachlabs.com>
@craig craig bot closed this as completed in ba4e08f Oct 18, 2022
blathers-crl bot pushed a commit that referenced this issue Oct 18, 2022
Below ~70k these tests would flake like so:

```
pq: streamer budget: memory budget exceeded: 38112 bytes requested, 57344 currently allocated, 76801 bytes in budget
```

Fixes: #89635

Release note: None
@michae2
Copy link
Collaborator

michae2 commented Oct 18, 2022

I think its working by design and we should just bump up the limit but also feel like Yahor would know better.

Ok, sounds good. Thanks for looking at it.

cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 18, 2022
Below ~70k these tests would flake like so:

```
pq: streamer budget: memory budget exceeded: 38112 bytes requested, 57344 currently allocated, 76801 bytes in budget
```

Fixes: cockroachdb#89635

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants