-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli/interactive_tests: re-enable sql_mem_monitor test #106566
Conversation
81b7a5d
to
7da6666
Compare
This commit re-enables `sql_mem_monitor` CLI interactive test that has been skipped for some time. The test verifies that SQL memory accounting system prevents the server crashing when a memory intensive query is executed. In particular, it runs the query twice: - the first run has memory monitoring disabled, so the crash is expected - the second run sets low root SQL monitor limit and monitoring enabled, so the query errors out without crashing the server. I ran this test quite a few times and got a couple of failures, and each of them was due to the server not crashing on the first run. I believe this was because the test has rotted over time: the query is performing a cross join that can now spill to disk (it couldn't when the test was written), so it seems feasible that the server wouldn't crash as the test expects. This commit goes around this by effectively disabling the memory accounting system (it sets very high root SQL memory limit) as well as disk spilling (by setting very high `distsql_workmem` limit). This commit also adds a couple of other errors I saw into the allow-list of how the server can crash: - `_Cfunc_calloc` - seems reasonable enough - `fatal error: unexpected signal during runtime execution` might seem like it doesn't belong here, but it's an artifact of how we're limiting the memory usage of the CRDB process. In particular, ulimit is a "user limit" which is enforced not at the OS kernel level (i.e. not via oomkiller) but at the process execution level. As a result, memory allocation error doesn't kill the process, so it keeps on running and can hit this "fatal error" later on. Release note: None
7da6666
to
b14e21b
Compare
Got 16k runs with no failures on the gceworker, so it shouldn't be flaky anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work!
Reviewable status: complete! 0 of 0 LGTMs obtained
TFTR! bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from b14e21b to blathers/backport-release-23.1-106566: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This commit re-enables
sql_mem_monitor
CLI interactive test that has been skipped for some time. The test verifies that SQL memory accounting system prevents the server crashing when a memory intensive query is executed. In particular, it runs the query twice:I ran this test quite a few times and got a couple of failures, and each of them was due to the server not crashing on the first run. I believe this was because the test has rotted over time: the query is performing a cross join that can now spill to disk (it couldn't when the test was written), so it seems feasible that the server wouldn't crash as the test expects. This commit goes around this by effectively disabling the memory accounting system (it sets very high root SQL memory limit) as well as disk spilling (by setting very high
distsql_workmem
limit).This commit also adds a couple of other errors I saw into the allow-list of how the server can crash:
_Cfunc_calloc
- seems reasonable enoughfatal error: unexpected signal during runtime execution
might seem like it doesn't belong here, but it's an artifact of how we're limiting the memory usage of the CRDB process. In particular, ulimit is a "user limit" which is enforced not at the OS kernel level (i.e. not via oomkiller) but at the process execution level. As a result, memory allocation error doesn't kill the process, so it keeps on running and can hit this "fatal error" later on.Fixes: #106462.
Release note: None