
Large-inserts stress test with 4 inserting threads and background threads runs into assertion from trunk_get_new_bundle(): "(hdr->end_bundle != hdr->start_bundle)". [3363] No available bundles in trunk node. #474

Open
gapisback opened this issue Nov 7, 2022 · 3 comments

Comments

gapisback (Collaborator) commented Nov 7, 2022

This issue was unearthed while trying to stabilize the newly developed large_inserts_bugs_stress_test with support for background threads enabled. Background-thread support is being stabilized and enabled through tests under in-flight issue #469.

This bug can be reproduced using that feature enablement, the testing support, and this new test case from the agurajada/474-trunk_get_new_bundle-Assert-w-bg-threads branch:

$ build/debug/bin/unit/large_inserts_bugs_stress_test --num-inserts 20000000 --num-threads 4 --num-bg-threads 2 --num-memtable-bg-threads 2 test_seq_key_seq_values_inserts_threaded
Running 1 CTests, suite name 'large_inserts_bugs_stress', test case 'test_seq_key_seq_values_inserts_threaded'.
[...]
Assertion failed at src/trunk.c:2003:trunk_get_new_bundle_no(): "(hdr->end_bundle != hdr->start_bundle)". [3363] No available bundles in trunk node. page disk_addr=524288, end_bundle=7, start_bundle=7
Aborted (core dumped)
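For context, here is a minimal sketch of the invariant this assert appears to guard, assuming bundles in a trunk node are tracked as a circular range [start_bundle, end_bundle) over a fixed number of per-node slots. The field names hdr->start_bundle / hdr->end_bundle come from the assert message; MAX_BUNDLES_PER_NODE, the struct layout, and the wrap-around arithmetic are assumptions for illustration only, not SplinterDB's actual code.

#include <assert.h>
#include <stdint.h>

#define MAX_BUNDLES_PER_NODE 8   /* assumed per-node slot count (illustrative) */

typedef struct trunk_hdr_sketch {
   uint16_t start_bundle;   /* oldest live bundle in the node */
   uint16_t end_bundle;     /* next bundle slot to hand out */
} trunk_hdr_sketch;

/* Hand out the next bundle slot in the node. After advancing end_bundle,
 * if it has wrapped around onto start_bundle, every slot is in use,
 * which is the state the assert reports above (end_bundle=7, start_bundle=7). */
static uint16_t
get_new_bundle_sketch(trunk_hdr_sketch *hdr)
{
   uint16_t bundle_no = hdr->end_bundle;
   hdr->end_bundle = (uint16_t)((hdr->end_bundle + 1) % MAX_BUNDLES_PER_NODE);
   assert(hdr->end_bundle != hdr->start_bundle
          && "No available bundles in trunk node");
   return bundle_no;
}

Under this reading, the assert is a capacity check: new bundles are created faster than compactions retire old ones, until the node's slots are exhausted.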

The problem reproduces, albeit somewhat sporadically. It is clearly triggered by the new background-thread-related testing config parameters, i.e., --num-bg-threads 2 --num-memtable-bg-threads 2.

All test cases in this new test run cleanly for large numbers of inserts per thread without background threads. I have exercised this test case with --num-inserts 20000000 (20 million inserts per thread) and it runs cleanly.


From code inspection, the assertion fires on the call at line 3363, as reported by the enhanced assert message:

3325 static void
3326 trunk_memtable_incorporate(trunk_handle  *spl,
[...]
3357    /*
3358     * X. Get a new branch in a bundle for the memtable
3359     */
3360    trunk_compacted_memtable *cmt =
3361       trunk_get_compacted_memtable(spl, generation);
3362    trunk_compact_bundle_req *req = cmt->req;
3363    req->bundle_no                = trunk_get_new_bundle(spl, root);

Given that the assert is in the code path of trunk_memtable_incorporate(), it is likely that this is induced by the --num-memtable-bg-threads 2 argument, which creates 2 threads of type TASK_TYPE_MEMTABLE.

I tried running without this argument, with just the other --num-bg-threads 2 argument, and that consistently runs into the hang reported under issue #475.

gapisback (Collaborator, Author) commented:

Note that this stress test surfaces different errors related to packing or bundle compaction.

Here is an example when running this test on a beefy AWS instance:

ip-172-31-29-15:[47] $ build/release/bin/unit/large_inserts_bugs_stress_test --num-inserts 20000000 --num-threads 4 test_seq_key_seq_values_inserts_threaded
Running 1 CTests, suite name 'large_inserts_bugs_stress', test case 'test_seq_key_seq_values_inserts_threaded'.
TEST 1/2 large_inserts_bugs_stress:test_seq_key_seq_values_inserts_threaded Fingerprint size 29 too large, max value size is 5, setting to 27
fingerprint_size: 27
filter-index-size: 256 is too small, setting to 512
[...]
[OK]
TEST 2/2 large_inserts_bugs_stress:test_seq_key_seq_values_inserts_threaded_same_start_keyid Fingerprint size 29 too large, max value size is 5, setting to 27
fingerprint_size: 27
filter-index-size: 256 is too small, setting to 512
exec_worker_thread()::495:Thread 1  inserts 20000000 (20 million), sequential key, seqential value, KV-pairs starting from 0 (0) ...
OS-pid=141555, Thread-ID=1, Insert small-width sequential values of different lengths.
exec_worker_thread()::495:Thread 2  inserts 20000000 (20 million), sequential key, seqential value, KV-pairs starting from 0 (0) ...
OS-pid=141555, Thread-ID=2, Insert small-width sequential values of different lengths.
exec_worker_thread()::495:Thread 3  inserts 20000000 (20 million), sequential key, seqential value, KV-pairs starting from 0 (0) ...
OS-pid=141555, Thread-ID=3, Insert small-width sequential values of different lengths.
exec_worker_thread()::495:Thread 4  inserts 20000000 (20 million), sequential key, seqential value, KV-pairs starting from 0 (0) ...
OS-pid=141555, Thread-ID=4, Insert small-width sequential values of different lengths.
btree_pack exceeded output size limit
Assertion failed at src/trunk.c:4849:trunk_compact_bundle(): "SUCCESS(pack_status)". platform_status of btree_pack: 28

Aborted (core dumped)
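As an aside, status 28 here matches errno 28 (ENOSPC) on Linux, assuming platform_status wraps errno values, so the pack is reporting an out-of-space condition on its output rather than a logic bug. A toy sketch of such a capacity guard follows; the names pack_sketch, tuples_per_extent, and max_extents are invented for illustration and are not SplinterDB's btree_pack API.

#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Toy capacity guard: emitting tuples consumes output extents; once the
 * configured cap is reached, bail out with ENOSPC (28 on Linux), which
 * would surface as "platform_status of btree_pack: 28". */
static int
pack_sketch(size_t ntuples, size_t tuples_per_extent, size_t max_extents)
{
   size_t extents_used = 0;
   for (size_t i = 0; i < ntuples; i++) {
      if (i % tuples_per_extent == 0) {   /* need one more output extent */
         if (extents_used == max_extents) {
            fprintf(stderr, "pack exceeded output size limit\n");
            return ENOSPC;
         }
         extents_used++;
      }
      /* ... emit tuple i into the output tree ... */
   }
   return 0;   /* SUCCESS */
}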

rosenhouse (Member) commented Nov 15, 2022

In the team meeting today, Alex suggested that a workaround may be to increase the number of background threads relative to the number of inserting threads. If that is enough to fix this, please also consider adding a more helpful message to the assertion, to guide future users who might hit the same problem.

gapisback (Collaborator, Author) commented:

Based on an earlier triage during the PR meeting, the suspicion is that we run into this issue because the number of bg-threads configured for memtable work (2) is too low relative to the number of inserting threads (4). It is essentially a producer-consumer queue problem (as described by @ajhconway): inserting threads fill memtables faster than the memtable bg-threads can incorporate them.
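To make the producer-consumer framing concrete, here is a small, self-contained simulation (toy code, not SplinterDB; the capacity and per-tick rates are invented for illustration): four producers fill memtables each tick while two consumers incorporate them, so the backlog of pending bundles grows until a fixed capacity is exhausted, the analogue of the bundle ring filling up and tripping the assert.

#include <stdio.h>

int
main(void)
{
   const int capacity     = 8;   /* toy stand-in for bundle slots per trunk node */
   const int produce_rate = 4;   /* memtables filled per tick (4 inserting threads) */
   const int consume_rate = 2;   /* incorporations per tick (2 memtable bg-threads) */
   int       pending      = 0;

   for (int tick = 1; tick <= 10; tick++) {
      pending += produce_rate;   /* inserting threads fill memtables */
      pending -= (pending < consume_rate) ? pending : consume_rate;
      printf("tick %2d: pending bundles = %d\n", tick, pending);
      if (pending >= capacity) {
         printf("tick %2d: no available bundles -- would trip the assert\n", tick);
         return 1;
      }
   }
   return 0;
}

With a steady deficit of 2 per tick, the backlog reaches capacity within a few ticks; raising consume_rate to match produce_rate keeps pending bounded, which is the intuition behind running more memtable bg-threads.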

One way to work around this issue, as suggested above, is to run this new test case with more memtable bg-threads; see the hypothetical invocation below.
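For example (the bg-thread counts here are illustrative and untested):

$ build/debug/bin/unit/large_inserts_bugs_stress_test --num-inserts 20000000 --num-threads 4 --num-bg-threads 2 --num-memtable-bg-threads 4 test_seq_key_seq_values_inserts_threaded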

Also, we may well stop hitting this issue once PR #497 lands in /main. That change reworks how fg- and bg-threads cooperate on this kind of memtable management, so that the bg-threads are not overloaded by a large number of inserting fg-threads.

We should revisit this test case once that PR lands in /main to see whether the issue still reproduces.
