Debug assert fails in routing_filter_add(), "(index_no / addrs_per_page < pages_per_extent)": Causes large inserts workload to fail. #560
…outing_filter_add(). Test FAILs. Seems like this is the 1st commit where stuff starts to break:

```
sdb-fdb-build:[39] $ VERBOSE=6 ./build/debug/bin/unit/large_inserts_stress_test --num-inserts 20000000 --verbose-progress test_560_seq_htobe32_key_random_6byte_values_inserts
Running 1 CTests, suite name 'large_inserts_stress', test case 'test_560_seq_htobe32_key_random_6byte_values_inserts'.
TEST 1/1 large_inserts_stress:test_560_seq_htobe32_key_random_6byte_values_inserts
Fingerprint size 29 too large, max value size is 5, setting to 27
fingerprint_size: 27
filter-index-size: 256 is too small, setting to 512
exec_worker_thread()::293:Thread 0 inserts 20000000 (20 million), sequential key, random value, KV-pairs starting from 0 (0) ...
OS-pid=430835, Thread-ID=0, Insert random value of fixed-length=6 bytes.
exec_worker_thread()::385:Thread-0 Inserted 1 million KV-pairs ...
exec_worker_thread()::385:Thread-0 Inserted 2 million KV-pairs ...
exec_worker_thread()::385:Thread-0 Inserted 3 million KV-pairs ...
[...]
exec_worker_thread()::385:Thread-0 Inserted 17 million KV-pairs ...
OS-pid=430835, OS-tid=430835, Thread-ID=0, Assertion failed at src/routing_filter.c:591:routing_filter_add(): "(index_no / addrs_per_page < pages_per_extent)". index_no=16384, addrs_per_page=512, (index_no / addrs_per_page)=32, pages_per_extent=32
Aborted (core dumped)
```
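For reference, the failing check is a plain boundary condition: with the values from the message above, index_no / addrs_per_page lands exactly on pages_per_extent, i.e. the filter tries to address the first index page past the extent. Below is a minimal standalone sketch of that arithmetic; the variable names mirror the assert text, not SplinterDB internals:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
   /* Values copied from the assertion message above. */
   uint64_t index_no         = 16384; /* filter index being addressed     */
   uint64_t addrs_per_page   = 512;   /* index addresses held by one page */
   uint64_t pages_per_extent = 32;    /* index pages in one extent        */

   /* 16384 / 512 == 32: the index falls on page 32, but valid pages are
    * 0..31, so the half-open range check below (the same expression the
    * debug assert evaluates) fails at exactly this boundary.
    */
   uint64_t page_in_extent = index_no / addrs_per_page;
   printf("page_in_extent=%" PRIu64 ", limit=%" PRIu64 "\n",
          page_in_extent,
          pages_per_extent);

   assert(page_in_extent < pages_per_extent); /* 32 < 32 is false; fires */
   return 0;
}
```

In other words, the workload has grown the filter index exactly one page past what a single extent can address, which is why it only shows up after ~17 million of the 20 million inserts.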
…outing_filter_add(). Test works. Seems like this is the 1st commit where stuff works:

```
sdb-fdb-build:[30] $ VERBOSE=6 ./build/debug/bin/unit/large_inserts_stress_test --num-inserts 20000000 --verbose-progress test_560_seq_htobe32_key_random_6byte_values_inserts
Running 1 CTests, suite name 'large_inserts_stress', test case 'test_560_seq_htobe32_key_random_6byte_values_inserts'.
TEST 1/1 large_inserts_stress:test_560_seq_htobe32_key_random_6byte_values_inserts
Fingerprint size 29 too large, max value size is 5, setting to 27
fingerprint_size: 27
filter-index-size: 256 is too small, setting to 512
exec_worker_thread()::293:Thread 0 inserts 20000000 (20 million), sequential key, random value, KV-pairs starting from 0 (0) ...
OS-pid=429442, Thread-ID=0, Insert random value of fixed-length=6 bytes.
exec_worker_thread()::385:Thread-0 Inserted 1 million KV-pairs ...
exec_worker_thread()::385:Thread-0 Inserted 2 million KV-pairs ...
exec_worker_thread()::385:Thread-0 Inserted 3 million KV-pairs ...
exec_worker_thread()::385:Thread-0 Inserted 4 million KV-pairs ...
[...]
exec_worker_thread()::385:Thread-0 Inserted 19 million KV-pairs ...
exec_worker_thread()::385:Thread-0 Inserted 20 million KV-pairs ...
exec_worker_thread()::400:Thread-0 Inserted 20 million KV-pairs in 140 s, 142857 rows/s
Allocated at unmount: 880 MiB
[OK]
RESULTS: 1 tests (1 ok, 0 failed, 0 skipped) ran in 141608 ms
```
Have narrowed down the repro as follows: re-applied the relevant commits from

Debugging ... Update (4/1/2023): I pushed branch agurajada/rf-add-fc897e4c-fail, which adds a couple more variations of the basic test case.

You have to run these with a large number of inserts, e.g. 20M.
Here's a really easy diff that reproduces it:

```diff
$ git diff
diff --git a/tests/unit/splinterdb_stress_test.c b/tests/unit/splinterdb_stress_test.c
index 87cf659..32d7470 100644
--- a/tests/unit/splinterdb_stress_test.c
+++ b/tests/unit/splinterdb_stress_test.c
@@ -18,8 +18,8 @@
 #include "../functional/random.h"
 #include "ctest.h" // This is required for all test-case files.

-#define TEST_KEY_SIZE   20
-#define TEST_VALUE_SIZE 116
+#define TEST_KEY_SIZE   4
+#define TEST_VALUE_SIZE 6

 // Function Prototypes
 static void *
@@ -73,7 +73,7 @@ CTEST2(splinterdb_stress, test_random_inserts_concurrent)
    ASSERT_TRUE(random_data >= 0);

    worker_config wcfg = {
-      .num_inserts = 1000 * 1000,
+      .num_inserts = 20 * 1000 * 1000,
       .random_data = random_data,
       .kvsb        = data->kvsb,
    };
```
We did some more experimentation today, and it appears that tuples with a 4-byte key and a 20-byte value will pass, but smaller ones will not.
Here is some more evidence of the behaviours we see for this existing stress-test when run off of the branch above.

For a single-thread execution, the test progresses to about:

The same test with the same config as above will pass for this combo (again single thread).

Re-run with 4 threads, key=4 bytes, value=16 bytes: this will run for a while till here:

And then it will hang in this stack:
This specific problem was encountered while benchmarking a single client inserting 20M rows (in the PG-SplinterDB integration code base).
Update (agurajada, 4/2023): After deeper investigation of variations of this repro, the basic issue seems to be that there is some instability in the library on /main when using small key-value pairs. After some experimentation, it appears that we can stably insert 20+M rows, using single or multiple threads, when key=4 bytes and value >= 20 bytes. For smaller K/V pair sizes, different forms of instability are seen. We really need a comprehensive test suite that can exercise these basic insert workloads for different numbers of rows inserted, different numbers of clients, and varying combinations of K/V pair sizes; a sketch of that idea appears below.

This has been reproduced using a standalone test script off of /main @ SHA 89f09b3.

Branch: agurajada/560-rf-add-bug (has another commit, which fixes another issue, partially tracked by the failures seen in repros attempted for issue #458. You need that fix in order to go further along and repro this failure.)
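To make that last point concrete, here is a hypothetical sketch of the kind of parameterized coverage meant, sweeping small key/value size combinations through the public SplinterDB API. The helper name, DB sizing, and sweep bounds are all made up for illustration; only the splinterdb_create() / splinterdb_insert() / splinterdb_close(), default_data_config_init(), and slice_create() calls are from the public headers:

```c
#include <stdint.h>
#include <string.h>

#include "splinterdb/default_data_config.h"
#include "splinterdb/splinterdb.h"

/* Hypothetical helper: create a fresh DB, insert num_inserts sequential
 * keys of key_size bytes with zero-filled values of value_size bytes,
 * and report the first non-zero return code.
 */
static int
run_insert_sweep(size_t key_size, size_t value_size, uint64_t num_inserts)
{
   data_config data_cfg;
   default_data_config_init(key_size, &data_cfg);

   splinterdb_config cfg = {0};
   cfg.filename   = "/tmp/kv_sweep.db"; /* illustrative path       */
   cfg.disk_size  = 1024 * 1024 * 1024; /* 1 GiB, illustrative     */
   cfg.cache_size = 64 * 1024 * 1024;   /* 64 MiB, illustrative    */
   cfg.data_cfg   = &data_cfg;

   splinterdb *kvsb = NULL;
   int         rc   = splinterdb_create(&cfg, &kvsb);
   if (rc != 0) {
      return rc;
   }

   char key_buf[8]    = {0};
   char value_buf[32] = {0};
   for (uint64_t i = 0; i < num_inserts && rc == 0; i++) {
      /* Sequential key bytes; key_size is assumed <= sizeof(i) here. */
      memcpy(key_buf, &i, key_size);
      rc = splinterdb_insert(kvsb,
                             slice_create(key_size, key_buf),
                             slice_create(value_size, value_buf));
   }
   splinterdb_close(&kvsb);
   return rc;
}
```

A driver could then call run_insert_sweep() over, say, value sizes 2 through 30 bytes at 20M inserts each, to map exactly where the instability starts instead of discovering one failing combination at a time.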
The failure is seen in this test, from the branch above:

```
large_inserts_stress_test --num-inserts 20000000 test_560_seq_htobe32_key_random_6byte_values_inserts
```

With a release build, you will get a seg-fault a few lines later.
The test case test_560_seq_htobe32_key_random_6byte_values_inserts() has the exact conditions required to trigger this bug. There are other test cases in this test file that invoke different combinations of sequential / random key inserts, and they all seem to succeed.