splinterdb_insert() hang under concurrent insertion #620

Closed
chrisxu333 opened this issue Apr 5, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@chrisxu333

chrisxu333 commented Apr 5, 2024

When I perform concurrent insertion by calling splinterdb_insert(), each time I increase the thread count to be larger than 8, the splinterdb_insert() call seems to hang forever. I suspect that it may have something to do with deadlocks.

Config setup:

.cache_size = 2 * Giga,
.disk_size = 64 * Giga,
.data_cfg = &data_cfg,
.use_shmem = FALSE,
.io_flags = O_RDWR | O_CREAT | O_DIRECT,

The data config follows the default set up by calling default_data_config_init with a key size of 8.

.max_key_size = 8,
.key_compare = key_compare,
.key_hash = platform_hash32,
.merge_tuples = NULL,
.merge_tuples_final = NULL,
.key_to_string = key_to_string,
.message_to_string = message_to_string,

Note that when I turn off O_DIRECT, everything works fine and it no longer hangs.

@chrisxu333
Author

chrisxu333 commented Apr 5, 2024

I just tried to reproduce this bug with the large_inserts_stress_test driver. After I add the O_DIRECT flag to the splinterdb_config in large_inserts_stress_test.c, the test will hang on large_inserts_stress:test_seq_keys_random_values_threaded most of the time, and a few times on other multi-threaded test cases as well. Any idea what might cause this?

@gapisback
Collaborator

Hi, @chrisxu333 --

When you say: "When I perform concurrent insertion by calling splinterdb_insert(), each time I increase the thread count to be larger than 8, the splinterdb_insert() call seems to hang forever." ...

Do you have a stand-alone repro that you wrote on your own? Or, were you relying on reproducing this issue using large_inserts_stress_test.c?

Re: "After I add the O_DIRECT flag to the splinterdb_config in large_inserts_stress_test.c, the test will hang on..."

I suggest you do not try to use this stress-test and its sub-cases to reproduce the bug you are finding.

That stress test is somewhat in flux. Many of its test cases work reliably, but some are currently incomplete and can lead to hangs or unpredictable behaviour.

I have another revision of this large test suite undergoing review, so until that open PR is addressed and integrated, I suggest you not rely on this test suite as an exerciser to reproduce your problem.

@chrisxu333
Author

chrisxu333 commented Apr 6, 2024

Hi @gapisback ,
To answer your first question: yes, I'm running SplinterDB with a benchmark driver that I wrote myself. I tried to reproduce the hang with that stress test to rule out mistakes I might have made in my driver, so that I could narrow down the actual cause of this bug to some extent.

So to rephrase the bug: when I run SplinterDB insertions under high concurrency (16 threads, for instance) and use O_DIRECT when I call splinterdb_create, the program hangs forever after some time.
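
One way to tell a genuine hang from slow progress in a home-grown driver is to have each worker publish a progress counter and let the main thread watch for stalls. The following is a minimal sketch of that idea, not the reporter's driver: `insert_fn`, `run_with_watchdog`, and `noop_insert` are hypothetical names, and in a real driver the callback would wrap the `splinterdb_insert()` call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback; a real driver would call splinterdb_insert() here. */
typedef int (*insert_fn)(uint64_t key);

#define MAX_WORKERS 64

typedef struct {
   insert_fn     insert;
   uint64_t      first_key;
   uint64_t      num_keys;
   atomic_ullong done;     /* inserts completed so far */
   atomic_int    finished; /* set once the worker loop exits */
} worker;

static void *
worker_main(void *arg)
{
   worker *w = arg;
   for (uint64_t i = 0; i < w->num_keys; i++) {
      if (w->insert(w->first_key + i) != 0) {
         break; /* insert reported an error; stop this worker */
      }
      atomic_fetch_add(&w->done, 1);
   }
   atomic_store(&w->finished, 1);
   return NULL;
}

/*
 * Runs nthreads workers and polls their counters every 10 ms.
 * Returns 0 if all workers finish, or the number of unfinished workers
 * once no insert has completed for stall_polls consecutive polls.
 * Stalled threads are deliberately left running (diagnostic use only),
 * so this function is not reentrant after a stall.
 */
static int
run_with_watchdog(int nthreads, uint64_t keys_per_thread, insert_fn insert,
                  int stall_polls)
{
   static worker workers[MAX_WORKERS]; /* static: outlives stalled threads */
   pthread_t     tids[MAX_WORKERS];
   assert(nthreads > 0 && nthreads <= MAX_WORKERS);

   for (int i = 0; i < nthreads; i++) {
      workers[i].insert    = insert;
      workers[i].first_key = (uint64_t)i * keys_per_thread;
      workers[i].num_keys  = keys_per_thread;
      atomic_store(&workers[i].done, 0);
      atomic_store(&workers[i].finished, 0);
      int rc = pthread_create(&tids[i], NULL, worker_main, &workers[i]);
      assert(rc == 0);
      (void)rc;
   }

   const struct timespec tick = {.tv_sec = 0, .tv_nsec = 10 * 1000 * 1000};
   unsigned long long    last_total = 0;
   int                   idle_polls = 0;
   for (;;) {
      unsigned long long total   = 0;
      int                running = 0;
      for (int i = 0; i < nthreads; i++) {
         total += atomic_load(&workers[i].done);
         running += !atomic_load(&workers[i].finished);
      }
      if (running == 0) {
         break;
      }
      idle_polls = (total == last_total) ? idle_polls + 1 : 0;
      last_total = total;
      if (idle_polls >= stall_polls) {
         return running; /* no progress: report a suspected hang */
      }
      nanosleep(&tick, NULL);
   }
   for (int i = 0; i < nthreads; i++) {
      pthread_join(tids[i], NULL);
   }
   return 0;
}

/* Stand-in insert that always succeeds, used only to exercise the harness. */
static int
noop_insert(uint64_t key)
{
   (void)key;
   return 0;
}
```

Calling `run_with_watchdog(16, 1000, noop_insert, 100)` mirrors the 16-thread case described above; a non-zero return would mean that many workers made no progress for about one second.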

@rtjohnso
Contributor

rtjohnso commented Apr 7, 2024

I can repro with large_inserts_stress_test per your instructions.

It looks like some I/O completions are not doing what they are supposed to. In one deadlock, all threads had completed except one, which was waiting for the CC_WRITEBACK flag to be cleared on a page. In another, all threads had completed except one, which was waiting for a req->busy flag to be cleared.
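
To illustrate the class of failure described here, the sketch below models a waiter that depends on an I/O completion to clear a flag. It is a toy, single-threaded model and not SplinterDB's code: the `page`, `io_req`, and `completion_queue` types and all function names are hypothetical, and "the completion is never delivered" is only one assumed way a completion could fail to do its job. The wait is bounded so the example terminates; an unbounded wait in the same situation would hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins, not SplinterDB's real types. */
typedef struct {
   bool writeback; /* plays the role of a CC_WRITEBACK-style flag */
} page;

typedef struct {
   page *pg;
   bool  busy; /* plays the role of req->busy */
} io_req;

#define QUEUE_CAP 8

typedef struct {
   io_req *done[QUEUE_CAP]; /* completed requests awaiting their callback */
   size_t  count;
} completion_queue;

/*
 * Submit a write: mark the page and the request as in flight.
 * If lose_completion is true, the completion is never queued, which
 * models a completion that fails to run.
 */
static void
submit_write(completion_queue *cq, io_req *req, page *pg, bool lose_completion)
{
   req->pg       = pg;
   req->busy     = true;
   pg->writeback = true;
   if (!lose_completion) {
      assert(cq->count < QUEUE_CAP);
      cq->done[cq->count++] = req;
   }
}

/* Run the completion callback for every queued request. */
static void
reap_completions(completion_queue *cq)
{
   while (cq->count > 0) {
      io_req *req        = cq->done[--cq->count];
      req->pg->writeback = false; /* the page is clean again */
      req->busy          = false; /* the request can be reused */
   }
}

/*
 * Poll until the page's writeback flag clears.
 * Returns true on success and false after max_polls attempts.
 */
static bool
wait_for_writeback(completion_queue *cq, const page *pg, int max_polls)
{
   for (int i = 0; i < max_polls; i++) {
      reap_completions(cq);
      if (!pg->writeback) {
         return true;
      }
   }
   return false; /* without the bound, this case would spin forever */
}
```

With a delivered completion, `wait_for_writeback` returns true on the first poll and `req.busy` is cleared; with a lost completion, both flags stay set, which matches the two symptoms reported above (one thread stuck on a page flag, another on a request flag).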

Will investigate. As @gapisback mentioned, one outcome of the investigation may be that the test is buggy. In that case it will be helpful to see the code you wrote. But let me try debugging it with large_inserts_stress_test first.

Thanks for the report.

@chrisxu333
Author

Hi @rtjohnso, thanks for the help. Let me know if you need my code :)

@rtjohnso
Contributor

@chrisxu333 -- can you check whether PR #621 fixes your issue?

rtjohnso added the bug label on Apr 13, 2024
@chrisxu333
Author

@rtjohnso Yes I just ran it with my code and it works perfectly :) Thanks for your help

@rtjohnso
Contributor

Fixed by #621.
