thread local context (#359) causes test issue on Linux: 1: [4] fid:0 channel 2, to submit:64, submitted:Operation not permitted
#375
Comments
Enabling use of libnuma doesn't fix this issue.
Reverting just the changes to
Can you log the values of the
What's the compiler/platform? In our Linux unit test run (Ubuntu 20.04, GCC 9.4.0), we don't encounter this log.
Not sure if I got the code right (see the async-log-submissions branch in my fork), but I tried logging aio_reqprio and it was always zero, so my initial diagnosis was probably incorrect.
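The helper below sketches the kind of per-request logging described here. It assumes the reader submits raw struct iocb values from <linux/aio_abi.h>, but SPTAG's actual setup is not shown in this thread, and the name DescribeIocb is hypothetical. It logs aio_flags alongside aio_reqprio. According to the io_submit(2) man page, the kernel only treats aio_reqprio as an I/O priority when IOCB_FLAG_IOPRIO is set in aio_flags. A zero aio_reqprio therefore does not tell the whole story on its own.

```cpp
#include <linux/aio_abi.h>  // struct iocb, IOCB_CMD_PREAD
#include <sstream>
#include <string>

// Hypothetical helper: renders the iocb fields relevant to an EPERM from
// io_submit as one log line. Intended to be called for every iocb right
// before it is passed to io_submit.
std::string DescribeIocb(const struct iocb& cb) {
    std::ostringstream out;
    out << "opcode=" << cb.aio_lio_opcode
        << " fd=" << cb.aio_fildes
        << " reqprio=" << cb.aio_reqprio
        << " flags=0x" << std::hex << cb.aio_flags << std::dec
        << " nbytes=" << cb.aio_nbytes
        << " offset=" << cb.aio_offset;
    return out.str();
}
```

For example, a zero-initialized 4096-byte read of fd 3 at offset 8192 logs as "opcode=0 fd=3 reqprio=0 flags=0x0 nbytes=4096 offset=8192". Any nonzero bit in flags on a request the code built itself would be a clue.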
This is on Debian 12 bookworm and GCC 12.2.0.
I'll try some older versions of Debian today.
--
bye,
pabs
https://bonedaddy.net/pabs3/
Same error with Debian 11 bullseye and GCC 10.2.1.
After modifying the code to log all io_submit requests, this is what the output looks like:
I noticed that when I enable only the
I also noticed that enabling all tests except the
So I think that AlgoTest/SPANNTest is somehow interfering with The successful
I noticed that many tests fail, not just one. All of the failing tests use the
This is the list of failing tests when AlgoTest/SPANNTest is enabled:
I note that when I rebuild SPTAG and its tests with the
According to ThreadSanitizer, these are the most common locations for data races in SPTAG:
Sorry, I am not involved in the project; I am just packaging it for my employer, so I can't answer your questions. You will need to contact the Microsoft employees who wrote it. Some of their email addresses are available in the git log: clone the project locally and run git log to see them.
Okay, thanks.
Hi, I have encountered the same problem. Have you found its root cause?
I haven't been able to figure out the cause of this bug so far, and I don't plan to work on it any further at this point. This workaround lets all the tests run successfully: first run the tests excluding
Thanks a lot! It works, but it's weird. I would like to follow your testing approach to check it again. I suspect that the execution order may affect the indices on the SSD, for example through thread contention or some other cause. Anyway, thanks again for your help!
If you find any additional details, please send them here. If you find a fix, please submit a pull request with the needed changes and put this as the last line in the commit message, so that your patch will close this issue:
Okay, I hope I can~
It may be related to the number of CPU cores. My CPU has 8 cores, and when I use 7 threads to access the search interface, there are no issues. However, when I use more than 8 threads, errors start to occur, specifically from channel 8 onwards. For example, if I use 15 threads, channels 8 to 15 will report errors.
I have looked through a lot of material, but there seems to be no apparent relationship between the io_submit call and the number of CPU cores, so I am still at a loss regarding this bug.
I tested it again, and it is not related to the number of CPU cores; it is related to the parameter NumberOfThreads. When I set this parameter to 45, there were no issues with the query. Previously, I had this parameter set to 8. |
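These observations are consistent with one possible explanation, though they do not prove it. Errors started at channel 8 when NumberOfThreads was 8, and they disappeared once it was raised to 45. That pattern would fit a per-thread context table sized from NumberOfThreads and indexed by a channel number that can exceed that size. The sketch below is purely hypothetical. ChannelPool and AioChannel are illustrative names, not SPTAG types. It shows a bounds check that turns an out-of-range channel into an explicit failure. Without such a check, the code would read past the end of the table, which could hand io_submit a garbage iocb.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-channel async I/O state; SPTAG's real type is not shown here.
struct AioChannel {
    std::size_t id;
};

// Hypothetical pool holding one channel per configured thread (NumberOfThreads).
class ChannelPool {
public:
    explicit ChannelPool(std::size_t numberOfThreads) {
        channels_.reserve(numberOfThreads);
        for (std::size_t i = 0; i < numberOfThreads; ++i) {
            channels_.push_back(AioChannel{i});
        }
    }

    // Returns nullptr for a channel index outside the pool instead of
    // indexing past the end of the vector (undefined behavior).
    AioChannel* Get(std::size_t channel) {
        if (channel >= channels_.size()) {
            return nullptr;
        }
        return &channels_[channel];
    }

private:
    std::vector<AioChannel> channels_;
};
```

With ChannelPool pool(8), Get(7) returns a valid channel and Get(8) returns nullptr. Channel 8 is exactly where the errors began in the report above. A pool built with 45 covers channels 0 to 44.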
On Linux, #359 by @PhilipBAdams causes the SSDServingTest TestSearchSSDIndexFloatL2BKTDEFAULTTXT test to print this message repeatedly:
This message comes from the BatchReadFileAsync function in AnnService/src/Helper/AsyncFileReader.cpp and is printed when the io_submit syscall fails:
According to the io_submit manual page, this error indicates that the RT I/O priority class was set incorrectly:
However, the code does not set the I/O priority class anywhere.
My conclusion is that the iocb structure being passed to io_submit may somehow be getting corrupted, but I am not sure. Thoughts and ideas for further debugging are welcome.
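One way to test the corruption hypothesis is to check each iocb right before io_submit for the specific condition the man page describes. Per io_submit(2), aio_reqprio is only interpreted as an I/O priority when IOCB_FLAG_IOPRIO (Linux 4.18+) is set in aio_flags. In linux/ioprio.h, the priority class sits in the top bits: IOPRIO_CLASS_SHIFT is 13 and IOPRIO_CLASS_RT is 1. The sketch below is a hypothetical diagnostic, not SPTAG code. The constants are redefined locally because older kernel headers may not export them. Suppose it returns true for an iocb the code built itself. That would point to stray bits in aio_flags or aio_reqprio and support the corruption theory.

```cpp
#include <linux/aio_abi.h>  // struct iocb
#include <cstdint>

#ifndef IOCB_FLAG_IOPRIO
#define IOCB_FLAG_IOPRIO (1 << 1)  // value from linux/aio_abi.h (Linux 4.18+)
#endif

// Local copies of the linux/ioprio.h encoding, for older headers.
constexpr unsigned kIoprioClassShift = 13;
constexpr unsigned kIoprioClassRT = 1;

// Extracts the priority class (top 3 of 16 bits) from an ioprio value.
unsigned IoprioClass(std::uint16_t ioprio) {
    return (ioprio >> kIoprioClassShift) & 0x7u;
}

// Hypothetical check: true if submitting cb would request the realtime I/O
// priority class, which io_submit(2) documents as failing with EPERM when
// the caller lacks the required capability.
bool RequestsRealtimeIoprio(const struct iocb& cb) {
    if ((cb.aio_flags & IOCB_FLAG_IOPRIO) == 0) {
        return false;  // without this flag the kernel does not use aio_reqprio
    }
    return IoprioClass(static_cast<std::uint16_t>(cb.aio_reqprio)) == kIoprioClassRT;
}
```

Logging or asserting on this check alongside the existing error message would show whether the failing submissions carry the RT class, or whether the EPERM has some other source.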
I also tried commenting out the contents of the Helper::SetThreadAffinity function, but that didn't help.