[NNG]: reap worker reaps nni_pipe twice #1411
Comments
Thanks for spotting this issue in NanoMQ. But I am unable to compile & run your repo. I get an error (`failed to parse manifest at ... Caused by: ... consider adding ...`) while trying to run your test.
Thanks for the detailed error logs.
Forget about my Rust env issue. After careful examination, the issue you report is really interesting, and I found several other issues too. For anyone who may be concerned: as the log shows,
@Nereuxofficial May I ask about your fuzzing methodology? Are you only playing with MQTT, or also the TCP part? I am going to discuss this with @gdamore. Anyway, I will tag it as help wanted for now. This is not an easy issue to address; we need to work with upstream.
The reap worker reaps nni_pipe before it was created. ASAN also tells the same story: nni_pipe is reaped before it was created, and nni_pipe is reaped twice.
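The double-reap symptom described above (tearing down a pipe that is already gone, or not yet fully created) is the classic pattern that an idempotent teardown flag guards against. Below is a minimal Rust sketch of that idea only; the type and method names are illustrative and are not NNG's actual implementation:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Toy stand-in for a pipe object; field names are illustrative only.
struct Pipe {
    reaped: AtomicBool,
}

impl Pipe {
    fn new() -> Self {
        Pipe { reaped: AtomicBool::new(false) }
    }

    /// Returns true only for the first caller. Any concurrent or repeated
    /// reap attempt atomically observes the flag already set and becomes a
    /// harmless no-op instead of a double free.
    fn try_reap(&self) -> bool {
        !self.reaped.swap(true, Ordering::AcqRel)
    }
}

fn main() {
    let p = Pipe::new();
    // First reap wins, second is rejected -- the invariant the ASAN
    // report shows being violated.
    println!("first: {}", p.try_reap());   // first: true
    println!("second: {}", p.try_reap());  // second: false
}
```

The `swap` makes the check-and-set a single atomic step, so two racing reap workers cannot both see "not yet reaped".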
Signed-off-by: JaylinYu <letrangerjaylin@gmail.com>
That sounds really interesting. So the nni_pipe is reaped before it is created, and is then destroyed again? Sounds like memory-corruption weirdness without memory corruption... Maybe something reads out of bounds in a union/struct that ASAN cannot detect, which causes this weird behaviour?

Of course. I am only fuzzing the MQTT part (which is fuzzed via a Markov model), though I cannot guarantee all connections are closed properly. It is possible to fuzz any TCP port: as long as a new connection can be established and bytes can be sent, a small check implemented to prevent fuzzing invalid targets passes. (If anything turns out to be a problem, though, I'll be happy to provide fixes.) Also, I'd be really interested in the fixes, as I'd like to include them in the report about my project :)
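The "valid target" check mentioned above (connect over TCP and confirm bytes can be sent before fuzzing) can be sketched in a few lines of Rust. This is a hedged illustration only; `target_accepts_bytes` and the demo listener are hypothetical names, not rusty-FUME's actual API:

```rust
use std::io::Write;
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` can be established and at
/// least one byte can be written -- the kind of liveness check described
/// above to avoid fuzzing invalid targets. (Illustrative name.)
fn target_accepts_bytes(addr: &str) -> bool {
    match TcpStream::connect(addr) {
        Ok(mut stream) => {
            stream.set_write_timeout(Some(Duration::from_secs(1))).ok();
            // A live broker accepts bytes on a fresh connection; a closed
            // port already fails at connect().
            stream.write_all(&[0x10]).is_ok()
        }
        Err(_) => false,
    }
}

fn main() {
    // Demo against a throwaway local listener instead of a real broker.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap().to_string();
    std::thread::spawn(move || {
        let _ = listener.accept();
    });
    println!("reachable: {}", target_accepts_bytes(&addr));
}
```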
This is not the final fix. Signed-off-by: JaylinYu <letrangerjaylin@gmail.com>
I tested this again, and the "prevent collateral damage" fix did not prevent the issue from occurring on my system. Are there any updates on this?
Unfortunately not. That PR is a WIP and only fixes a side effect.
I will try to troubleshoot the issue next week and maybe add additional information. Unfortunately, this is not really my area of expertise.
Thanks, it would be great to verify whether the issue is only in MQTT or a pre-existing bug in NNG's reap; that would at least set the right direction.
Just wanted to mention that I have also seen this issue now that we upgraded to 0.19.5. It is intermittent, though, and our process scheduler was able to restart nanomq successfully.
Interesting. Do you mean this didn't occur before 0.19.5?
The previous version we were using was: ... Also confirming we did not see this issue on the older tag/sha.
News update:
Do I have to update the nanomq sha, or can I just update the NNG repo sha to get this fix pulled in? Or, even better, could we hotfix this into the release branch so we don't have to fork nanomq again?
I recommend waiting for the next release, which is coming this month. We only do hotfixes for the LTS version, and currently no LTS is active (0.6.6 is outdated). But it is not hard to backport the fix on your own: you only need to update the NanoNNG repo, or use the most recent main branch of nanomq and nanonng.
Sounds good, I will wait for the next release for now.
@Nereuxofficial Looking forward to your feedback and verification also. |
Sorry, I tried it yesterday, but I had to fix some stuff first in order to get the fuzzer to work again. Fixed here too! Good work on the fix!
Do you have a specific date for the release cut that will contain this fix, @JaylinYu?
The bug-fix release will be coming out at the end of October.
Will it be out on 10/30? |
Definitely |
Describe the bug
While fuzzing the broker with my self-written fuzzer, it quickly crashes with this end output:
Expected behavior
The broker should not crash upon receiving any message.
Actual Behavior
The broker crashes. Here is the last console output after crashing:
To Reproduce
This is where it gets a bit funky. Since it is a locking bug, it is not guaranteed to be reproducible on every run of the program. But when fuzzing I've always hit it so far, and with replaying it happens 6 times out of 10.
Here are two logs reproducing the behaviour (the second one with log_level debug):
https://gist.github.com/Nereuxofficial/2aef73e8403a6445cca91e6c34d3b296
And here is another log with fuzz mode from my server
It should be noted that last stdout and last stderr are the complete output of the broker; I've just forgotten to update the logging.
I would like to provide a small C program here, but since that would require rewriting the fuzzer in C, it is not an option. The easiest way to reproduce this is with the fuzzer. After making sure your system nanomq is the newest version:

1. Clone the fuzzer and check out the repro branch:

```
git clone https://github.com/MCloudTT/rusty-FUME
cd rusty-FUME
git checkout repro_nanomq_mutex_lock
```

2. Run the replay:

```
cargo run -r -- --broker-command "nanomq start" replay
```

(Here you can also attach a log_level, though in my experience the crash rate was lower this way...)

3. If the replay doesn't work like on my server, try to find the bug again using fuzzing:

```
cargo run -r -- --broker-command "nanomq start" fuzz
```
**Environment Details**
Additional context
If you run into problems reproducing or using the fuzzer, feel free to ask me for help, and also if you have suggestions for improvements to the fuzzer. Also, I'd be really interested in the fix and what caused it, so it would be nice to know what caused this and under what circumstances. (Also if you happen to stumble upon other bugs with this.)