Publish event doesn't work with no error message #666
Comments
Could you submit a test code to reproduce the issue? |
Hi, I'm not able to reproduce the problem because I haven't found the cause yet. It happens 3-5 times a week for no apparent reason. I would like to know if anyone has encountered the same issue. |
What persistence/mqemitter are you using? |
Hi Daniel, this is the schema of my mqtt broker made with aedes. |
OK, so you are using the in-memory mqemitter/persistence, not redis or mongo, right? How do clients connect to the broker? Do they have the clean flag set to true or false? What QoS? |
Yes, I'm using only the server memory. The pub/sub qos is always 0. I haven't set the clean flag, so I think it is the "true" default value, but I will check every client. |
I suggest double-checking the clean flag and making sure it is set to |
@Gianluca-Casagrande-Stiga check Linkedin when you have time ;) |
Hi, we still haven't found the cause of this bug. I've only noticed that it occurs considerably less often when I set requestCert to "false" in the TLS server options, but it is still present. |
@Gianluca-Casagrande-Stiga Did you check the clean flag on client connect? Also, my suggestion is to check process memory and CPU usage to see if there is a memory leak somewhere. |
Yes, I've checked the clean flag is set on each client. |
@getlarge Any clue what could be the root cause of this? I cannot find any error, there are no memory leaks detected, after some time the server just stops responding without any error |
@robertsLando Unfortunately I don't have any idea. TBH I also encountered the same issue (at least the symptoms look very similar) a few weeks ago and had no time to investigate. |
@getlarge This user isn't using redis/mongo persistence, it's using the built-in one |
Yes, I noticed. If the error is the same but the persistence is not, then it's tempting to suppose it's not persistence/emitter related. |
Are you using tls too? |
Not from node, but there is a proxy that does SSL termination. |
I'm thinking it could also be an issue related to mqtt-packet parser |
Related #553 |
I've just found that if ignoring |
@JaosnHsieh Your fix doesn't resolve the main bug. Why does ignoring $SYS emits work? |
After double-checking the code: this can only happen with the in-memory emitter. It cannot happen with mqemitter-mongo/redis, as they do not use concurrency. What's missing in the in-memory emitter is a way to release the queue when it gets full. cc @mcollina |
I have submitted a PR on mqemitter: mcollina/mqemitter#94 |
Hi, in the meantime, I'm deploying my broker using the docker image node:14.18.1-buster instead of node:12.16-alpine, to rule out a Node version-related problem. |
@Gianluca-Casagrande-Stiga If possible, just to ensure this is not the problem, try setting the concurrency to 0 |
Not sure. I found the Thanks for your work on this mqemitter PR I'm wondering is it correct to call |
I have fixed my PR and added a test for it. BTW, I don't think that will fix this issue. The problem was that when the queue was full there was nothing that could release it. I have tested my PR with your test code and it's working |
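To illustrate the failure mode described here (a full queue that nothing releases), the following is a minimal sketch of a concurrency-limited in-memory queue. It is not mqemitter's actual implementation: `TinyQueue`, `worker`, `pending` and `delivered` are hypothetical names made up for this example, and treating `concurrency === 0` as "no limit" is an assumption of the sketch. It only shows that a single missing `done` call is enough to block every later message.

```javascript
// Hypothetical queue with a concurrency limit (illustration only,
// not mqemitter's real code).
class TinyQueue {
  constructor (concurrency, worker) {
    this.concurrency = concurrency // 0 means "no limit" in this sketch
    this.worker = worker // worker(message, done)
    this.current = 0 // messages currently being delivered
    this.pending = [] // messages waiting for a free slot
    this.delivered = [] // messages whose done callback was invoked
  }

  emit (message) {
    if (this.concurrency > 0 && this.current >= this.concurrency) {
      // No free slot: park the message until a slot is released.
      this.pending.push(message)
      return
    }
    this.current++
    this.worker(message, () => this._released(message))
  }

  _released (message) {
    this.delivered.push(message)
    this.current--
    const next = this.pending.shift()
    if (next !== undefined) this.emit(next)
  }
}

// A well-behaved worker always calls done.
const healthy = new TinyQueue(1, (msg, done) => done())
for (const m of ['a', 'b', 'c']) healthy.emit(m)

// A worker that never calls done for message 'b'.
const stuck = new TinyQueue(1, (msg, done) => { if (msg !== 'b') done() })
for (const m of ['a', 'b', 'c', 'd']) stuck.emit(m)

console.log(healthy.delivered) // [ 'a', 'b', 'c' ]
console.log(stuck.delivered, stuck.pending, stuck.current) // [ 'a' ] [ 'c', 'd' ] 1
```

In the stuck case no error is raised anywhere: later messages simply accumulate in `pending`, which matches the "no error message" symptom reported in this issue.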
I think this.current++
this._parallel(this, matches, message, callback) Could something be wrong in the functions returned by const matches = this._matcher.match(message.topic)? Code copied from your PR |
@JaosnHsieh parallel always calls released once finished, check line 22, released is a fastparallel option |
I just checked my edited PR again with your code and it seems it's not working. I think there is a bug elsewhere, so... |
Either it didn't finish or it didn't call. I tested it by adding a count before calling When it's hanging, the
you might see something like this in the terminal when publishing hangs
|
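The counting approach mentioned above can be sketched roughly as follows. The instrumentation actually used in this thread is not shown, so everything here is an assumption for illustration: `trackCallbacks` and `deliver` are hypothetical names, and `deliver` deliberately drops its callback for one topic to imitate a lost `done` call.

```javascript
// Hypothetical helper: wraps a callback-style function and counts how many
// calls were started and how many of them invoked their callback.
function trackCallbacks (fn) {
  const stats = { started: 0, finished: 0 }
  function tracked (...args) {
    const cb = args.pop() // last argument is the completion callback
    stats.started++
    fn(...args, (...results) => {
      stats.finished++
      cb(...results)
    })
  }
  return { tracked, stats }
}

// Stand-in for a delivery step; it drops the callback for one topic on purpose.
function deliver (topic, cb) {
  if (topic === 'lost') return // bug being simulated: cb is never called
  cb(null, topic)
}

const { tracked, stats } = trackCallbacks(deliver)
for (const topic of ['a', 'lost', 'b']) {
  tracked(topic, () => {})
}

console.log(stats) // { started: 3, finished: 2 }
```

A `started` count that stays ahead of `finished` after the system has gone idle points at a callback that was never invoked.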
My thought is that somewhere in aedes we are missing a cb or something similar, and at a certain point this blocks everything. Unfortunately I cannot find where this happens. @JaosnHsieh you mentioned the |
After some digging and adding some logs, this is the state of the queue when it gets stuck:
starting parallel.. current: 1
starting parallel.. queue: Packet {
cmd: 'publish',
brokerId: '5bdea25f-a04d-4d5f-a19d-ddbb1d104410',
brokerCounter: 32,
topic: '$SYS/5bdea25f-a04d-4d5f-a19d-ddbb1d104410/new/clients',
payload: <Buffer 38 33 35 33 39 33>,
qos: 0,
retain: false,
dup: false
}
starting parallel.. callback: [Function: release]
I will try to check the fastparallel source code to see if I can find out more. BTW, it seems that by removing the publish on |
Ok that's really strange... seems the problem is Strange thing is that by going in depth in fastparallel code I reached this point: https://github.com/mcollina/fastparallel/blob/master/parallel.js#L111 Where I have added a log like:
That outputs:
After that nothing happens, as if the function is never called (but it actually is as it calls the |
OK, I have found the reason: it seems the problem lies in fastparallel. I'm wondering if this could be the root cause of this issue too. @Gianluca-Casagrande-Stiga please try to use another mqemitter (redis or mongo) |
I confirm that I have the same problem with publish event also using node:14.18.1-buster, so it is not a node version related problem. As you suggest, I've just deployed the broker using mqemitter-redis. I'll let you know if it solves the problem. |
* fix: ensure release is called
  Fixes: moscajs/aedes#666
* fix: typo
* fix: the real fix
* fix: add test
* chore: add ci and dependabot
* docs: ci badge
* chore: add missing nodejs versions
* fix: use string for versions
* fix: remove let/const and other
running an npm update should fix the issue now |
Amazing @robertsLando, you have been able to track down this issue - this is kind of a degree in async debugging! |
It cost me a headache to find that single line but... It's ok 😆 Grazie Matteo! 🚀 |
Thanks for your fix first!! I'm still curious why it's related to I found this change
I have the same confusion as this issue on fastparallel. I thought It seems like we have to put asynchronous action call when using you can reproduce the fix by
|
It's obvious: using setImmediate doesn't call the done code immediately but defers it to a later phase of the event loop, so the bug in fastparallel is avoided because the index has time to be reset. TL;DR: yes, that would have fixed the problem on the aedes side too, but the problem in fastparallel would still have been there (and could show up somewhere else in the future). @jdiamond I suggest taking a look at: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/ |
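The timing difference described in this comment can be seen in a small standalone sketch. It does not reproduce the fastparallel bug itself; `doneSync` and `doneDeferred` are hypothetical stand-ins that only show when a callback runs relative to the code that follows the call.

```javascript
const order = []

// Calls the callback right away, inside the current call stack.
function doneSync (cb) { cb() }

// Defers the callback until after the current synchronous code has finished.
function doneDeferred (cb) { setImmediate(cb) }

doneSync(() => order.push('sync done'))
order.push('after sync call')

doneDeferred(() => order.push('deferred done'))
order.push('after deferred call')

// At this point the deferred callback has not run yet.
console.log(order) // [ 'sync done', 'after sync call', 'after deferred call' ]

// Immediates run in scheduling order, so this one sees 'deferred done' as well.
setImmediate(() => console.log(order))
```

With the synchronous variant the callback finishes before the caller gets a chance to do anything else, while the deferred variant lets the caller complete its own bookkeeping first. That ordering is the reason the setImmediate workaround masked the problem.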
Hi everyone,
I have an MQTTS server running 24/7 with about 20 clients which publish a 500 B payload every second.
At random times (3-5 times a week) the publish event stops working: on the client side there is no error and the publish command still goes through, but on the broker side no publish event is emitted. The subscriber client, obviously, receives no messages.
There is no aedes error message; the only solution I've found is to restart the server.
Does anyone have the same problem?
I'm using version 0.46.1.
Thank you