Mongoose findOneAndUpdate hangs under heavy load #14877
Output of script:
If mongoose is bypassed and the other code that is commented out is swapped in, the issue doesn't seem to occur.
It seems that if you make the schema bigger/more bloated with additional fields, even if they are not populated, the issue occurs earlier, after about 400 iterations. But this could be a fluke.
I have been fighting this issue and have not been able to go above version 8.2.2. What I have found in production is that the database connection is checked out from the pool and remains checked out. I assume what I'm experiencing must be in mongodb, since everything from where the connection is checked out to where it is checked back in happens down in the mongodb driver.
Here is a function I'm using to look at the mongodb internals and see if the connections have been checked out from the connection pool. I was suspecting that it was a connection pool issue, but that proved to be wrong. Could you check if you are leaking connections from connection pool in your tests?
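In the same spirit, checked-out connections can be counted without poking at driver internals, using the driver's public CMAP monitoring events. A minimal sketch (the `trackPool` name and `stats` shape are mine; the event names are part of the driver's connection monitoring API, and in a mongoose app the client can be obtained from `mongoose.connection.getClient()`):

```javascript
// Track how many connections are currently checked out of the pool via
// the driver's CMAP monitoring events. `client` is anything that emits
// these events, e.g. the MongoClient from mongoose.connection.getClient().
function trackPool(client) {
  const stats = { checkedOut: 0 };
  client.on('connectionCheckedOut', () => { stats.checkedOut += 1; });
  client.on('connectionCheckedIn', () => { stats.checkedOut -= 1; });
  return stats;
}

// Usage sketch: const stats = trackPool(mongoose.connection.getClient());
// If stats.checkedOut keeps growing over time, connections are leaking
// from the pool instead of being checked back in.
```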
Thank you for this! I've tried running my repro on several mongoose versions, and here are my findings: 8.3.0 reproduces the issue. The issue starts in 8.3.0. /cc @vkarpov15
Here's what I found: if we are experiencing the same thing, setting socketTimeoutMS will cause it to eventually throw MongoNetworkTimeoutError.
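As a diagnostic aid (not a fix), the timeout can be set at connect time so the hang surfaces as an error rather than a promise that never resolves. A sketch, assuming a standard mongoose setup:

```javascript
const mongoose = require('mongoose');

// With socketTimeoutMS set, a socket that stops receiving data errors
// out with MongoNetworkTimeoutError instead of hanging forever.
mongoose.connect(process.env.MONGODB_URI, {
  socketTimeoutMS: 30000, // 30s; the driver default of 0 disables the timeout
}).catch((err) => console.error(err));
```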
Update: I used yarn resolutions to pin the driver version. Indeed, this may be an upstream issue in the mongodb Node.js driver.
It appears that the problem started with …
This may be the same issue as: https://jira.mongodb.org/projects/NODE/issues/NODE-6370
I can confirm that we resolved our issue with the workaround of pinning the driver version. This is not a mongoose issue. I'll keep this issue open since the mongodb node driver doesn't have a GitHub issue tracker, and perhaps someone from the mongodb team will see it here. A temporary fix for mongoose might be to revert the mongodb driver bump back to 6.3.0.
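The pinning workaround mentioned above can be expressed in package.json. A sketch for Yarn (npm users would use the "overrides" field instead); 6.3.0 is the last known-good driver version mentioned in this thread:

```json
{
  "resolutions": {
    "mongodb": "6.3.0"
  }
}
```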
I think it's worth keeping this open because Mongoose should at least have some option for using the working 6.3.0 driver.
Seems it might be a good idea to have mongoose define a peer dependency on mongodb instead of a direct dependency. In this case the version of the driver would be user controllable. Not sure why it's not already that way.
I'm unable to repro this on my end with Mongoose 8.6.2, although I had to set …. Mongoose has always pinned exact versions of the MongoDB Node driver because, in the past, Mongoose was more tightly coupled with the MongoDB Node driver and patch releases would cause breaking issues. However, that hasn't been the case in some time, so we've considered relaxing Mongoose's Node driver version requirements. @andreialecu, if we changed Mongoose's dependencies to …, would that work?
That would probably work, but I think making mongodb a peer dependency might still be a good idea. Or a peer dependency with a default. See: yarnpkg/berry#3710 (comment)
You can probably get it to repro if you use upsert and update the same document repeatedly. I haven't tried optimizing the repro. I noticed it happens more easily with bigger documents; the linked issues in the mongodb issue tracker also mention this. It may still happen with smaller documents, just at a lower frequency. We've had some weird issues for a while which I think we can attribute to this problem.
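A hedged sketch of that kind of repro loop (`hammer`, the `key`/`blob` fields, and the payload size are my inventions, not from the thread; `model` is anything exposing a mongoose-style `findOneAndUpdate`):

```javascript
// Repeatedly upsert the same document with a large payload. Under the
// affected driver versions, one of these awaits may never resolve when
// run against a remote server such as Atlas.
async function hammer(model, iterations, payload) {
  for (let i = 0; i < iterations; i += 1) {
    await model.findOneAndUpdate(
      { key: 'a' },
      { $set: { blob: payload } },
      { upsert: true }
    );
  }
  return iterations;
}

// Usage sketch against a real mongoose model (assumed setup):
//   await hammer(Test, 1000, 'x'.repeat(1024 * 1024));
```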
I have a test case that seems pretty reliable, at least with Atlas. It might hit storage limits, but it is a simple …. I'm not certain if it requires the schema or not, but it's about 600 fairly large objects, and from what I can tell it would be kicking off 600 …
npm doesn't install peer dependencies by default AFAIK, which makes making mongodb a peer dep likely a non-starter. @hunkydoryrepair can you please send me the script? You can just email me val at karpov dot io, if that's preferred to posting the script on GitHub. |
It does, since 2020: https://github.blog/news-insights/product-news/presenting-v7-0-0-of-the-npm-cli/ This has actually been quite controversial; the Yarn maintainers were vehemently opposed to this default. Yarn and, I think, pnpm will not install peer dependencies automatically, but they will install peer dependencies with defaults (see my previous comment).
The case I thought was the same is a different bug, I think. I have a solid repro case, but it works reliably in 8.4.0 and consistently hangs in 8.4.1 or higher. I will open a separate bug.
@andreialecu
The test case I shared in my initial post was 100% reproducible for me. You weren't able to reproduce it? It's possible it could be related to latency to Atlas, as I was running it locally and my ping to the cluster is 30-60ms.
I wasn't able to consistently reproduce with it, even when connecting to Atlas. It timed out a few times for me, but it was pretty slow other times, so it wasn't super convincing. And I go through a VPN to connect to Atlas, so it is pretty slow, but I'm not sure it was reaching the idle timeouts. The complicated-schema cases seem to be a different bug from what I can tell, as they do some N! operations where N = document depth, so even getting 10 levels deep would seem to hang. You also said you could not reproduce locally, and I think this is why.
I was only able to reproduce it with certain mongodb driver versions, and it was consistent. In our production app, pinning to the driver version I mentioned resolved the issue.
@andreialecu Thank you. You inspired me to dig into this again using your use case. The issue I ran into is that your repro stops producing the problem if it is run multiple times, but I added a Test.deleteMany({}) at the beginning, and it became reproducible again. I spent the day tracking it down in the debugger and found a solution, which I added to this issue. The mongodb driver doesn't attach its listener to the socket's 'data' event in time to catch all of the 'data' events, so it misses the returned value. This happens only with larger commands that exceed the kernel socket buffer: socket.write returns a value indicating the socket needs to be drained, and the driver awaits the 'drain' event before listening for 'data'. Before the task awaiting 'drain' is processed, the 'data' event comes back and is emitted with no listener attached. Anyway... here's hoping they accept my solution!
Can we keep this open until a fix from mongodb is available and incorporated? |
The mongodb driver fix has been pulled into the 6.10 branch. The next release should include it.
Prerequisites
Mongoose version
8.5.1
Node.js version
20.17.0
MongoDB server version
7.0.12
Typescript version (if applicable)
No response
Description
We have a batch job that does a bunch of updates that used to work a while ago, but recently started hanging on calls to findOneAndUpdate. The promise stops resolving after a while. I have been able to create a repro for this, but it seems to only reproduce reliably on MongoDB Atlas (or probably other remote servers). On localhost it doesn't seem to reproduce.
Also if I bypass mongoose entirely and use the mongodb driver directly, it seems to work.
Steps to Reproduce
Expected Behavior
This must've started happening after some recent update to either mongoose, the mongodb driver, or the MongoDB server version. It's unclear which; we haven't run this job in a while and just noticed it occurring now.