Queue Mode out of memory with sufficient resources #4867
I've created a custom n8n image with changes to log the memory usage before every node is executed. Can you please run your workflows on this image?
I tried that image, but it is throwing the following error. On that setup I have switched versions back and forth before without any problems.
Can you please share what you have configured? You might have to change the startup command.
I am using the following ENVs on the stack:
The following command on the n8n container, and the following command on the n8n worker container:
Can you please change these to:

```yaml
spec:
  containers:
    - command: ["/docker-entrypoint.sh"]
```

For the worker container:

```yaml
spec:
  containers:
    - command: ["/docker-entrypoint.sh"]
      args: ["n8n", "worker"]
```

Since there are no new migrations in this branch, you don't need the migration step.
Done that in my compose file. For n8n:
For the worker:
I also tried this on the worker:
And I am getting this error:
Ah, you are using docker-compose. Sorry, I should have asked earlier; I assumed you were on Kubernetes. Please try these instead for docker-compose. For the n8n container:

```yaml
n8n:
  image: n8nio/n8n:${N8N_VERSION_TAG}
  command: n8n start
```

For the worker container:

```yaml
n8n-worker:
  image: n8nio/n8n:${N8N_VERSION_TAG}
  command: n8n worker --concurrency=20
```
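For context, a minimal queue-mode compose file built around those two commands might look like the sketch below. The service names, image tag, and Redis wiring are illustrative assumptions, not the poster's actual file; `EXECUTIONS_MODE=queue` and `QUEUE_BULL_REDIS_HOST` are the documented n8n queue-mode settings.

```yaml
version: "3.7"
services:
  redis:
    image: redis:6-alpine

  n8n:
    image: n8nio/n8n:${N8N_VERSION_TAG}
    command: n8n start
    environment:
      # Queue mode: the main instance enqueues executions into Redis
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
    depends_on:
      - redis

  n8n-worker:
    image: n8nio/n8n:${N8N_VERSION_TAG}
    command: n8n worker --concurrency=20
    environment:
      # Workers pick executions off the same Redis queue
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
    depends_on:
      - redis
```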
Now I am getting this error:
I also tried the logic which worked on my original stack for n8n, and I am getting this:
Sorry that this is turning out to be more cumbersome than anticipated.
Here is the complete docker-compose file (including what I am mounting), which works fine with small to medium-sized workflows, but as soon as it gets heavy it throws the mentioned errors:
Custom images like these have been using a different setup than the released ones. I'm going to create an internal ticket to make such custom images consistent with released images, so the next time you (or someone else) need to run a debug build like this, it'll hopefully be a lot less of a hassle 🤞🏽
Hi @netroy, is there any solution?
@prononext Thanks for testing the image. Looks like we might have some issues with temporary objects/strings being created during execution. I've pushed a new image that does the following:
Can you please pull the latest image and run your workflow again?
@netroy I just ran the updated image with the following results. Hardware usage: The n8n main container error:
The worker started processing the workflow. After one worker crashed, another container was spinning up, until the second container crashed too and the workflow failed on maxStalledCount. Worker error 1:
Worker error 2:
Hopefully this will get resolved soon.
@prononext Sorry, looks like the logging messed up the JSON objects. If I push another image, would you have some time today to run this again?
@netroy Yes, I can do it again today. What's important is that this issue gets resolved. I guess the workflow mentioned above will not hit a memory limit of 5 MB per node, as it is only strings and numbers. Another workflow I have, which ran for at most 40 minutes on the regular setup, now takes 85 minutes in queue mode but usually finishes. Here are some details of the 85-minute workflow.
Here are some details of the workflow you received the errors from:
Many steps of this workflow need to run in batches of 1, as enriching and formatting product data with many items in the queue does not work, so it has to process one item after another. I have tested this for months, and the only way to avoid execution and false transaction errors in regular mode was to let the workflow edit one product after another.
@prononext I've pushed another image that uses plain-text logging. Can you please run this again?
Hi @netroy, the same thing happened as before, I suppose. Is there anything I can do to assist further? The n8n main container error:
n8n worker error 1:
n8n worker error 2:
@prononext Thanks a lot for this.
@prononext Would it be possible to share one of the function nodes' code over email? I need to be able to reproduce this memory leak in order to fix it.
@netroy Can I reach you somewhere? Discord maybe?
I'm on the n8n Discord, same username.
@netroy Any updates on this? Can I assist?
@prononext I investigated further, and it turns out it's not a memory leak that is the issue here. We keep all the execution data in memory while a workflow is still running. This means that with workflows as large as yours, memory usage will keep increasing until the workflow finishes running.
@prononext I have made some changes that reduce the memory usage of the Function and FunctionItem nodes by reducing the amount of garbage that is generated.
@netroy I have just tested the image, and it seems something is very wrong with it. The HTTP Request nodes are not working at all and return empty values, whereas they work without the memory-debugging image in both regular and queue mode. So testing with that image is sadly not possible.
Now it is even crashing in regular mode. Something is very wrong with the garbage cleanup.
Does nobody else have bigger workflows with 30+ minute runtimes that also fail with this?
@prononext I believe the issue here isn't the duration of the runtime or the size of the workflow itself, but the number of function nodes. Not many people have 50+ function node executions in a single workflow.
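To illustrate the kind of garbage pressure being discussed (this is an illustrative sketch, not n8n's actual internals): building a fresh copy of every item on each pass, as is easy to do in a Function/Code node, generates far more short-lived garbage for the collector than mutating items in place.

```javascript
// Two ways to add a field to every item, as one might in an n8n
// Function/Code node. The first allocates a new object (and a new
// nested json object) per item on every run; the second mutates in
// place and allocates nothing extra.
function withGarbage(items) {
  return items.map((item) => ({ ...item, json: { ...item.json, flag: true } }));
}

function inPlace(items) {
  for (const item of items) {
    item.json.flag = true; // no intermediate copies created
  }
  return items;
}

const items = Array.from({ length: 3 }, (_, i) => ({ json: { id: i } }));
console.log(inPlace(items).every((it) => it.json.flag === true)); // true
```

Both versions produce the same output; over tens of thousands of items and dozens of node executions, the copying version is the one that keeps the garbage collector busy.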
@netroy I pulled the new image and tested it, and it has the same error: HTTP Request nodes are not executed correctly and return empty results, so no real load testing is possible. The workflow executes successfully, but that's only because it is not handling any data :-/
@prononext I'm running this simple workflow without any issues. Can you reproduce this issue outside your workflow?

```json
{
  "name": "My workflow 2",
  "nodes": [
    {
      "parameters": {},
      "id": "78d32025-0d8a-4334-9e0a-7a462b94bad6",
      "name": "When clicking \"Execute Workflow\"",
      "type": "n8n-nodes-base.manualTrigger",
      "typeVersion": 1,
      "position": [
        820,
        460
      ]
    },
    {
      "parameters": {
        "jsCode": "// Loop over input items and add a new field\n// called 'myNewField' to the JSON of each one\nfor (const item of $input.all()) {\n item.json.myNewField = 1;\n}\n\nreturn $input.all();"
      },
      "id": "215937c0-4729-4560-a4b1-0e1cd676dea5",
      "name": "Code",
      "type": "n8n-nodes-base.code",
      "typeVersion": 1,
      "position": [
        1040,
        460
      ]
    }
  ],
  "pinData": {
    "When clicking \"Execute Workflow\"": [
      {
        "json": {
          "name": "First item",
          "code": 1
        }
      },
      {
        "json": {
          "name": "Second item",
          "code": 2
        }
      }
    ]
  },
  "connections": {
    "When clicking \"Execute Workflow\"": {
      "main": [
        [
          {
            "node": "Code",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {},
  "versionId": "49e9c340-23a3-49c6-99f3-6733438c59df",
  "id": 2,
  "meta": {
    "instanceId": "78d750806a8099a2fa93299443c6d93cef36ac3e4ffc20647bf626f6dbe8df1b"
  },
  "tags": []
}
```
@netroy Yes, that workflow is working, but it's just a function node handling 2 items. I was pointing out that your memory-debug image is affecting the HTTP Request nodes, and they are not returning any data. I just ran the following tests to debug further, with these results:
My conclusion is that something important has changed in the last 3 months that affects the node and container maximum memory limit: a container crashes once it consumes about 2 GB, even though 16 GB of RAM is available and the whole running setup uses less than half of the instance's RAM. I hope this helps and we get a fast resolution of this problem. Maybe some PostgreSQL logs would also help on this part.
I have updated and cleaned my docker-compose file and used all the latest versions for testing; still the same problem.
Docker Compose File
Environment Variables
P.S. I hope to find this kind of clean, commented compose file with envs, in the same order, on the n8n GitHub soon :-)
Thanks for doing this. Can you please send these changes as a pull request? 🙏🏽
Hi @netroy, I just reduced my HTTP Request responses by 99% in size by editing the configuration of the endpoints that get called outside of n8n. With this I got about 3% further into finalizing the workflow, but the same errors still persist, and 97% of executions still fail to complete the workflow. So there definitely is a big problem with the garbage collector on the HTTP Request nodes. I would expect n8n to be able to handle under 2 GB of HTTP Request response data in a single workflow without getting an error. In my workflow there are about 1500 executions of HTTP Request nodes, and by my calculations these executions together produce at most 300 MB of response data, only with single string items. My resources are not being fully used, and the workflow is crashing.
@prononext I've pushed another update to the same image tag. This includes the fix for large expressions that was breaking the HTTP Request nodes in your workflow. Regarding the 2 GB memory limit: that actually comes from Node.js. Setting the container memory limit or reservation above 2 GB isn't going to be meaningful without this env variable.
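The specific variable referenced above is elided in this transcript. Assuming it is Node's standard heap flag, `NODE_OPTIONS` with `--max-old-space-size` (a real Node.js option, value in megabytes), a worker service could raise the cap like this; the service name and numbers are illustrative:

```yaml
n8n-worker:
  image: n8nio/n8n:latest
  command: n8n worker --concurrency=20
  environment:
    # 64-bit Node.js caps the old-space heap at roughly 2 GB by default;
    # --max-old-space-size raises that cap (here to ~4 GB).
    - NODE_OPTIONS=--max-old-space-size=4096
  deploy:
    resources:
      limits:
        # The container limit should sit above the Node heap cap,
        # leaving headroom for the rest of the process.
        memory: 5G
```

Without the `NODE_OPTIONS` line, raising the container limit alone leaves the process crashing at the same ~2 GB heap ceiling.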
@prononext Have you had a look at the updated image?
@netroy Sorry, I haven't really had time to test it.
I sadly could not test this further, as many production environments were under pressure and workflows had to be completely restructured to avoid the errors. But now the errors have returned, like this:
I even went as far as replacing all the old HTTP Request nodes, hoping to solve it, without any success. Even really simple tasks, like calling a WordPress API 3 times to get 300 posts, are failing. Versions tested: 0.212.0, 0.211.2, 0.211.1
@prononext We've significantly reduced the amount of memory used in the Code and HTTP Request nodes.
Hey @prononext, did you have a chance to give this a test?
I guess the new Code and HTTP Request nodes did the job.
The error is back on version 1.11.1; even the workflows which I made much smaller are getting errors now:
As a note, thank you @prononext for the documentation; I wasn't aware of that.
@netroy @mickaelandrieu What I noticed further:
Let's hope we get out of this experimental phase again very soon 💯 👍
More and more, even simple workflows are affected by the maxStalledCount error.
It seems the whole execution logic has somehow changed.
Hey @prononext, some of these may be resolved in 1.15, and there are also some settings that can be tweaked to help with the maxStalledCount error. Are you able to share how many workers you are using, how many workflows you have running, and at what intervals they run, so we can get an idea of the load? What version of n8n were you running before 1.11.2, and if you drop back down to that version (assuming it wasn't pre-1.0), do you still run into the same issues?
Hi @Joffcom, what other settings can be tweaked? I am using 4 workers now, and the system load is about 50% of CPU + RAM. I configured Wait nodes of about 5 seconds, and the sub-workflows take about 1 minute to complete. As of now, the workflow takes over 300 minutes, where before it finished in 50 minutes, with about 450 sub-workflow executions. Before, I was using 1.9.x (1.9.3 as I remember), which worked fine.
Hey @prononext, we added 4 new options, which can be found at the bottom of the queue environment-variable list here: https://docs.n8n.io/hosting/environment-variables/environment-variables/#queues If you downgrade back to 1.9.3, does everything work as expected? I am wondering if some changes in 1.11 are causing issues, but we do have some fixes and improvements in 1.15 and more to come. What are your workflows actually doing? Are they only making API calls, reading/writing files, using message queues/databases? The more information we have, the more we can dig into this.
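If I'm reading the linked docs right, the four options referenced are the Bull queue tuning knobs below; the values here are illustrative defaults-style numbers (durations in milliseconds), not recommendations:

```shell
# Hedged sketch of the queue tuning variables from the n8n
# environment-variable docs. Tune these to address stalled-job errors.
export QUEUE_WORKER_LOCK_DURATION=60000      # how long a worker's job lock is valid
export QUEUE_WORKER_LOCK_RENEW_TIME=30000    # how often the worker renews that lock
export QUEUE_WORKER_STALLED_INTERVAL=30000   # how often stalled jobs are checked for
export QUEUE_WORKER_MAX_STALLED_COUNT=3      # stalls allowed before the job fails
```

A job that stalls more times than `QUEUE_WORKER_MAX_STALLED_COUNT` is what surfaces as the "job stalled more than maxStalledCount" error in this thread, so raising the lock duration or the stalled count is the usual first tweak.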
Thanks @Joffcom, I will try tweaking some of these variables. The workflows are:
So that sub-workflow can be executed a couple of hundred times, depending on the data to update.
I tested the following parameters in quite different settings:
Same behavior with maxStalledCount, and randomly 1-2 minutes more or less of workflow running length. Any ideas?
Hi @Joffcom, have you found anything that would cause the stalling? I have often had quite bad experiences with downgrading, so I am not feeling safe on that part. It would be really nice to be at production level again soon 💯
On v1.15.2 the stalled error seems to be gone 👍
@prononext I think the issue with saving settings not working as expected was fixed in #7634 and released in 1.16.0 yesterday.
Hi, when running my workflow, which ran smoothly without queue mode on n8n v0.19xxx, I now experience the following error and the workflow is interrupted.
The workflow is quite big, and I have already split it into several smaller parts as far as possible. Normally the workflow runs for 25 minutes on the old setup without any problems. Now the system is not even starting to use resources, as shown in this image:
Error on n8n main container:
Error: job stalled more than maxStalledCount
Error on worker:
On the other containers (postgres, redis, traefik) there are no errors.
MY SETUP
I am on an n8n Docker queue-mode setup with 3 workers and concurrency 20. I have tested 1, 2, and 3 workers with concurrencies of 10, 20, 30, 50, 100, and 200, all with the same results. I have also tried to set memory limits on the Docker services, also with the same result:
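The memory-limit snippet itself isn't preserved above; for illustration, a compose-file sketch of the kind of per-service limit being described (service name and values are assumptions, not the original config):

```yaml
services:
  n8n-worker:
    image: n8nio/n8n:latest
    command: n8n worker --concurrency=20
    deploy:
      resources:
        limits:
          # Hard cap: the container is killed if it exceeds this
          memory: 4G
        reservations:
          # Soft guarantee the scheduler tries to keep available
          memory: 2G
```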
Additional Environment Variables: