RAM leak on the MeshCentral server #6179
Comments
Something happened between 11 and 12 from the looks of the graph. Going to sound like a daft one, but can you disable/remove the autoBackup, restart, and monitor? The fact that it looks like it is loading itself over and over again in the pic doesn't look good either.
I disabled autoBackup, rebooted the server, and am watching how it behaves.
Sometimes this error appears in the logs, but I don't think it's related to the problem:
-------- 6/16/2024, 9:33:59 PM ---- 1.1.24 -------- (node:55552) Warning: An error event has already been emitted on the socket. Please use the destroy method on the socket while handling a 'clientError' event.
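For reference, that warning is the generic one Node.js prints when a socket errors again after a 'clientError' event was already handled; the remedy suggested in Node's own docs is to end or destroy the socket inside that handler. A minimal stand-alone sketch of the pattern (plain Node http server, not MeshCentral's actual handler):

```js
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Generic Node.js pattern: on a malformed or broken client connection,
// answer with a bare 400 if the socket is still writable, otherwise
// destroy it so no further 'error' events fire on the same socket.
server.on('clientError', (err, socket) => {
  if (err.code === 'ECONNRESET' || !socket.writable) {
    socket.destroy();
    return;
  }
  socket.end('HTTP/1.1 400 Bad Request\r\n\r\n');
});

server.listen(8080);
```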
@sheshko-as that issue has been around for about a year.
It didn't help.
Server Error Log:

<--- Last few GCs --->
[89911:0x5ff48b0] 17700692 ms: Mark-sweep 4047.0 (4138.1) -> 4034.2 (4141.1) MB, 2816.1 / 0.0 ms (average mu = 0.346, current mu = 0.030) allocation failure; scavenge might not succeed

<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 1: 0xb9c310 node::Abort() [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 2: 0xaa27ee [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 3: 0xd73eb0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 4: 0xd74257 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 5: 0xf515d5 [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 6: 0xf63aad v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 7: 0xf3e19e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 8: 0xf3f567 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 9: 0xf2076a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 10: 0x12e599f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/bin/node]
-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 -------- 11: 0x17125f9 [/usr/bin/node]
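Since the abort above is a V8 heap exhaustion, one practical way to see what is piling up is to log memory use periodically and write a heap snapshot once RSS crosses a threshold, then open the .heapsnapshot file in Chrome DevTools (Memory tab). A minimal diagnostic sketch in plain Node.js (not part of MeshCentral; the 6 GB trip point is an arbitrary assumption):

```js
const v8 = require('v8');
const path = require('path');

const THRESHOLD_BYTES = 6 * 1024 * 1024 * 1024; // arbitrary 6 GB trip point
let snapshotTaken = false;

setInterval(() => {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  console.log(`rss=${Math.round(rss / 1048576)}MB ` +
              `heap=${Math.round(heapUsed / 1048576)}/${Math.round(heapTotal / 1048576)}MB`);

  // Write a single .heapsnapshot when memory balloons; comparing retained
  // objects against an early snapshot shows what the leak is made of.
  if (!snapshotTaken && rss > THRESHOLD_BYTES) {
    snapshotTaken = true;
    const file = path.join(__dirname, `leak-${Date.now()}.heapsnapshot`);
    v8.writeHeapSnapshot(file);
    console.log(`heap snapshot written to ${file}`);
  }
}, 60000).unref();
```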
The "Useful config.js settings" from https://ylianst.github.io/MeshCentral/meshcentral/debugging/ also didn't help.
Are you using some kind of VPN/proxy between agent and server? Can you monitor the ws connections between agent and server? By default they will stay up for 24 hours once established... but I've seen VPNs and other networking software/proxies prematurely close ws connections. When mesh realizes its connection is dead it stands up a new connection, but I think the old one isn't cleaned up... memory leak.
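For the monitoring part, a generic sketch of how one might count live WebSocket connections and sweep dead ones on a Node server built on the 'ws' npm package (this is the standard ws keepalive pattern, not MeshCentral's internal code):

```js
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8081 });

wss.on('connection', (ws) => {
  ws.isAlive = true;
  ws.on('pong', () => { ws.isAlive = true; });
});

// If this count only ever grows while the real agent count is stable,
// dead tunnels are probably not being cleaned up.
setInterval(() => {
  console.log(`open ws connections: ${wss.clients.size}`);
}, 30000).unref();

// Standard ws keepalive sweep: ping every client and terminate any that
// never answered the previous ping, so half-open sockets get released.
setInterval(() => {
  for (const ws of wss.clients) {
    if (ws.isAlive === false) { ws.terminate(); continue; }
    ws.isAlive = false;
    ws.ping();
  }
}, 30000).unref();
```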
@silversword411 I'm not sure if that's the case?
We need to find out what you did, or what happens, when the memory starts to climb.
We don't use one; agents connect directly by domain name.
I'm trying to monitor all this, but I don't see any pattern yet. The database is the only thing I have not changed yet; I will try switching it to PostgreSQL, for example. There are some specifics: we have a lot of client computers with the C drive frozen by the Shadow Defender program, as well as a lot of diskless computers that run from a single VHD image (one VHD image can be used by 30-40 PCs at once). For these computers, the group settings are set to delete the device when it goes offline.
@sheshko-as wow, that does sound mad/complex!
All groups that use one VHD image per group are configured to delete devices after the computer goes offline.
@sheshko-as hmmm, and they vanish OK?
Yes.
Okay, I'll do all of that and write back with the result.
@sheshko-as no worries!
Updated to 20.15; it did not help, and the problem happened again today.
@sheshko-as which issue, sorry? You mean the memory increase/crash?
I'm trying to figure out what action triggers the uncontrolled RAM growth, but no luck yet.
Yes.
OK, that's an interesting theory! So it's opening multiple MeshCentralRouter sessions that seems to increase the RAM on the server side. I'll have to test it myself; sadly I don't have 32 computers with RDP. But I suppose I could open 32 remote desktops and see if the memory starts increasing!
Do you think a low-quality VPS can cause this kind of RAM leak behavior?
@sheshko-as sorry, I've not had a chance to test yet; I've been working on the Android app, and have also been poorly (full of cold/flu again!).
I changed the VPS provider again, to the best one in our country, and the problem has not appeared for a day. I'll run the tests for a couple more days and let you know the result.
I think the problem is related to this error: #6127. We also noticed that on the MeshCentral admins' computers, for some reason, several instances of MeshCentralRouter open when working in the browser and clicking the RDP connect button. If the administrator notices that several windows are open and closes all but one, RAM consumption on the server drops sharply.
One thing to try is the master branch, which includes a fix for the other issue, and see if the memory leak still happens.
OK, well that's good to know!
@sheshko-as I can't seem to replicate your findings? It doesn't seem to be loading up multiple meshcentralrouter.exe instances at all. But just a random idea: can you try my version of MeshCentralRouter? MeshCentralRouter.zip
Previously, this happened when MeshCentralRouter had been inactive for a long time. For example: the first RDP connection via MeshCentralRouter is active, and a second one is made an hour later; in that case a second copy of MeshCentralRouter may open. But this is not yet confirmed; we are still testing it.
Okay, we'll check it out, and I'll write back with the test result.
It is very difficult to understand why this is happening, but the same problem occurs in the new version that you sent me.
Long time no speak on this issue.
It is very difficult to find the cause, but we are trying! Of course we will check your version of MeshCentralRouter; I will write back with the result.
The new version of MeshCentralRouter and the new version of MeshCentral did not help, but the problem with multiple copies of MeshCentralRouter opening seems to have been solved. After upgrading to version 1.1.32, the memory leaks began to appear more frequently. There is nothing in the MeshCentral error log; it is completely empty.
@sheshko-as I thought I fixed that bug in your screenshot?
We have updated MeshCentral to version 1.1.32.
@sheshko-as huh, weird? So you're still experiencing the bug where it opens multiple RDP ports! Will have to look into it! When you click RDP, it should open your MeshCentralRouter and then check whether the device already has an RDP port open, and if so, use that port rather than creating another one!?
Thanks @sheshko-as, I can replicate BOTH of the issues you are having! I will have a look tomorrow when I get a chance! This could possibly be WHY you are seeing the RAM leak, if it's opening port after port after port from the web UI, as this doesn't happen if you use MeshCentralRouter to do the RDP (right-click the device and click RDP). Also it's weird that the group name is being shown incorrectly too!? P.S.: can you open this exact issue in the MeshCentralRouter repo and link this issue in?
@sheshko-as I do believe this commit now fixes the multiple port issue: Ylianst/MeshCentralRouter@e98f76a. Basically it was adding the ports over and over again regardless of whether they already existed! Please can you try this build and let me know? PS: I'm aware the group names are still not showing correctly; going to look at that tomorrow when I'm not falling asleep.
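For context, the dedup described in that commit boils down to checking whether a mapping for a given device and remote port already exists before creating another one. An illustrative JavaScript sketch of that idea (MeshCentralRouter itself is a C# app; all names below are hypothetical):

```js
// Hypothetical port-map registry illustrating the fix described above:
// reuse the existing mapping for (deviceId, remotePort) instead of adding
// a duplicate every time the RDP button is clicked.
const portMaps = new Map();

function getOrCreatePortMap(deviceId, remotePort, createMapping) {
  const key = `${deviceId}:${remotePort}`;
  const existing = portMaps.get(key);
  if (existing) return existing;          // reuse, do not open another local port
  const mapping = createMapping(deviceId, remotePort);
  portMaps.set(key, mapping);
  return mapping;
}

function closePortMap(deviceId, remotePort) {
  const key = `${deviceId}:${remotePort}`;
  const mapping = portMaps.get(key);
  if (mapping) {
    mapping.close();                      // assumed close() method on the mapping
    portMaps.delete(key);
  }
}
```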
Oh wow! OK! Erm, weird; will have a look into that.
I've just had another look, and I'm not too sure what the issue now is? When you connect to device 1, it opens MeshCentralRouter and then opens the RDP app as expected. You then do your sessions/control/etc., then close out of the RDP apps, and it keeps the tunnels open as expected. Only one thing: IF YOU CLOSE THE RDP APP, MESHCENTRALROUTER WILL CLOSE TOO, but ONLY if you connect to a single RDP device; if you then open another RDP device, the
I can't replicate that bug at all? Will have another look tomorrow and try again. Are you actually doing any RDP sessions/connected, or are you just leaving the RDP windows open but not connected?
This problem appeared on the new version that you provided: 1.8.9046.
I leave the PC02 RDP session open, close PC01 completely, and reconnect via RDP.
Describe the bug
During operation, RAM consumption climbs sharply until it runs out. I increased the amount of memory: it was 4 GB, then 8, now 16. Increasing the amount of memory does not help. I checked on a dedicated server: when memory runs out, the service keeps running but memory stays at the limit. I checked on a VPS server: the service restarts when memory runs out. The problem may occur once a day, or perhaps once every three days, but no pattern has been found.
Information from journalctl:
Jun 14 18:56:41 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/meshcentral.service,task=node,pid=1>
Jun 14 18:56:41 kernel: Out of memory: Killed process 15892 (node) total-vm:28494740kB, anon-rss:15090984kB, file-rss:1304kB, shmem-rss:0kB, UID:0 pgtables:49396kB oom_sco>
Jun 14 18:56:41 systemd[1]: meshcentral.service: A process of this unit has been killed by the OOM killer.
Jun 14 18:56:42 systemd[1]: meshcentral.service: Failed with result 'oom-kill'.
Jun 14 18:56:42 systemd[1]: meshcentral.service: Consumed 4h 57min 33.871s CPU time.
Jun 14 18:56:52 systemd[1]: meshcentral.service: Scheduled restart job, restart counter is at 1.
Jun 14 18:56:52 systemd[1]: Stopped MeshCentral Server.
Jun 14 18:56:52 systemd[1]: meshcentral.service: Consumed 4h 57min 33.871s CPU time.
Jun 14 18:56:52 systemd[1]: Started MeshCentral Server.
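A side note on the ceilings visible in these logs: the earlier abort happened near 4 GB of heap, which is in the neighborhood of Node's usual default old-space cap on 64-bit builds, while the kernel OOM kill above happened at roughly 15 GB of RSS. It may be worth confirming what heap limit the MeshCentral process is actually running with; a quick check in plain Node.js:

```js
const v8 = require('v8');

// Prints the V8 heap ceiling the current process was started with.
// If the process aborts around this number, the V8 limit (not the host's
// RAM) is what is being exhausted; it can be raised with the
// --max-old-space-size flag, though that only postpones a genuine leak.
const limitMiB = v8.getHeapStatistics().heap_size_limit / 1048576;
console.log(`V8 heap_size_limit: ${Math.round(limitMiB)} MiB`);
```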
Server Software (please complete the following information):
"version": "7.0.11",
"gitVersion": "f451220f0df2b9dfe073f1521837f8ec5c208a8c",
"openSSLVersion": "OpenSSL 3.0.2 15 Mar 2022",
Client Device (please complete the following information):
Remote Device (please complete the following information):
Your config.json file