network_run hogging lot of memory on call to VoxelManip:read_from_map #327

Open
oz-tal opened this issue Sep 25, 2023 · 2 comments
Labels
Bug Something isn't working

Comments


oz-tal commented Sep 25, 2023

The following line quickly triggers huge memory consumption on larger networks:
https://github.com/mt-mods/technic/blob/master/technic/machines/network.lua#L596C7-L596C7

vm:read_from_map(pos, pos)

In my setup, the calls stack up to 10-15 GiB of allocated memory before the system OOM killer shuts the server down.
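For context on what a single-node read implies: per the Minetest Lua API documentation, `read_from_map(p1, p2)` returns the actually emerged region, which is aligned to mapblocks (16×16×16 nodes), so a `(pos, pos)` read still pulls in the whole block around `pos`. A minimal pure-Lua sketch of that rounding follows. `block_bounds` is a hypothetical helper written for illustration; it is not part of technic or the engine.

```lua
-- Mapblock edge length in nodes (16 in Minetest).
local MAP_BLOCKSIZE = 16

-- Hypothetical helper: smallest mapblock-aligned box that contains `pos`.
-- This mirrors the alignment of the emerged area, not the engine's actual code.
local function block_bounds(pos)
	local function floor_to_block(v)
		-- math.floor keeps negative coordinates in the correct block.
		return math.floor(v / MAP_BLOCKSIZE) * MAP_BLOCKSIZE
	end
	local minp = {
		x = floor_to_block(pos.x),
		y = floor_to_block(pos.y),
		z = floor_to_block(pos.z),
	}
	local maxp = {
		x = minp.x + MAP_BLOCKSIZE - 1,
		y = minp.y + MAP_BLOCKSIZE - 1,
		z = minp.z + MAP_BLOCKSIZE - 1,
	}
	return minp, maxp
end

-- Example with a made-up position.
local minp, maxp = block_bounds({ x = -1, y = 0, z = 17 })
print(minp.x, minp.y, minp.z, maxp.x, maxp.y, maxp.z)
```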

Server details:

  • Minetest 5.7.0 (Linux)
  • LuaJIT 2.1.0-beta3
  • postgres 15.3 (persisting to an HDD)

Technic network details:

  • Nodes: 8602
  • Cables: 8473
  • Generators: 40
  • Consumers: 87
  • Batteries: 2

The consumers triggering the call to VoxelManip:read_from_map are all quarries in far away chunks. In the offending network, quarries account for ~60 of the consumers. The crash seems to occur when most of them require the call to run.

Commenting out the VoxelManip code means some nodes fail to be fetched in some network runs, but it fixes the performance issues. Alternatively, keeping the code intact and reducing the size of the network prevents OOM crashes, but still produces intermittent lag spikes on the server.

Member

S-S-X commented Sep 25, 2023

Yeah, it does. The only thing it currently ignores is cables. Mapblocks for everything else have to be loaded, although I once tested possible caching a bit. Some incomplete info here:

However, I never pushed that forward because pipeworks currently needs the areas to be loaded anyway, so in an actual game it would mostly be useful for generators.

There's also this optimization idea for quarries to release mapblocks they're actually working on. Implementing this could help a bit but wouldn't be a silver bullet. It would only help in very specific scenarios involving a lot of quarries:

Cable count shouldn't really matter, but the network cache builder still has to load all mapblocks to find cables while initializing a network for the first time, after network changes, or when an admin resets cached data.

If the high memory usage is because of mapblocks for cables, then I'd recommend checking the engine cache configuration / memory tuning settings. It should be possible to release these mapblocks immediately after the first read; how many at a time depends on how much the server is able to read in within the network cache builder's time limits.

Edit: I think the worst case for that cable count is around 2200 mapblocks (assuming the cable runs straight along the edge of a mapblock).
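The derivation of that figure is not spelled out in the thread. As a rough sanity check, here is how a number of that order can arise, assuming 16-node mapblocks and assuming (a guess, not something stated above) that a straight run along a mapblock edge touches up to 4 mapblocks per 16-node segment:

```lua
local MAP_BLOCKSIZE = 16       -- mapblock edge length in nodes
local cables = 8473            -- cable count from the report above
local blocks_per_segment = 4   -- ASSUMPTION: blocks touched per 16-node segment

-- Number of 16-node segments needed to hold all cables in one straight line.
local segments = math.ceil(cables / MAP_BLOCKSIZE)
local worst_case = segments * blocks_per_segment

print(segments, worst_case)
```

Under those assumptions the result is 530 segments and 2120 mapblocks, which is in the same range as the ~2200 quoted. A different per-segment factor would shift the result proportionally.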

@BuckarooBanzay BuckarooBanzay added the Bug Something isn't working label Sep 25, 2023
Author

oz-tal commented Sep 25, 2023

You're right about the cable count: it doesn't seem to have any impact, and the network building process is very smooth as far as I can see. Taking this into account, my network is actually not that big.

I'm pretty much out of my depth here; I still know too little about Minetest. But to me it seems like VoxelManip is either misused by the mod, or Minetest itself has some issues around the API. It just doesn't make sense to eat up 10+ GiB of memory in a short time to load the mapblocks associated with ~60 quarries. I wish I had more RAM at hand to see how high it can go, but regardless, the whole server grinds to a halt while processing the requests, before the OOM killer pops a cap in it.
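To put the quarry count in perspective: going by the documented mapblock alignment of `read_from_map`, a single-node read per machine should touch at most one 16×16×16 mapblock per machine position, and fewer when machines share a block. The sketch below counts distinct mapblocks for a list of positions. `count_mapblocks` is a hypothetical helper and the positions are made up; it only covers the blocks containing the machines themselves, not anything else the engine or the quarries may load.

```lua
local MAP_BLOCKSIZE = 16

-- Hypothetical helper: count the distinct mapblocks that contain
-- the given node positions.
local function count_mapblocks(positions)
	local seen, count = {}, 0
	for _, pos in ipairs(positions) do
		-- Node coordinate -> mapblock coordinate (floor handles negatives).
		local bx = math.floor(pos.x / MAP_BLOCKSIZE)
		local by = math.floor(pos.y / MAP_BLOCKSIZE)
		local bz = math.floor(pos.z / MAP_BLOCKSIZE)
		local key = bx .. ":" .. by .. ":" .. bz
		if not seen[key] then
			seen[key] = true
			count = count + 1
		end
	end
	return count
end

-- Made-up machine positions: the first two share a mapblock.
local machines = {
	{ x = 0,  y = 0,  z = 0 },
	{ x = 15, y = 15, z = 15 },
	{ x = 16, y = 0,  z = 0 },
	{ x = -1, y = 0,  z = 0 },
}
print(count_mapblocks(machines))
```

With ~60 quarries this upper bound is ~60 mapblocks of 4096 nodes each for the machine positions alone, which is why 10+ GiB looks out of proportion.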

I've also tested forcing the call to read_from_map on every iteration; the variations in allocated memory are pretty wild. Right now I'm just logging the position of the node instead of executing the call, and I can see nodes getting skipped on some runs, but not all (which is good enough for my current usage).

The linked issues were definitely an interesting read, but even with a good caching layer and quarry optimisation, I'm not sure this issue would completely disappear. Something else seems to be going on.


3 participants