OOM error with master/slaves setup (zeromq, windows) #1372
Comments
Interesting. You're not doing anything wrong AFAICT. I suspect this is a Windows-related issue. Is it possible for you to test if you can reproduce this on a Linux or Mac machine?
Sure thing! I'll try on WSL, if that works for you. Worst case, I can set up a Linux VM.
Ok, I tried several times on Ubuntu 18.04 LTS on WSL, running it the exact same way with PowerShell Core. I couldn't repro the issue.
Ok, good to know! It might be a while before I get the chance to try to reproduce this on a Windows machine. Please keep us updated if you test anything else (e.g. another version of ZeroMQ or Python).
Will do!
Tested this on my Windows machine and I can reproduce this (on v1.0.0 of Locust and Python 3.8). Used a different locustfile from the one attached above. However, I couldn't really figure out a pattern to the failures, but there were a few observations:
Machine Specs:
Maybe we should bump the minimum required pyzmq version? Other than that I don't think we can do much without a clear repro case. @anuj-ssharma @mparent can you check your pyzmq versions?
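For anyone else checking their versions, here is a small stdlib-only sketch. It uses `importlib.metadata` (Python 3.8+) rather than importing `zmq` directly, so it still reports something useful even when pyzmq itself is broken; the distribution names passed in are the PyPI package names.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return "not installed"

print("pyzmq:", installed_version("pyzmq"))
print("locust:", installed_version("locust"))
```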
@cyberw 19.0.1 for me.
Ok, that is the latest, so that shouldn't be an issue...
I don't have any real ideas on how to solve this, and I hardly use Windows at all these days. If any of you have the time to do some more digging and find a fix it would be much appreciated (unfortunately it is unlikely anyone else will fix it for you :-/ )
It's fine, our actual locusts are run on Linux anyway. I'll simply keep working on WSL locally and try to find a fix if I have some time.
@cyberw I also faced this issue when running on Windows.
@bebeo92 Have you tried updating to the latest pyzmq? Can you find any pattern to when it works and when it doesn't? With no more details there is nothing we can do, sorry... (and with so few of our users running on Windows I don't think this issue will get a lot of attention) Perhaps file an issue with pyzmq?
@cyberw I think it happens when I click the Stop button. Is that expected behaviour?
It should work. Sorry, I don't think I can help you...
@cyberw I think it's still a valid bug, can you contact someone else to verify it?
I agree, but there is really nobody to contact. This is a project maintained by volunteers.
Like I said, you may have more luck talking to the maintainers of pyzmq itself.
@cyberw This is happening on the project I am working on. When running Locust on a Windows machine in headless mode with several workers (all on the same machine), there is a high chance the master will assert. The chance increases with the number of workers spawned. Note: the assert only starts triggering with 3 or more workers. It always triggers after the master has sent a message to all workers, either at the start when sending the spawn message or at the end telling them to quit.

The assert: warning : FATAL ERROR: OUT OF MEMORY (C:\projects\libzmq\src\decoder_allocators.cpp:85)

https://pyzmq.readthedocs.io/en/latest/morethanbindings.html#thread-safety

The pyzmq docs mention C-level crashes could be encountered if calling into the same sockets from multiple threads. Looking at the Locust setup, it appears to be using greenlets: the same socket could be called into multiple times, but always from the same thread. I am not experienced with Python (I only started using it to get Locust set up), so I am unsure if this could be causing the issue?

versions:

Do you have any advice on tracking down what could be triggering this issue?
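The distinction above matters: pyzmq's thread-safety caveat is about OS threads, and greenlets all run on one OS thread. A stdlib-only sketch of that property (the "greenlets" here are plain generators scheduled round-robin, a stand-in for gevent, not Locust's actual code):

```python
import threading

def fake_greenlet(seen_thread_ids):
    # Stand-in for a greenlet: a generator that repeatedly yields control.
    for _ in range(3):
        seen_thread_ids.add(threading.get_ident())
        yield

def run_cooperatively(tasks):
    # Naive round-robin scheduler: drop a task once its generator is exhausted.
    done = object()
    while tasks:
        tasks = [t for t in tasks if next(t, done) is not done]

seen = set()
run_cooperatively([fake_greenlet(seen), fake_greenlet(seen), fake_greenlet(seen)])
# Every cooperative task observed the same OS thread id, so sharing a
# zmq socket between greenlets does not, by itself, violate pyzmq's
# "one socket, one thread" rule.
print(len(seen))  # 1
```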
Hi! Sorry, I have nothing to add here. You probably already know a lot more than me :)
I also have this issue. FATAL ERROR: OUT OF MEMORY (C:\projects\libzmq\src\decoder_allocators.cpp:85) I get it the majority of the time when just creating my master and worker nodes. I'd say 3 out of 4 attempts fail. If that part passes, sometimes it fails with the same error after I start my load test. versions: I will check out pyzmq, but I wanted to post here for the sake of visibility (i.e., it isn't just a few people getting this error). When I talked with the guy who recommended Locust, he said "Oh yeah, it does that all the time. I just keep trying until it works." Personally I'd rather fix it, so I'll see if the folks at pyzmq have this on their radar already. Thanks.
Is there a ticket on pyzmq? If so then maybe link it here.
It is failing in libzmq when trying to allocate the memory needed. I have seen this before when there is available system memory, but it is fragmented (thus not enough available in one spot to allocate contiguously for the requested size). There is an issue open on pyzmq currently (zeromq/pyzmq#1555), but it was also opened by @RichardLions and has no replies from anyone else that may have seen this. On my end I'll need to investigate with a memory profiler to see what is filling up (or fragmenting) the available memory. I'll see how much time my project owner will let me spend debugging this and post back here if I find anything. It could be as simple as we somehow created a small memory leak in our Python code. I usually write in C#, so I am not sure if that is a common occurrence in Python, but it seems like a possibility.
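For the profiling step mentioned above, the stdlib's `tracemalloc` is a reasonable first pass before reaching for a third-party profiler. A minimal sketch (the list-of-bytes workload is a made-up stand-in for a real Locust run):

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload; in a real investigation this would be the code under test.
blobs = [bytes(10_000) for _ in range(100)]

current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current={current} bytes, peak={peak} bytes")
# Top allocation sites by source line -- a steadily growing site here
# would point at a leak; flat numbers suggest the problem is elsewhere
# (e.g. fragmentation or a corrupted size inside libzmq itself).
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so memory allocated inside the libzmq C library would not show up here; that matches the flat profiles reported later in this thread.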
Memory leaks are not really a common occurrence, no, and since other people have encountered this issue it is pretty likely there is a real bug here. Good luck, and let us know if you find the issue or a workaround!
One possibility is of course that Locust is (for some reason) attempting to send a very (very) big message and that exceeds some limit on Windows.
@RichardLions Good job updating the other bug (zeromq/pyzmq#1555) and finding a possible cause. Silly me, I thought the error (out of memory) could be something to do with running out of memory. :) I did try to run some Python memory profilers, but all I saw was a very flat memory allocation over time and nothing alarming.
@RichardLions has a pull request that fixes this in the pyzmq project. I implemented it manually and tested it, and it works. See zeromq/pyzmq#1555. Pull request: zeromq/pyzmq#1560
Let's close this when there is a new release of pyzmq including the fix and we have bumped the dependency in Locust.
pyzmq 22.2.1 has been released containing the fix for this issue. |
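For anyone verifying their environment against that release, a small sketch of a version gate (crude tuple comparison; it assumes plain `x.y.z` version strings like pyzmq's releases, and would need a real version parser for pre-release suffixes):

```python
def at_least(ver: str, minimum=(22, 2, 1)) -> bool:
    """True if a plain x.y.z version string is >= minimum (default 22.2.1)."""
    parts = tuple(int(p) for p in ver.split(".")[:3])
    # Pad short versions like "22.2" so the tuple comparison is fair.
    parts += (0,) * (3 - len(parts))
    return parts >= minimum

print(at_least("22.2.1"))  # True  -> contains the fix
print(at_least("19.0.1"))  # False -> affected by the OOM bug
```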
Bump dependency on pyzmq to fix #1372 (OOM on windows)
Thanks @RichardLions ! |
Hi!
Describe the bug
An out-of-memory error occurs with ZeroMQ trying to allocate a crazy amount of memory in decoder_allocators, sometimes up to several petabytes. This might very well be a ZeroMQ bug:
OUT OF MEMORY (bundled\zeromq\src\decoder_allocators.cpp:89)
I added some logs and recompiled pyzmq to check what's going on. Upon further investigation, _max_counters seems to take a crazy value at some point. See zmq_logs.txt
As you can see, allocator instance 0x0000016A9270F700 is constructed with _max_counters=249, but before the crash its value has changed to 1557249601288, which causes a malloc of several terabytes.
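A quick back-of-envelope check on those two numbers (assuming, as the log suggests, that the allocation size scales linearly with _max_counters; the exact per-counter size inside libzmq is not known from this thread):

```python
sane = 249                   # _max_counters at construction (from the logs)
corrupt = 1_557_249_601_288  # value observed just before the crash

# Even at a single byte per counter, the corrupted value implies an
# allocation of well over a terabyte; any realistic per-counter size
# only multiplies that, which matches the "several terabytes" seen.
tib = corrupt / 2**40
print(f"{tib:.2f} TiB at 1 byte/counter")
print(f"inflation factor: {corrupt // sane:,}x")
```

The ~6-billion-fold inflation from 249 makes a simple miscount implausible and points at memory corruption of the counter itself, consistent with the fix later landing in pyzmq.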
Steps to reproduce
Sorry, I couldn't find a surefire way to reproduce this one. It seems kind of random: it sometimes happens before the test is even started, sometimes when the test is stopped, and sometimes it doesn't happen at all. It does seem to happen more often when stopping a test in the web UI. Simply run the ps1 attached and do some stuff in the web UI.
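The attached ps1 is not reproduced in this thread; for reference, a generic single-machine master/workers setup on Locust 1.x looks roughly like this (the locustfile name is a placeholder):

```shell
# Terminal 1: start the master (serves the web UI on :8089 by default)
locust -f locustfile.py --master

# Terminals 2..N: start one worker each; repeat to add workers
# (reports in the thread suggest 3+ workers make the crash more likely)
locust -f locustfile.py --worker --master-host 127.0.0.1
```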
Environment
I managed to repro the bug on two computers: my work computer and my personal computer. Both are on Windows 10 with the Python 3.6 that comes with VS2017, but my personal computer has a pristine Python environment, just ran pip install locustio.
Am I doing something I'm not supposed to?