MemoryError during 'run_all_rules' execution #1249
Any chance it's Elasticsearch that throws the "MemoryError" message? But even then elastalert should proceed with execution and not crash. This one crashes the Docker container completely, to the point that it can't be restarted.
That MemoryError is coming from Python, not Elasticsearch. There are definitely some issues regarding unbounded memory usage. If you have too many documents in Elasticsearch, it's easy to shoot yourself in the foot, as elastalert will try to load too much into memory. You may be running into process-level memory limits, which is why the server is still at low usage. Try running ulimit (depending on your system, this may be a different command).
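You can also check the limits from inside the Python process itself. A minimal sketch using the standard library resource module (not part of elastalert, just for inspection):

```python
import resource

# RLIMIT_AS is the address-space (virtual memory) limit for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

def fmt(limit):
    # RLIM_INFINITY means "unlimited"
    return "unlimited" if limit == resource.RLIM_INFINITY else "%d MiB" % (limit // 2**20)

print("soft limit:", fmt(soft))
print("hard limit:", fmt(hard))
```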
Which means that I have a limit of 6 GB of memory.

You can set this to a higher number, or to unlimited, for the elastalert process. In general I'd like to add some better sanity checking for when a query keeps scrolling and scrolling forever. max_query_size applies to a single query only :(
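If the hard limit allows it, the soft limit can also be raised from inside the process with the same resource module; a sketch, assuming your hard limit is already high enough:

```python
import resource

# Raise the soft address-space limit up to the hard limit for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (hard, hard))
```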
Hi @Qmando, thanks for another quick reply. Dude, you gotta start charging for commercial support; seriously, you support Elastalert here on GitHub better than several paid services I have that charge for support subscriptions ;) 👍 Can I try to add a simple counter that will limit scrolling to a sane number like 15-20 by default (with an option to modify it)? Can you point me to the right place, so I can submit a PR? Does it all go through a single function that controls this retry behavior, or does each rule handle it separately?
There's a quick fix for this and a more in-depth fix. The quick way would be to limit the total number of documents fetched across all the scroll pages of a query. Something like the sketch below.
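A rough sketch of that idea, keeping the current recursive structure but adding a cap (all names here are illustrative, not the actual elastalert internals):

```python
# Illustrative only: cap the total documents one rule pulls in during a
# scroll session, checked before recursing into the next scroll page.
MAX_HITS_PER_QUERY = 10000  # hypothetical default; would be tunable per rule

def run_query(rule, es_client, body, total_hits=0, scroll_id=None):
    if scroll_id is None:
        res = es_client.search(index=rule['index'], body=body, scroll='30s')
    else:
        res = es_client.scroll(scroll_id=scroll_id, scroll='30s')

    hits = res['hits']['hits']
    rule['type'].add_data(hits)  # hand this page of results to the rule type
    total_hits += len(hits)

    # Stop either when the scroll is exhausted or when the cap is reached.
    if hits and total_hits < MAX_HITS_PER_QUERY:
        run_query(rule, es_client, body, total_hits, res['_scroll_id'])
```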
Alternatively, you can limit the number of scrolls, which is effectively the same thing. The more in-depth fix is to restructure how we do scrolling. The issue right now is that, because we use recursion, we keep the data from each query in memory, since the functions never return until the scroll ends. If we did it iteratively, we could forget each data frame after adding it to the ruletype.
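An iterative version might look like this (again just a sketch with assumed names, not the actual elastalert code); each page becomes garbage-collectable as soon as the rule type has consumed it:

```python
def run_query_iterative(rule, es_client, body):
    res = es_client.search(index=rule['index'], body=body, scroll='30s')
    scroll_id = res['_scroll_id']
    while res['hits']['hits']:
        rule['type'].add_data(res['hits']['hits'])
        # Rebinding `res` drops the reference to this page, so Python can
        # free it before the next page arrives -- the point of the refactor.
        res = es_client.scroll(scroll_id=scroll_id, scroll='30s')
        scroll_id = res['_scroll_id']
    es_client.clear_scroll(scroll_id=scroll_id)
```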
Interesting... yeah, I thought about limiting the number of scrolls. I'll try what you suggest. Do you plan to implement something like that in the next version update?
Regarding whether you need the count or the documents in full, there are three different scenarios:

1. Rules that only ever need a count of matching documents.
2. Rules that need the document contents only to enrich an alert after a match occurs.
3. Rules whose matching logic needs every document in full (for example, rules using query_key).
It's definitely possible to make elastalert smart and choose whether or not it needs to download the full documents. For case 2, we could be smart and make a single query after a match occurs, similar to what happens for top_count_keys. As for the consequences of dropping documents after some number: the most obvious is that you could miss alerts if you had very high thresholds, or very high cardinality on a query_key field.
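For what the case-2 idea might look like: match on counts first, then run one extra query after a match fires, purely to fill in alert details. A sketch with made-up names (enrich_match, related_docs, the @timestamp field) rather than real elastalert code:

```python
def enrich_match(es_client, rule, match, num_docs=5):
    # Fetch a handful of example documents for this key only after a match
    # has already fired, instead of holding every document in memory.
    res = es_client.search(
        index=rule['index'],
        body={
            "size": num_docs,
            "query": {"term": {rule['query_key']: match[rule['query_key']]}},
            "sort": [{"@timestamp": "desc"}],
        },
    )
    match['related_docs'] = [hit['_source'] for hit in res['hits']['hits']]
```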
This doesn't seem to be a very commonly reported issue, so all of the "smart" features are probably not necessary at this point; they would be a lot of work. I guess I'll try to do the refactor to make it more memory-efficient at some point soon.
Hi, thanks for the details. I understand that in my case it has to fetch all the results, because all our rules use "query_key" and there are 20+ variants of the key (our different environments).
If you have a single query_key (not a list of query_keys), you can use use_terms_query, which queries for counts per key instead of downloading the documents themselves.
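For reference, the counts-per-key style of query that this relies on boils down to a terms aggregation. A standalone sketch with elasticsearch-py (the endpoint, index pattern, and field name are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed endpoint

# Ask Elasticsearch for per-key document counts instead of the documents
# themselves: only a small list of buckets comes back over the wire.
res = es.search(
    index="logs-*",  # hypothetical index pattern
    body={
        "size": 0,  # no hits, aggregation buckets only
        "aggs": {
            "counts": {
                "terms": {"field": "environment", "size": 50}
            }
        },
    },
)

for bucket in res["aggregations"]["counts"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```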
oh, ok I'll try. Thanks! |
@Qmando |
Any ideas about using a 'del data' statement? @nsano-rururu @Qmando
Hi,
I'm having weird crashes lately, all the time:
Any idea what it might be?
It runs in Docker, and the server is barely at 10%-12% memory utilization; I can see in New Relic that memory was never full.
The crash lines point to here (in run_all_rules(self)):