[BUG-REPORT] Group By memory Issue #2022

Open
MHK107 opened this issue Apr 18, 2022 · 4 comments

Comments

@MHK107

MHK107 commented Apr 18, 2022

Hello,

I have a project running on vaex v4.0.0, wrapped in Flask so that APIs can run off it. I was hoping to get some help with a memory problem.

I see what looks like a memory leak when using groupby. Here's an example:

df.groupby(['rooms_count'], agg={vx.agg.mean('price_per_meter'),vx.agg.min('price_per_meter'),vx.agg.max('price_per_meter'),vx.agg.count('price_per_meter')})

My issue is not with the amount of memory being used, but that the memory is not released back to the OS after the API call has finished. Scale that to multiple API requests and the server soon runs out of memory. I have tried forcing garbage collection, but the memory still isn't released back to the OS.

I was asked to help replicate the issue. You can find the code and the steps to replicate it here:
[Link to the repo](https://github.com/MHK107/vaex-groupby-memory-issue/tree/main)

Please let me know if I can help in any way to replicate and resolve this.

@maartenbreddels
Member

Hi,

Is this 4.0.0 or 4.x?
Please check if this still happens with the latest version.

Cheers,

Maarten

@MHK107
Author

MHK107 commented May 2, 2022

Hi @maartenbreddels,

We updated to the latest version; however, we still face the same issue.

@maartenbreddels
Member

I tried to reproduce it with the repo you linked to.
On master (vaex 4.9.1 should not be different), I hit that endpoint ('/test') many times and the memory usage stays stable.
Could you double-check that you really get this issue? For example, hit the endpoint 10 or 100 times and see whether memory usage increases.

@MHK107
Author

MHK107 commented May 16, 2022

Hi @maartenbreddels,

Thanks for reaching out, I really appreciate it. That's very interesting. Just out of curiosity: does the memory usage increase on the first call and then stay the same for the next 100 requests (or however many we try)?

Let me explain it better.

Suppose the RAM usage is 6 GB before any calls are sent. Let's assume it goes up to 8 GB on the first API call and stays there through 1000 API calls. Once the 1000th call is done, does the RAM stay at 8 GB or drop back to 6 GB?

I'll also discuss this with my data science team internally. We'll make sure that we are running the correct version.
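
To make the three numbers in that question concrete, here is a minimal sketch of how they could be recorded from inside the server process. It is not taken from the linked repo: `rss_mib` and `profile_calls` are hypothetical helper names, the `/proc/self/statm` read assumes Linux, and the fallback reports peak rather than current memory, so it cannot show a decrease.

```python
import gc
import os
import sys


def rss_mib():
    """Best-effort resident set size of this process, in MiB."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            pages = int(f.read().split()[1])
        return pages * os.sysconf("SC_PAGE_SIZE") / 2**20
    except (OSError, ValueError, IndexError):
        # Fallback (Unix only): PEAK RSS, not current RSS.
        # ru_maxrss is in bytes on macOS and in KiB on Linux.
        import resource
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak / 2**20 if sys.platform == "darwin" else peak / 2**10


def profile_calls(workload, n_calls=100):
    """Record RSS before any call, after the first call, and after the last."""
    gc.collect()
    baseline = rss_mib()

    workload()
    gc.collect()
    after_first = rss_mib()

    for _ in range(n_calls - 1):
        workload()
    gc.collect()
    after_last = rss_mib()

    return {
        "baseline": baseline,
        "after_first": after_first,
        "after_last": after_last,
    }


if __name__ == "__main__":
    # Stand-in workload; in a real check this would be the groupby call itself.
    report = profile_calls(lambda: sum(range(100_000)), n_calls=10)
    for name, value in report.items():
        print(f"{name}: {value:.1f} MiB")
```

If `after_last` stays close to `after_first`, the memory is being held but not growing; if it keeps climbing with `n_calls`, that points to a leak. The measurement has to run in the same process as the groupby, because a separate HTTP client would only report its own memory.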

Cheers
