[BUG] Memory issues in 21cmFast #337
Comments
Hi @jordanflitter, before being able to provide detailed answers to your questions, can you please provide an example of what you are specifically aiming to compute (e.g. the variables used, your setup and version)? That way we should be able to provide more useful information. By memory issues, do you just mean you exhaust the available memory on your machine? A brief response to some of your questions:
The decision here was based on being able to purge the initial conditions, which individually require the most memory (see the sketch after this comment). But, if enough redshifts are required, this won't help, for the reason you identified. Likely a solution implementing both will be required (where the path taken depends on the user's setup).
4 - 6. This is probably true, but the memory requirements here are essentially zero w.r.t.
Not sure if these are useful answers, but hopefully more assistance can be provided with more information on your desired setup.
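For illustration, here is a minimal sketch of what purging the initial conditions could look like at the Python level, assuming the caller holds the only reference to the InitialConditions object. This is plain Python reference dropping, not necessarily the mechanism the wrapper itself uses to free its boxes:

```python
import gc

import py21cmfast as p21c

# Generate the initial conditions (parameters as in the script later in this thread).
ICs = p21c.initial_conditions(user_params={'HII_DIM': 25},
                              write=False,
                              random_seed=1)

# ... use ICs to construct whatever downstream boxes need it ...

del ICs        # drop the last Python reference to the object
gc.collect()   # encourage the collector to release the underlying buffers now
```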
Thank you very much @jordanflitter for the detailed bug report. I agree with @BradGreig's comments, and I think a few of these can be fixed easily (and as he says, I'm pretty sure the first is already fixed on
Thanks @BradGreig and @steven-murray for showing interest in this issue. After more tests, I found that the last issue (the one that prevented me from completing the simulation in my project) had to do with the modifications I made in my project, rather than another bug in 21cmFast :)

As for your questions, @BradGreig, I'm working with a modified code that is based on version 3.1.3. Because I saw that most of the lines that were causing me memory issues* are still found in the updated

*By memory issues I mean that the program has exhausted its available memory. For high resolutions I run the code on a Sun Grid Engine (SGE) system, and many of the jobs were either killed or aborted by the system (on some occasions, a numpy ArrayMemoryError was raised). Before I fixed the 2nd bug on my list I used to get a segmentation fault error. I should say that when I work with a low-resolution box on my laptop I've never had any of these problems.

Before fixing the 3rd bug on my list, the memory usage of 21cmFast was increasing with each redshift step (here I used a poor-resolution box of HII_DIM=25, no mini-halos, and the standard 84 redshift samples from z=35 to z=6; I made this plot with the wonderful tool psrecord, and the first blue peak in the plot corresponds to the generation of the initial boxes). Fixing the 3rd bug helps flatten the blue curve, but not entirely. Then, by fixing the 4th-6th bugs on my list, I was able to bring the blue curve to a nearly** constant value. To be honest, I haven't checked what happens if the 7th bug on my list is fixed; I imagine it has a negligible effect on the total memory.

**Unfortunately, after making all these changes, the memory used by 21cmFast is still not perfectly constant, and it gradually increases with each redshift step. In an attempt to figure out what may be causing it, I wrote the following short script. This script reports a different memory usage for each of the three generated InitialConditions objects, which I find to be weird. I have to admit, though, that I don't have any prior experience with actualsize(), although it seems to be doing a better job than sys.getsizeof().

```python
import sys
import gc
import numpy as np
import matplotlib.pyplot as plt
import py21cmfast as p21c

def actualsize(input_obj):
    # Walk the graph of objects reachable from input_obj (via gc.get_referents)
    # and sum sys.getsizeof over every object visited exactly once.
    memory_size = 0
    ids = set()
    objects = [input_obj]
    objects_memory = []
    while objects:
        new = []
        for obj in objects:
            if id(obj) not in ids:
                ids.add(id(obj))
                memory_size += sys.getsizeof(obj)
                new.append(obj)
                objects_memory.append(sys.getsizeof(obj))
        objects = gc.get_referents(*new)
    return memory_size, np.sort(objects_memory)

for i in np.arange(3):
    ICs = p21c.initial_conditions(user_params={'HII_DIM': 25},
                                  regenerate=True,
                                  write=False,
                                  direc="_cache",
                                  random_seed=1)
    M, objects_memory = actualsize(ICs)
    print(f"Total memory of ICs is {M}")
    plt.plot(np.cumsum(objects_memory))
```
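As a process-level cross-check of the same loop (a sketch of my own, assuming psutil is installed; it reports the resident set size of the whole Python process rather than the size of a single object, so it also catches growth that actualsize() cannot see):

```python
import os

import psutil
import py21cmfast as p21c

proc = psutil.Process(os.getpid())

for i in range(3):
    ICs = p21c.initial_conditions(user_params={'HII_DIM': 25},
                                  regenerate=True,
                                  write=False,
                                  direc="_cache",
                                  random_seed=1)
    # Resident set size of the whole process after this call, in MiB.
    rss_mib = proc.memory_info().rss / 1024**2
    print(f"call {i}: resident memory = {rss_mib:.1f} MiB")
```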
Hi @jordanflitter, excellent, good to know you were able to get your simulations complete! The usage graph you've provided looks fairly similar to one I produced a while back (#221 (comment)). Apologies for the difficulty viewing it; it runs all the way through the calculation. However, between that version, yours and the current master there have been a few changes (mainly your 2nd point, which we are actively resolving). Additionally, thanks for passing along the tools you've been using. They will be helpful (I was just scraping the information from top while the simulation was running). We'll get to work when we can on implementing a lot of these suggestions. I think the dominant one will be point 2, but we'll check and see how everything else is performing.
Thanks @BradGreig. I'm glad that you found my comment helpful. Please let me know if you find anything interesting in that context :)
Hello,
I've encountered numerous memory issues that prevent me from completing the simulation. I should stress that in my project I work with many more redshift samples than the standard 21cmFast setup, which is why I'm having these issues. Here's a list of the bugs that I've found so far, sorted by their severity:
Fixing all the above bugs seems to reduce the memory requirements of 21cmFast considerably. And yet, there is still something that keeps me from finishing my simulation. As far as I understand, once we enter the main redshift loop in either run_lightcone() or run_coeval(), the memory used by the program should remain at a fixed value, because new coeval boxes overwrite the previous ones at each iteration. However, this is certainly not what's happening, even when the above bugs are fixed.
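One way to localize growth like this (a sketch only; it assumes the run_coeval() keyword names of the v3-era wrapper and that the growing allocations are visible to Python's tracemalloc, which tracks numpy arrays but not memory allocated purely inside the C library) is to diff tracemalloc snapshots between successive coeval boxes and print the allocation sites that grew the most:

```python
import tracemalloc

import py21cmfast as p21c

tracemalloc.start()
previous = tracemalloc.take_snapshot()

for z in (12.0, 10.0, 8.0):   # a deliberately coarse redshift sampling, for illustration only
    coeval = p21c.run_coeval(redshift=z,
                             user_params={'HII_DIM': 25},
                             random_seed=1)
    current = tracemalloc.take_snapshot()
    # Print the five allocation sites whose retained memory grew the most
    # since the previous redshift.
    for stat in current.compare_to(previous, 'lineno')[:5]:
        print(z, stat)
    previous = current
```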
To investigate what's causing this behavior, I wrote a script that calls initial_conditions() multiple times and prints the memory used by the returned InitialConditions object (I should note that I use the same random seed at each run). I found sys.getsizeof() to be unreliable for this task, so I used actualsize() instead, which can be found in this nice article. I was surprised to find that the memory used by each InitialConditions object was not constant, but rather grew with every call to initial_conditions(). Not only that, but the growth rate was not constant. I suspect it might have to do with the basic class "StructWrapper" that is used by the wrapper, but I am not sure. This is the point where I thought I should get some help from the GitHub community :)
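For reference, the reason sys.getsizeof() falls short here is that it only reports an object's own header, not the memory held by the objects it references. A tiny standalone illustration (nothing 21cmFast-specific; the dict key is just a made-up name):

```python
import sys

import numpy as np

box = {"lowres_density": np.zeros((256, 256, 256), dtype=np.float32)}

print(sys.getsizeof(box))              # only the dict's own bookkeeping, roughly a few hundred bytes
print(box["lowres_density"].nbytes)    # the ~64 MiB actually stored in the array
```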
Thanks to anyone who can help me with this issue!
Jordan