Callable memory estimate #221
base: master
Conversation
Welcome to Codecov 🎉 Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment. Thanks for integrating Codecov - We've got you covered ☂️
Thanks @BradGreig, this is very promising. Unfortunately I think the fact that it is not "automatic" is something we just have to accept -- I can't think of a non-brittle way to do this. On the other hand, I think the python memory part of it can be a little more automatic, as I commented in-line.
As for tests -- I wonder if it'd be possible to write a test in which we actually run a lightcone/coeval and somehow measure the memory with some tool, and compare to the estimate? In fact, we could even try doing this for all the integration tests, so that we test all the different options!
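A rough sketch of what such a test harness could look like, using Python's `tracemalloc`. Note this only sees Python-side allocations, so 21cmFAST's C-side memory would need a different tool; the helper name here is made up:

```python
import tracemalloc

def measure_peak_python_memory(func, *args, **kwargs):
    """Run func and return (result, peak Python-heap bytes).

    tracemalloc only tracks allocations made through Python's allocator,
    so C-level allocations inside 21cmFAST would be invisible here; those
    would need e.g. resource.getrusage or an external profiler.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Toy stand-in for a coeval/lightcone call: allocate ~8 MB and check the
# measured peak reflects it. A real test would compare against the
# estimator's output with a wide tolerance.
result, peak = measure_peak_python_memory(lambda: bytearray(8 * 10**6))
```

In an integration test the comparison would be `measured <= estimate` (or within tolerance), run once per option combination.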
src/py21cmfast/_memory.py
Outdated
# First, calculate the memory usage for the initial conditions
memory_ics = mem_initial_conditions(user_params=user_params)

memory_data = {"ics_%s" % k: memory_ics["%s" % k] for k in memory_ics.keys()}
Instead of "%s" % k, you can just have k.
Yep, I may have just been copy pasting that from something else. You are 100 per cent correct of course!
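For illustration, the simplification suggested above, with made-up stand-in values for the memory_ics dict:

```python
# Stand-in values; the real dict comes from mem_initial_conditions().
memory_ics = {"python": 1.0e9, "c": 2.0e9}

# Original pattern:
memory_data = {"ics_%s" % k: memory_ics["%s" % k] for k in memory_ics.keys()}

# Equivalent: "%s" % k is just k for string keys, and iterating
# items() avoids the second dict lookup.
memory_data_clean = {"ics_%s" % k: v for k, v in memory_ics.items()}
```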
src/py21cmfast/_memory.py
Outdated
if flag_options.PHOTON_CONS:
    # First need to create new structs for the photon-conservation
    astro_params_photoncons = deepcopy(astro_params)
    astro_params_photoncons._R_BUBBLE_MAX = astro_params.R_BUBBLE_MAX
Why is this necessary? Does it not get copied automatically in the deepcopy?
I copied this exactly from how it's done in the run_lightcone function. Didn't check too closely, but it seemed a little odd to me too. I'll double check.
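A minimal sketch of why the reassignment looks redundant, using a toy stand-in class (not the real AstroParams):

```python
from copy import deepcopy

class Params:
    """Toy stand-in for AstroParams (not the real 21cmFAST class)."""

    def __init__(self, R_BUBBLE_MAX):
        self._R_BUBBLE_MAX = R_BUBBLE_MAX

    @property
    def R_BUBBLE_MAX(self):
        return self._R_BUBBLE_MAX

p = Params(15.0)
p2 = deepcopy(p)

# deepcopy already copies plain instance attributes, so for a simple
# attribute the explicit reassignment after the copy is a no-op...
assert p2._R_BUBBLE_MAX == p.R_BUBBLE_MAX

# ...though if the real property's getter applies a transformation,
# assigning astro_params.R_BUBBLE_MAX back into _R_BUBBLE_MAX would bake
# the transformed value in, which may be the intent in run_lightcone.
```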
src/py21cmfast/_memory.py
Outdated
"""All declared HII_DIM boxes"""
# lowres_density, lowres_vx, lowres_vy, lowres_vz, lowres_vcb
# lowres_vx_2LPT, lowres_vy_2LPT, lowres_vz_2LPT
num_py_boxes_HII_DIM = 8.0
Here, I think we can do better, after we get #220 in. That introduces an update where each class can return a dict of all arrays and their shapes, before being instantiated.
Yeah, that sounds pretty nice and will be very useful for these functions. This and #220 probably will need to be done hand in hand.
src/py21cmfast/_memory.py
Outdated
# These are all float arrays
size_py = (np.float32(1.0).nbytes) * size_py

"""Memory usage within GenerateICs"""
Prefer using comments rather than string literals
Visually, I think I do this to break things up more clearly. But I'll change them.
src/py21cmfast/_memory.py
Outdated
return {"python": 0.0, "c": size_c}


def format_output(
I think perhaps this function name should be something like print_memory_estimate or something like that, and it should be imported into the top-level namespace.
Yep, I threw this together in a few minutes at the end of the day. I tried to use the built-in logging rather than printing, but it didn't work (hence the printing). But I didn't try too hard to fix the issue.
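If the logging route is revisited: a common reason logging output "doesn't appear" is simply that no handler or level has been configured, since the root logger defaults to WARNING. A sketch with an illustrative function and logger name:

```python
import logging
import sys

logger = logging.getLogger("21cmFAST._memory")

def print_memory_estimate(memory_data):
    """Report a memory-estimate dict (values in bytes) via logging.

    The function name and dict layout here are illustrative.
    """
    for key, nbytes in memory_data.items():
        logger.info("%s: %.2f GB", key, nbytes / 1024**3)

# Without this (or an equivalent handler/level setup), INFO messages
# are silently dropped, which would look like logging "not working".
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
print_memory_estimate({"ics_c": 2.5 * 1024**3})
```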
Hey @steven-murray, thanks for the quick feedback. With respect to the tests, I think that would be great. However, I wonder how accurate that would be. Different machines/compilers might allocate different amounts of memory, so these functions may only be approximate relative to actual usage. You might have to invoke some level of tolerance (maybe 10 per cent) to get the two numbers matching closely enough to pass the test. I'm about to generate some runs to test how these functions perform relative to actual usage, and also how they scale.
@BradGreig: perfect. Yeah, we'll put in a fairly wide tolerance. The idea would just be that we should capture the broad trends. It would be unusual for someone to require the exact memory to within 10%.
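A wide-tolerance comparison could be as simple as the following sketch (the names and the 10% figure are illustrative):

```python
import math

def check_estimate(estimated_bytes, measured_bytes, rel_tol=0.10):
    """Pass if the estimate is a safe over-estimate, or within a wide
    relative tolerance of the measurement.

    Names and the default tolerance are illustrative, not from the PR.
    """
    return estimated_bytes >= measured_bytes or math.isclose(
        estimated_bytes, measured_bytes, rel_tol=rel_tol
    )
```

Such a check would only flag gross under-estimates, which matches the "capture the broad trends" goal.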
Hey @steven-murray, I have been playing around with comparing the light-cone memory function to actual usage. For larger boxes it seems to do relatively well (still testing that, though); however, for smaller boxes it ends up being an under-estimate. Primarily the under-estimate comes from the fact that for small boxes the usage is dominated by quantities the function doesn't track (e.g. the interpolation tables). Therefore, I'm not sure one can simply have tests with a relatively large tolerance to capture this for the entire test suite.
Hey @BradGreig, I guess that makes some sense. Two options: 1) capture at least the interpolation tables in the memory function (not sure how difficult that is); 2) just note in the documentation that the estimate can be low for small boxes.
Hey @steven-murray, yeah, it is possible to capture the interpolation tables. It's straightforward; it's just that those weren't relevant for the main usage of this (trying to fit large boxes into memory etc.). But I'll go away and add those. Oh, by the way, another reason why I wasn't tracking these tables is that their size is defined by a C-side variable that is not accessible to Python.

Upon further research into memory recording etc. (with help from @qyx268), one thing I am realising is that accurately determining the memory usage of processes is non-trivial and potentially inaccurate. It's ok for a single-threaded process, but one could run into issues with multi-threading (e.g. what is considered shared, private etc.). Also, we have no control over if and at what point swap memory may be used. Further, we can't guarantee that a compiler is being intelligent in how it is allocating/using page memory. Thus I think dynamically estimating the memory and comparing against these functions is fraught with danger. I think one could do this with valgrind, but valgrind is super slow, so I'm not going to consider that.

Thus, I think the best way to think of this is that these functions provide an upper limit on the physical memory required to perform a calculation. That is, provided you have the physical memory (or very close to it), the calculation should be able to proceed without crashing for lack of memory. This to me makes sense.

As an example (in the attached file), I am showing the tracked physical memory for a full light-cone (no minihaloes, but everything else on).
Hey @BradGreig, that sounds perfectly reasonable. Yes, I don't think we should be trying to predict the behaviour of the OS! If we do add the interpolation tables, we can properly call it an upper limit, and I think that's the most useful thing we can do. Should be able to test that as well.
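One way to test the "upper limit" interpretation is to compare the estimate against the process's peak RSS; a sketch using the stdlib resource module (Unix only, helper name made up):

```python
import resource
import sys

def peak_rss_bytes():
    """Peak resident set size of this process (Unix only).

    ru_maxrss is reported in kilobytes on Linux but bytes on macOS,
    hence the platform check.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak if sys.platform == "darwin" else peak * 1024

# Stand-in for a run: allocate ~50 MB, then check the peak reflects it.
# A real test would assert estimate >= peak_rss_bytes() after the run.
blob = bytearray(50 * 1024 * 1024)
```

Since peak RSS includes everything the OS ever made resident (interpreter, libraries, retained pages), it naturally pairs with an upper-limit estimate rather than an exact one.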
Probably use a similar name to estimate_memory_lightcone for e.g. mem_initial_conditions, mem_perturb_field, etc., so users can also just run e.g. estimate_memory_initial_conditions if needed.
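The suggested naming could be a set of thin public wrappers over the internal helpers; a hypothetical sketch (the body of mem_initial_conditions here is a stand-in, not the real calculation):

```python
def mem_initial_conditions(user_params=None):
    """Internal helper; stand-in values instead of the real estimate."""
    return {"python": 1.0e9, "c": 2.0e9}

def estimate_memory_initial_conditions(user_params=None):
    """User-facing counterpart of mem_initial_conditions, matching the
    estimate_memory_lightcone naming suggested in the review."""
    return mem_initial_conditions(user_params=user_params)

est = estimate_memory_initial_conditions()
```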
src/py21cmfast/_memory.py
Outdated
"""All declared 2LPT boxes (DIM)"""
# phi_1 (6 components)
if global_params.SECOND_ORDER_LPT_CORRECTIONS:
    num_c_boxes += 6
just a reminder to change this to 1
Yep, this will be changed as the various PRs are merged
src/py21cmfast/_memory.py
Outdated
        / np.log(global_params.DELTA_R_HII_FACTOR)
    )
) + 1
else:
would you reckon we should do the same for minihalo filtering if MINIMIZE_MEMORY?
This will also be updated once #220 is merged.
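For reference, the expression in the snippet above counts the number of filter radii between the cell size and R_BUBBLE_MAX, spaced by factors of DELTA_R_HII_FACTOR; a worked example with made-up parameter values:

```python
import numpy as np

# Made-up parameter values purely for the arithmetic.
BOX_LEN = 300.0          # Mpc
HII_DIM = 200
R_BUBBLE_MAX = 15.0      # Mpc
DELTA_R_HII_FACTOR = 1.1

cell_size = BOX_LEN / HII_DIM  # 1.5 Mpc
n_filter_steps = (
    int(np.log(R_BUBBLE_MAX / cell_size) / np.log(DELTA_R_HII_FACTOR)) + 1
)
```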
)

# dxheat_dt_box, dxion_source_dt_box, dxlya_dt_box, dstarlya_dt_box
num_c_boxes_initialised += 2.0 * 4.0  # factor of 2. as these are doubles
shall we reduce it to float32 then?
No, these should stay as double for numerical accuracy
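The factor of 2 works because the estimate counts boxes in units of float32; a quick check (the HII_DIM value is illustrative):

```python
import numpy as np

# A double is 8 bytes vs 4 bytes for float32...
assert np.float64(1.0).nbytes == 2 * np.float32(1.0).nbytes

HII_DIM = 128
bytes_per_float = np.float32(1.0).nbytes

# ...so the four double-precision boxes count as 2.0 * 4.0
# single-precision boxes, as in the snippet above.
num_c_boxes_initialised = 2.0 * 4.0
size_c = num_c_boxes_initialised * bytes_per_float * HII_DIM**3
```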
Hey @qyx268, while individual functions can be easily implemented, I'm not really sure how useful they'll be. I can see some value for some of them, but I guess it can be added if it is deemed important and useful enough.
…n to output information
…te of unaccounted for usage (e.g declared variables)
Hey @steven-murray, following the completion of #220, I just want to clarify something: in the default case, is the memory behaviour still the same as before?
Hi @BradGreig, no, the memory is not quite the same as before. This should in general reduce the peak memory. Also, each box now only initializes the arrays that it actually needs, given the input parameters.

You can determine the exact OUTPUT memory required for each kind of output more easily now as well (without actually allocating it all). If you have an instantiated output object, you can use:

init = p21c.outputs.InitialConditions(user_params={...}, cosmo_params={...})
print(init._array_structure)

Just to be clear, creating the object like this does not actually allocate its arrays. So you can easily use this dictionary to compute the total output size of the struct. You'll just then need to add all the arrays that are allocated INSIDE the function.

I think that covers all the bases? Lmk if you need any further info!
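Assuming _array_structure maps array names to shapes, totalling the output memory could look like the sketch below (the names, shapes, and float32 dtype are illustrative, not the real InitialConditions layout):

```python
import numpy as np

# Hypothetical _array_structure-style dict: array name -> shape.
HII_DIM, DIM = 128, 384
array_structure = {
    "lowres_density": (HII_DIM, HII_DIM, HII_DIM),
    "hires_density": (DIM, DIM, DIM),
}

# Sum itemsize * number-of-elements over all declared arrays, without
# ever allocating them.
itemsize = np.dtype(np.float32).itemsize
total_bytes = sum(
    itemsize * int(np.prod(shape)) for shape in array_structure.values()
)
```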
Hey @steven-murray, awesome, thanks for the run-down. Just needed a quick idea of what I was going to have to deal with, and how much time I might need to set aside. Seems like it might be a bit more involved than I was thinking, but good to know beforehand.
Hey @steven-murray, once #320 gets merged in I might look at finally completing this PR. Not that it actually requires #320, but I think it'll be a clean/stable base to use.
@BradGreig that'd be great!
Hey @steven-murray, so I've tried to update all the memory functions I had following all the recent changes. But I've noticed a couple of issues; I'm not sure if it is my interpretation or something else not quite right. To be honest I didn't look too closely at the behaviour of the functions.

As of now, the functions return estimates of the necessary memory for various options. However, they do not resemble what I find when performing an actual run. The primary issue appears to be with the purging (either it doesn't behave like it should, or I'm misinterpreting its behaviour). Basically, the memory functions return a close match for the peak memory, but they don't remotely match the ongoing memory throughout the calculation.

For example, under the logic you described above (regarding purging), after the perturbed fields are computed the ICs are removed from memory. Thus, the peak memory will always occur during the IC generation and/or while determining the perturb fields. However, what I find in practice is that the memory only ever increases throughout a run. It never demonstrates a significant drop following a purge of the ICs. Perhaps the compiler doesn't properly release the memory? Thus once it is allocated it never gets properly released to be used again. This behaviour will result in an underestimate of the peak memory usage, as with purging it's assumed that the peak will happen during IC generation. However, in practice, since it appears this memory isn't completely removed, that original memory plus the memory of everything else results in a larger peak.

Secondly, it appears that all the perturbed field boxes are kept in memory throughout the run, rather than only the ones currently needed.
@BradGreig what are you using to capture the memory usage? I know sometimes the OS reserves memory for an application even after it has deallocated, just in case it needs it again. I think there's some distinction between the kinds of memory allocated by the OS for different purposes, and I can never quite remember which kind of RAM estimate to use. Certainly the idea is that once we do the perturbed fields, the initial conditions should be dropped from memory, so you should get lower current memory usage then. Keeping all the perturbed field boxes in memory sounds like a terrible idea, so I'll try fixing that.
Hey @steven-murray, in this case I was scraping the information from the command line.

I think storing all perturbs has some merit, but it is highly dependent on the actual setup, which should be able to be determined within the function. So if possible you probably want to allow both options. For example, if the memory requirement of all the perturbs is less than the ICs, then keeping them all is preferable. Otherwise, if the number of redshifts is too large, then only having two perturbs at any one time makes more sense.
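For reproducibility, the command-line scraping can be wrapped in a small helper; a sketch using ps (Unix; the helper name is made up):

```python
import os
import subprocess

def rss_bytes_via_ps(pid=None):
    """Current RSS of a process scraped from `ps`, mirroring the
    command-line approach described above; returns bytes.

    Unix only; the helper name is illustrative.
    """
    pid = os.getpid() if pid is None else pid
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)])
    return int(out.strip()) * 1024  # ps reports kilobytes

# Sampling this periodically during a run gives a memory trace; note it
# is the OS's view, so pages an allocator retains after free() can keep
# the number high even once boxes are "purged".
current = rss_bytes_via_ps()
```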
Hey @steven-murray I've gone through and added a function to estimate the memory usage for a light-cone. Co-eval will follow a similar structure.
Outputting of information is still a work in progress (currently done with prints), but the main functionality is there.
I'll need to think about how best to check that these estimates actually match full runs of 21cmFAST. At the moment, the only thing I can think of is to check against the current memory usage on my machine for an equivalent run. I guess this is the downside of these functions: they are not robust to changes elsewhere and rely on me getting the behaviour correct.
Also, for the same reason, coming up with tests is problematic, as these tests will always pass unless someone manually changes these functions.
I'd greatly appreciate your thoughts/comments/feedback on what I have thus far.
Fixes #26