Callable memory estimate #221
base: master
Conversation
Welcome to Codecov 🎉 Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment. Thanks for integrating Codecov - We've got you covered ☂️
Thanks @BradGreig, this is very promising. Unfortunately I think the fact that it is not "automatic" is something we just have to accept -- I can't think of a non-brittle way to do this. On the other hand, I think the python memory part of it can be a little more automatic, as I commented in-line.
As for tests -- I wonder if it'd be possible to write a test in which we actually run a lightcone/coeval and somehow measure the memory with some tool, and compare to the estimate? In fact, we could even try doing this for all the integration tests, so that we test all the different options!
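A rough sketch of what such a test harness could look like, using Python's `tracemalloc`. Note this only sees Python-side allocations, so 21cmFAST's C-side memory would need a different tool; the helper name here is made up:

```python
import tracemalloc

def measure_peak_python_memory(func, *args, **kwargs):
    """Run func and return (result, peak Python-heap bytes).

    tracemalloc only tracks allocations made through Python's allocator,
    so C-level allocations inside 21cmFAST would be invisible here; those
    would need e.g. resource.getrusage or an external profiler.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Toy stand-in for a coeval/lightcone call: allocate ~8 MB and check the
# measured peak reflects it. A real test would compare against the
# estimator's output with a wide tolerance.
result, peak = measure_peak_python_memory(lambda: bytearray(8 * 10**6))
```

In an integration test the comparison would be `measured <= estimate` (or within tolerance), run once per option combination.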
src/py21cmfast/_memory.py
Outdated
# First, calculate the memory usage for the initial conditions
memory_ics = mem_initial_conditions(user_params=user_params)

memory_data = {"ics_%s" % k: memory_ics["%s" % k] for k in memory_ics.keys()}
Instead of "%s" % k, you can just have k.
Yep, I may have just been copy pasting that from something else. You are 100 per cent correct of course!
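For illustration, the simplification suggested above, with made-up stand-in values for the memory_ics dict:

```python
# Stand-in values; the real dict comes from mem_initial_conditions().
memory_ics = {"python": 1.0e9, "c": 2.0e9}

# Original pattern:
memory_data = {"ics_%s" % k: memory_ics["%s" % k] for k in memory_ics.keys()}

# Equivalent: "%s" % k is just k for string keys, and iterating
# items() avoids the second dict lookup.
memory_data_clean = {"ics_%s" % k: v for k, v in memory_ics.items()}
```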
src/py21cmfast/_memory.py
Outdated
if flag_options.PHOTON_CONS:
    # First need to create new structs for the photon-conservation
    astro_params_photoncons = deepcopy(astro_params)
    astro_params_photoncons._R_BUBBLE_MAX = astro_params.R_BUBBLE_MAX
Why is this necessary? Does it not get copied automatically in the deepcopy?
I copied this exactly from how it's done in the run_lightcone function. Didn't check too closely, but it seemed a little odd to me too. I'll double check.
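A minimal sketch of why the reassignment looks redundant, using a toy stand-in class (not the real AstroParams):

```python
from copy import deepcopy

class Params:
    """Toy stand-in for AstroParams (not the real 21cmFAST class)."""

    def __init__(self, R_BUBBLE_MAX):
        self._R_BUBBLE_MAX = R_BUBBLE_MAX

    @property
    def R_BUBBLE_MAX(self):
        return self._R_BUBBLE_MAX

p = Params(15.0)
p2 = deepcopy(p)

# deepcopy already copies plain instance attributes, so for a simple
# attribute the explicit reassignment after the copy is a no-op...
assert p2._R_BUBBLE_MAX == p.R_BUBBLE_MAX

# ...though if the real property's getter applies a transformation,
# assigning astro_params.R_BUBBLE_MAX back into _R_BUBBLE_MAX would bake
# the transformed value in, which may be the intent in run_lightcone.
```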
src/py21cmfast/_memory.py
Outdated
"""All declared HII_DIM boxes"""
# lowres_density, lowres_vx, lowres_vy, lowres_vz, lowres_vcb
# lowres_vx_2LPT, lowres_vy_2LPT, lowres_vz_2LPT
num_py_boxes_HII_DIM = 8.0
Here, I think we can do better, after we get #220 in. That introduces an update where each class can return a dict of all arrays and their shapes, before being instantiated.
Yeah, that sounds pretty nice and will be very useful for these functions. This and #220 probably will need to be done hand in hand.
src/py21cmfast/_memory.py
Outdated
# These are all float arrays
size_py = (np.float32(1.0).nbytes) * size_py

"""Memory usage within GenerateICs"""
Prefer using comments rather than string literals
Visually, I think I do this to break things up more clearly. But I'll change them.
src/py21cmfast/_memory.py
Outdated
return {"python": 0.0, "c": size_c}


def format_output(
I think perhaps this function name should be something like print_memory_estimate or something like that, and it should be imported into the top-level namespace.
Yep, I threw this together in a few minutes at the end of the day. I tried to use the built-in logging rather than printing, but it didn't work (hence the printing). But I didn't try too hard to fix the issue.
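If the logging route is revisited: a common reason logging output "doesn't appear" is simply that no handler or level has been configured, since the root logger defaults to WARNING. A sketch with an illustrative function and logger name:

```python
import logging
import sys

logger = logging.getLogger("21cmFAST._memory")

def print_memory_estimate(memory_data):
    """Report a memory-estimate dict (values in bytes) via logging.

    The function name and dict layout here are illustrative.
    """
    for key, nbytes in memory_data.items():
        logger.info("%s: %.2f GB", key, nbytes / 1024**3)

# Without this (or an equivalent handler/level setup), INFO messages
# are silently dropped, which would look like logging "not working".
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
print_memory_estimate({"ics_c": 2.5 * 1024**3})
```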
Hey @steven-murray, thanks for the quick feedback. With respect to the tests, I think that would be great. However, I wonder how accurate that would be. Different machines/compilers might allocate different amounts of memory, so these functions may only be approximate relative to actual usage. You might have to invoke some level of tolerance (maybe 10 per cent) to get the two numbers matching closely enough to pass the test. I'm about to generate some runs to test how these functions perform relative to actual usage, and also how they scale.
@BradGreig: perfect. Yeah, we'll put in a fairly wide tolerance. The idea would just be that we should capture the broad trends. It would be unusual for someone to require the exact memory to within 10%.
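A wide-tolerance comparison could be as simple as the following sketch (the names and the 10% figure are illustrative):

```python
import math

def check_estimate(estimated_bytes, measured_bytes, rel_tol=0.10):
    """Pass if the estimate is a safe over-estimate, or within a wide
    relative tolerance of the measurement.

    Names and the default tolerance are illustrative, not from the PR.
    """
    return estimated_bytes >= measured_bytes or math.isclose(
        estimated_bytes, measured_bytes, rel_tol=rel_tol
    )
```

Such a check would only flag gross under-estimates, which matches the "capture the broad trends" goal.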
Hey @steven-murray, I have been playing around with comparing the light-cone memory function to actual usage. For larger boxes it seems to do relatively well (still testing that, though); however, for smaller boxes it ends up being an under-estimate. Primarily the under-estimate comes from the fact that for small boxes the usage is dominated by quantities the function doesn't track (e.g. the interpolation tables). Therefore, I'm not sure one can simply have tests with a relatively large tolerance to capture this for the entire test suite.
Hey @BradGreig, I guess that makes some sense. Two options: 1) capture at least the interpolation tables in the memory function (not sure how difficult that is); 2) just note in the documentation that the estimate can be low for small boxes.
Hey @steven-murray, yeah, it is possible to capture the interpolation tables. It's straightforward; it's just that those weren't relevant for the main usage of this (trying to fit large boxes into memory etc.). But I'll go away and add those. Oh, by the way, another reason why I wasn't tracking these tables is that their size is defined by a C-side variable that is not accessible to Python.

Upon further research into memory recording etc. (with help from @qyx268), one thing I am realising is that accurately determining the memory usage of processes is non-trivial and potentially inaccurate. It's ok for a single-threaded process, but one could run into issues with multi-threading (e.g. what is considered shared, private etc.). Also, we have no control over if and at what point swap memory may be used. Further, we can't guarantee that a compiler is being intelligent in how it is allocating/using page memory. Thus I think dynamically estimating the memory and comparing against these functions is fraught with danger. I think one could do this with valgrind, but valgrind is super slow, so I'm not going to consider that.

Thus, I think the best way to think of this is that these functions provide an upper limit on the physical memory required to perform a calculation. That is, provided you have the physical memory (or very close to it), the calculation should be able to proceed without crashing for lack of memory. This to me makes sense.

As an example (in the attached file), I am showing the tracked physical memory for a full light-cone (no minihaloes, but everything else on).
Hey @BradGreig, that sounds perfectly reasonable. Yes, I don't think we should be trying to predict the behaviour of the OS! If we do add the interpolation tables, we can properly call it an upper limit, and I think that's the most useful thing we can do. Should be able to test that as well.
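One way to test the "upper limit" interpretation is to compare the estimate against the process's peak RSS; a sketch using the stdlib resource module (Unix only, helper name made up):

```python
import resource
import sys

def peak_rss_bytes():
    """Peak resident set size of this process (Unix only).

    ru_maxrss is reported in kilobytes on Linux but bytes on macOS,
    hence the platform check.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak if sys.platform == "darwin" else peak * 1024

# Stand-in for a run: allocate ~50 MB, then check the peak reflects it.
# A real test would assert estimate >= peak_rss_bytes() after the run.
blob = bytearray(50 * 1024 * 1024)
```

Since peak RSS includes everything the OS ever made resident (interpreter, libraries, retained pages), it naturally pairs with an upper-limit estimate rather than an exact one.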
Probably use a similar name to estimate_memory_lightcone for e.g. mem_initial_conditions, mem_perturb_field, etc., so users can also just run e.g. estimate_memory_initial_conditions if needed.
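The suggested naming could be a set of thin public wrappers over the internal helpers; a hypothetical sketch (the body of mem_initial_conditions here is a stand-in, not the real calculation):

```python
def mem_initial_conditions(user_params=None):
    """Internal helper; stand-in values instead of the real estimate."""
    return {"python": 1.0e9, "c": 2.0e9}

def estimate_memory_initial_conditions(user_params=None):
    """User-facing counterpart of mem_initial_conditions, matching the
    estimate_memory_lightcone naming suggested in the review."""
    return mem_initial_conditions(user_params=user_params)

est = estimate_memory_initial_conditions()
```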
src/py21cmfast/_memory.py
Outdated
"""All declared 2LPT boxes (DIM)"""
# phi_1 (6 components)
if global_params.SECOND_ORDER_LPT_CORRECTIONS:
    num_c_boxes += 6
just a reminder to change this to 1
Yep, this will be changed as the various PRs are merged
src/py21cmfast/_memory.py
Outdated
        / np.log(global_params.DELTA_R_HII_FACTOR)
    )
) + 1
else:
would you reckon we should do the same for minihalo filtering if MINIMIZE_MEMORY?
This will also be updated once #220 is merged.
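For reference, the expression in the snippet above counts the number of filter radii between the cell size and R_BUBBLE_MAX, spaced by factors of DELTA_R_HII_FACTOR; a worked example with made-up parameter values:

```python
import numpy as np

# Made-up parameter values purely for the arithmetic.
BOX_LEN = 300.0          # Mpc
HII_DIM = 200
R_BUBBLE_MAX = 15.0      # Mpc
DELTA_R_HII_FACTOR = 1.1

cell_size = BOX_LEN / HII_DIM  # 1.5 Mpc
n_filter_steps = (
    int(np.log(R_BUBBLE_MAX / cell_size) / np.log(DELTA_R_HII_FACTOR)) + 1
)
```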
)

# dxheat_dt_box, dxion_source_dt_box, dxlya_dt_box, dstarlya_dt_box
num_c_boxes_initialised += 2.0 * 4.0  # factor of 2. as these are doubles
shall we reduce it to float32 then?
No, these should stay as double for numerical accuracy
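The factor of 2 works because the estimate counts boxes in units of float32; a quick check (the HII_DIM value is illustrative):

```python
import numpy as np

# A double is 8 bytes vs 4 bytes for float32...
assert np.float64(1.0).nbytes == 2 * np.float32(1.0).nbytes

HII_DIM = 128
bytes_per_float = np.float32(1.0).nbytes

# ...so the four double-precision boxes count as 2.0 * 4.0
# single-precision boxes, as in the snippet above.
num_c_boxes_initialised = 2.0 * 4.0
size_c = num_c_boxes_initialised * bytes_per_float * HII_DIM**3
```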
Hey @qyx268, while individual functions can be easily implemented, I'm not really sure how useful they'll be. I can see some value for some of them, but I guess it can be added if it is deemed important and useful enough.
…n to output information
…te of unaccounted for usage (e.g declared variables)
Hey @steven-murray, following the completion of #220, I just want to clarify something: in the default case, is the memory behaviour still the same as before?
Hi @BradGreig, no, the memory is not quite the same as before. This should in general reduce the peak memory. Also, each box now only initializes the arrays that it actually needs, given the input parameters.

You can determine the exact OUTPUT memory required for each kind of output more easily now as well (without actually allocating it all). If you have an instantiated output object, you can use:

init = p21c.outputs.InitialConditions(user_params={...}, cosmo_params={...})
print(init._array_structure)

Just to be clear, creating the object like this does not actually allocate its arrays. So you can easily use this dictionary to compute the total output size of the struct. You'll just then need to add all the arrays that are allocated INSIDE the function.

I think that covers all the bases? Lmk if you need any further info!
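Assuming _array_structure maps array names to shapes, totalling the output memory could look like the sketch below (the names, shapes, and float32 dtype are illustrative, not the real InitialConditions layout):

```python
import numpy as np

# Hypothetical _array_structure-style dict: array name -> shape.
HII_DIM, DIM = 128, 384
array_structure = {
    "lowres_density": (HII_DIM, HII_DIM, HII_DIM),
    "hires_density": (DIM, DIM, DIM),
}

# Sum itemsize * number-of-elements over all declared arrays, without
# ever allocating them.
itemsize = np.dtype(np.float32).itemsize
total_bytes = sum(
    itemsize * int(np.prod(shape)) for shape in array_structure.values()
)
```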
Hey @steven-murray, awesome, thanks for the run-down. Just needed a quick idea of what I was going to have to deal with, and how much time I might need to set aside. Seems like it might be a bit more involved than I was thinking, but good to know beforehand.
Hey @steven-murray, once #320 gets merged in I might look at finally completing this PR. Not that it actually requires #320, but I think it'll be a clean/stable base to use.
@BradGreig that'd be great!
Hey @steven-murray, so I've tried to update all the memory functions I had following all the recent changes. But I've noticed a couple of issues; I'm not sure if it is my interpretation or something else not quite right. To be honest I didn't look too closely at the behaviour of the functions.

As of now, the functions return estimates of the necessary memory for various options. However, they do not resemble what I find when performing an actual run. The primary issue appears to be with the purging (either it doesn't behave like it should, or I'm misinterpreting its behaviour). Basically, the memory functions return a close match for the peak memory, but they don't remotely match the ongoing memory throughout the calculation.

For example, under the logic you described above (regarding purging), after the perturbed fields are computed the ICs are removed from memory. Thus, the peak memory will always occur during the IC generation and/or while determining the perturb fields. However, what I find in practice is that the memory only ever increases throughout a run. It never demonstrates a significant drop following a purge of the ICs. Perhaps the compiler doesn't properly release the memory? Thus once it is allocated it never gets properly released to be used again. This behaviour will result in an underestimate of the peak memory usage, as with purging it's assumed that the peak will happen during IC generation. However, in practice, since it appears this memory isn't completely removed, that original memory plus the memory of everything else results in a larger peak.

Secondly, it appears that all the perturbed field boxes are kept in memory throughout the run, rather than only the ones currently needed.
@BradGreig what are you using to capture the memory usage? I know sometimes the OS reserves memory for an application even after it has deallocated, just in case it needs it again. I think there's some distinction between the kinds of memory allocated by the OS for different purposes, and I can never quite remember which kind of RAM estimate to use. Certainly the idea is that once we do the perturbed fields, the initial conditions should be dropped from memory, so you should get lower current memory usage then. Keeping all the perturbed field boxes in memory sounds like a terrible idea, so I'll try fixing that.
Hey @steven-murray, in this case I was scraping the information from the command line.

I think storing all perturbs has some merit, but it is highly dependent on the actual setup, which should be able to be determined within the function. So if possible you probably want to allow both options. For example, if the memory requirement of all the perturbs is less than the ICs, then keeping them all is preferable. Otherwise, if the number of redshifts is too large, then only having two perturbs at any one time makes more sense.
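For reproducibility, the command-line scraping can be wrapped in a small helper; a sketch using ps (Unix; the helper name is made up):

```python
import os
import subprocess

def rss_bytes_via_ps(pid=None):
    """Current RSS of a process scraped from `ps`, mirroring the
    command-line approach described above; returns bytes.

    Unix only; the helper name is illustrative.
    """
    pid = os.getpid() if pid is None else pid
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)])
    return int(out.strip()) * 1024  # ps reports kilobytes

# Sampling this periodically during a run gives a memory trace; note it
# is the OS's view, so pages an allocator retains after free() can keep
# the number high even once boxes are "purged".
current = rss_bytes_via_ps()
```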
Hey @steven-murray I've gone through and added a function to estimate the memory usage for a light-cone. Co-eval will follow a similar structure.
Outputting of information is still a work in progress (currently done with prints), but the main functionality is there.
I'll need to think about how best to check that these estimates actually match full runs of 21cmFAST. At the moment, the only thing I can think of is to check against the current memory usage on my machine for an equivalent run. I guess this is the downside of these functions: they are not robust to changes elsewhere and rely on me getting the behaviour correct.
Also, for the same reason, coming up with tests is problematic, as these tests will always pass unless someone manually changes these functions.
I'd greatly appreciate your thoughts/comments/feedback on what I have thus far.
Fixes #26