Output Struct Overhaul #445

Open
wants to merge 9 commits into
base: v4-prep

steven-murray
Member

@steven-murray steven-murray commented Dec 4, 2024

Summary

This changes the output structure interface to be simpler and more streamlined.

It is a comprehensive set of changes that touches a lot of things on the Python side. I'll try to list as many as I can here for easy reference:

Arrays and Backend mapping

  • New arrays.py module that implements an Array object. This object knows about the shape and dtype of an array, without necessarily having it instantiated, but also knows how to instantiate it, pass it to C, and keeps track of the ArrayState.
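
As a rough illustration of the idea (not the code in arrays.py), a lazily allocated array descriptor could look like the sketch below. The names LazyArray, initialize, mark_computed and buffer_address, as well as the specific states, are hypothetical, and the stdlib array module stands in for numpy to keep the example dependency-free.

```python
from __future__ import annotations

import enum
from array import array
from dataclasses import dataclass, field
from math import prod


class ArrayState(enum.Enum):
    """Hypothetical lifecycle states for a lazily allocated array."""

    UNINITIALIZED = "uninitialized"  # shape/dtype known, no memory allocated
    INITIALIZED = "initialized"  # memory allocated, not yet filled
    COMPUTED = "computed"  # filled with results


@dataclass
class LazyArray:
    """Describes an array (shape and element type) without allocating it."""

    shape: tuple[int, ...]
    typecode: str = "f"  # stdlib array typecode; "f" is a C float
    state: ArrayState = ArrayState.UNINITIALIZED
    _buffer: array | None = field(default=None, init=False, repr=False)

    @property
    def size(self) -> int:
        """Total number of elements implied by the shape."""
        return prod(self.shape)

    def initialize(self) -> LazyArray:
        """Allocate a zero-filled flat buffer if none exists yet."""
        if self._buffer is None:
            self._buffer = array(self.typecode, [0]) * self.size
            self.state = ArrayState.INITIALIZED
        return self

    def buffer_address(self) -> int:
        """Memory address of the buffer, as one might hand to C code."""
        if self._buffer is None:
            raise ValueError("array has not been initialized")
        return self._buffer.buffer_info()[0]

    def mark_computed(self) -> None:
        """Record that the buffer now holds computed values."""
        if self._buffer is None:
            raise ValueError("cannot mark an unallocated array as computed")
        self.state = ArrayState.COMPUTED


density = LazyArray(shape=(4, 4, 4))  # no memory allocated yet
density.initialize()  # allocates 64 floats
```

The point of the design is that shape and dtype can be inspected, for example to plan memory use, before any allocation happens.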

OutputStructs

  • The OutputStruct is now an attrs class. More importantly, all of the arrays that it needs to handle are defined directly on the class as Array parameters, making it easier to track them.
  • Each output struct now has a .new() classmethod that instantiates it from an InputParameters object, getting the shape/dtype info (and which arrays need to be present) from the inputs.
  • The downside of managing the C/Python/disk interface with Array objects is that the attributes of the OutputStruct are no longer numpy arrays, so you can no longer do, for example, np.mean(ics.lowres_density). This is smoothed over a bit by new get() and set() methods specifically for the arrays, so you can do np.mean(ics.get('lowres_density')). This has the added advantage of transparently loading the array from disk if it exists there. Note that on a Coeval object, any field of any OutputStruct can be accessed directly via attribute name, as an array.
  • I've also taken all the caching and I/O management out of the OutputStruct class, instead moving it to the new io subpackage.
  • There's a new _compat_hash attribute on each OutputStruct that tells it the level of input-hash required.
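
The get()/set() access pattern with transparent loading from disk can be sketched with a toy class. This is only an illustration under stated assumptions: ToyStruct, write and purge are hypothetical names, JSON files stand in for the real on-disk format, and statistics.mean stands in for np.mean.

```python
import json
import statistics
import tempfile
from pathlib import Path


class ToyStruct:
    """Hypothetical stand-in for an output struct holding named arrays."""

    def __init__(self, cache_dir):
        self._cache_dir = Path(cache_dir)
        self._in_memory = {}

    def _path(self, name):
        return self._cache_dir / f"{name}.json"

    def set(self, name, values):
        """Store an array in memory under the given name."""
        self._in_memory[name] = list(values)

    def write(self, name):
        """Write one in-memory array to disk."""
        self._path(name).write_text(json.dumps(self._in_memory[name]))

    def purge(self, name):
        """Drop an array from memory, keeping any on-disk copy."""
        self._in_memory.pop(name, None)

    def get(self, name):
        """Return the array, loading it from disk if it is not in memory."""
        if name not in self._in_memory:
            path = self._path(name)
            if not path.exists():
                raise KeyError(f"{name!r} is neither in memory nor on disk")
            self._in_memory[name] = json.loads(path.read_text())
        return self._in_memory[name]


with tempfile.TemporaryDirectory() as tmp:
    ics = ToyStruct(tmp)
    ics.set("lowres_density", [0.1, 0.2, 0.3])
    ics.write("lowres_density")
    ics.purge("lowres_density")  # the array now lives only on disk
    mean_density = statistics.mean(ics.get("lowres_density"))  # reloaded here
```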

Caching / IO of single-fields (OutputStruct)

  • The new io.caching module implements classes/functions for dealing with the cache. I think this is a bit more intuitive than in previous versions.
  • The OutputCache object has methods for introspecting a particular cache (defined by some directory the user gives at runtime) and reading/writing OutputStructs to it.
  • The RunCache manages full runs (i.e. all boxes belonging to a full redshift-evolved simulation), allowing simple determination of which cache files are present, and which haven't yet been run (useful for checkpointing).
  • The CacheConfig class simply defines a namespace for defining which boxes to write to cache during a larger run (coeval/lightcone).
  • The cache_tools module has been removed as it is redundant with the above module.
  • All the reading/writing of HDF5 boxes has moved to io/h5.py, and so is separated from the OutputStruct class definitions themselves. This might facilitate implementing different cache formats in the future. The file format is also slightly different (I think it's slightly better now -- the format is specified in the docstring of the module, so you can check).
  • There is also a mechanism now for being able to read files written by older versions of the code, so we can maintain explicit backwards compatibility with older outputs.
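
To illustrate the checkpointing use case for a run-level cache, the sketch below works out which boxes of a run already exist on disk and which are still pending. It does not use the RunCache API; the helper names and the file naming scheme are invented for illustration, and the box names are used purely as labels.

```python
import tempfile
from pathlib import Path


def expected_files(cache_dir, kinds, redshifts):
    """Map (kind, redshift) to the file a full run would produce.

    The naming scheme here is invented for illustration only.
    """
    return {
        (kind, z): Path(cache_dir) / f"{kind}_z{z:.2f}.h5"
        for z in redshifts
        for kind in kinds
    }


def split_done_pending(files):
    """Return (done, pending) lists of keys, preserving run order."""
    done = [key for key, path in files.items() if path.exists()]
    pending = [key for key, path in files.items() if not path.exists()]
    return done, pending


with tempfile.TemporaryDirectory() as tmp:
    files = expected_files(tmp, ["PerturbedField", "IonizedBox"], [12.0, 10.0, 8.0])
    # Pretend the run was interrupted after the first redshift.
    for key in [("PerturbedField", 12.0), ("IonizedBox", 12.0)]:
        files[key].touch()
    done, pending = split_done_pending(files)
```

A restarted run would then only need to compute the entries in pending, in order.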

Single-Field Computations

  • The single_field module is a lot simpler. I have moved most of the boiler-plate logic to a class-style decorator in _param_config.
  • This new decorator checks redshift consistency, input parameter consistency, manages the cache and sets the current redshift appropriately given all inputs.
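
A class-style decorator of this kind can be sketched as follows. This is not the decorator in _param_config: the name single_field_func, the dict-based results, the in-memory cache and the rule that the previous box must be at a higher redshift are all assumptions made for the example.

```python
import functools


class single_field_func:
    """Hypothetical class-style decorator wrapping a compute function.

    It checks redshift ordering, caches results by input, and stamps
    the current redshift onto the result.
    """

    def __init__(self, func):
        functools.update_wrapper(self, func)
        self._func = func
        self._cache = {}

    def __call__(self, *, redshift, previous=None, **kwargs):
        # Assumed consistency rule: evolution runs from high to low redshift.
        if previous is not None and previous["redshift"] <= redshift:
            raise ValueError("previous box must be at a higher redshift")
        # Toy cache key: requires hashable keyword values.
        key = (redshift, tuple(sorted(kwargs.items())))
        if key not in self._cache:
            result = self._func(redshift=redshift, previous=previous, **kwargs)
            result["redshift"] = redshift
            self._cache[key] = result
        return self._cache[key]


@single_field_func
def toy_field(*, redshift, previous=None, scale=1.0):
    """Toy computation standing in for a real single-field function."""
    return {"value": scale / (1.0 + redshift)}


first = toy_field(redshift=9.0)
second = toy_field(redshift=8.0, previous=first)
```

Keeping this logic in one wrapper means each compute function only has to contain the computation itself.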

Lightcone / Coeval

  • I refactored some re-used code in run_coeval and run_lightcone into a set of external functions: evolve_perturb_halos and _redshift_loop_generator.
  • The Coeval and Lightcone objects are much more slim now. I removed the ability to "gather" the cached files associated with a coeval/lc, instead relying on the improved caching module to let people deal with their full-run caches.
  • Also, to read a Coeval/Lightcone you now use Coeval.from_file instead of Coeval.read(), which I think is more intuitive.

Configuration

  • I actually think we should generally move away from package-wide configuration, because it always causes trouble. I haven't removed the module itself here because it's slightly outside the scope of the PR, but I did remove the "regenerate" and "write" configuration options, and removed all places where the config was used.
  • We will have to think about how to re-implement all the functionality we had in the config (e.g. number of sigfigs for the cache). Probably most of this can be put directly into new objects (like the CacheConfig).

Other Stuff

  • I've removed any documentation or caching references to "global params". These are now to be treated as almost purely read-only (and we should move towards them being completely removed soon).
  • I moved the definition of InputParameters from param_config to inputs just because I was getting circular imports.

Meta-info:

  • These changes break strict backwards-compatibility

Issues Solved

@steven-murray steven-murray marked this pull request as ready for review December 14, 2024 00:51

codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 78.69393% with 323 lines in your changes missing coverage. Please review.

Project coverage is 76.88%. Comparing base (5930245) to head (6b894c8).
Report is 3 commits behind head on v4-prep.

Files with missing lines                   Patch %   Lines
src/py21cmfast/io/caching.py               54.43%    74 Missing and 3 partials ⚠️
src/py21cmfast/wrapper/outputs.py          82.63%    47 Missing and 19 partials ⚠️
src/py21cmfast/io/h5.py                    71.52%    31 Missing and 12 partials ⚠️
src/py21cmfast/drivers/_param_config.py    82.94%    19 Missing and 10 partials ⚠️
src/py21cmfast/drivers/coeval.py           79.85%    21 Missing and 6 partials ⚠️
src/py21cmfast/wrapper/inputs.py           84.96%    15 Missing and 8 partials ⚠️
src/py21cmfast/wrapper/arrays.py           75.00%    9 Missing and 7 partials ⚠️
src/py21cmfast/drivers/lightcone.py        90.35%    5 Missing and 6 partials ⚠️
src/py21cmfast/drivers/single_field.py     85.33%    7 Missing and 4 partials ⚠️
src/py21cmfast/cli.py                      55.55%    4 Missing ⚠️
... and 6 more
Additional details and impacted files
@@             Coverage Diff             @@
##           v4-prep     #445      +/-   ##
===========================================
- Coverage    79.56%   76.88%   -2.69%     
===========================================
  Files           24       27       +3     
  Lines         3803     3747      -56     
  Branches       647      611      -36     
===========================================
- Hits          3026     2881     -145     
- Misses         558      648      +90     
+ Partials       219      218       -1     
