Output Struct Overhaul #445

steven-murray · 2024-12-04T13:01:43Z

Summary

This changes the output structure interface to be simpler and more streamlined.

It is a comprehensive set of changes that touches a lot of the Python side. I'll try to list as many as I can here for easy reference:

Arrays and Backend mapping

A new arrays.py module implements an Array object. This object knows the shape and dtype of an array without necessarily having it instantiated, and also knows how to instantiate it and pass it to C, while keeping track of its ArrayState.
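
As a rough illustration of the idea (not the actual arrays.py implementation), an array descriptor that knows its shape and dtype before any memory is allocated might look like the sketch below. The names LazyArray, State, initialized() and with_value() are hypothetical, and a flat Python list stands in for the numpy buffer that would really be passed to C.

```python
from dataclasses import dataclass, replace
from enum import Enum
from math import prod
from typing import Optional


class State(Enum):
    """Hypothetical stand-in for the ArrayState mentioned above."""
    UNINITIALIZED = "uninitialized"
    INITIALIZED = "initialized"
    COMPUTED = "computed"


@dataclass(frozen=True)
class LazyArray:
    """Knows shape/dtype up front; allocates storage only on request."""
    shape: tuple
    dtype: type = float
    value: Optional[list] = None
    state: State = State.UNINITIALIZED

    @property
    def size(self) -> int:
        return prod(self.shape)

    def initialized(self) -> "LazyArray":
        """Return a copy with zero-filled storage allocated."""
        if self.value is not None:
            return self
        zeros = [self.dtype(0)] * self.size
        return replace(self, value=zeros, state=State.INITIALIZED)

    def with_value(self, value) -> "LazyArray":
        """Return a copy holding computed data, checking its length."""
        if len(value) != self.size:
            raise ValueError(f"expected {self.size} elements, got {len(value)}")
        data = [self.dtype(v) for v in value]
        return replace(self, value=data, state=State.COMPUTED)


arr = LazyArray(shape=(2, 3))
print(arr.state, arr.value)       # shape is known, nothing allocated yet
arr = arr.initialized()
print(arr.state, len(arr.value))  # storage now exists
```

Making the descriptor immutable and returning copies is a design choice of this sketch only; it keeps the state transitions explicit.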

OutputStructs

The OutputStruct is now an attrs class. More importantly, all of the arrays that it needs to handle are defined directly on the class as Array parameters, making it easier to track them.
Each output struct now has a .new() classmethod that instantiates it from an InputParameters object, getting the shape/dtype info (and which arrays need to be present) from the inputs.
The downside of managing the C/Python/Disk interface with Array objects in this way is that the attributes of the OutputStruct are no longer numpy arrays, so you can no longer do, for example, np.mean(ics.lowres_density). This is smoothed over a bit by new get() and set() methods specifically for the arrays, so you can do np.mean(ics.get('lowres_density')). This has the added advantage of transparently loading the array from disk if it exists there. Note that on a Coeval object, any field of any OutputStruct can be accessed directly via attribute name, as an array.
I've also taken all the caching and I/O management out of the OutputStruct class, instead moving it to the new io subpackage.
There's a new _compat_hash attribute on each OutputStruct that tells it the level of input-hash required.
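
To make the access pattern concrete, here is a minimal sketch of a struct whose fields are array descriptors rather than arrays. It is not the py21cmfast class: DemoStruct, DemoInputs and the grid_size parameter are hypothetical, plain lists stand in for numpy arrays, and a dict plays the role of the on-disk cache.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class DemoInputs:
    """Hypothetical stand-in for InputParameters."""
    grid_size: int = 4


@dataclass
class DemoStruct:
    """Fields are described by shape; values are held only once set or loaded."""
    shapes: dict
    _values: dict = field(default_factory=dict)
    _disk: dict = field(default_factory=dict)  # pretend on-disk storage

    @classmethod
    def new(cls, inputs: DemoInputs) -> "DemoStruct":
        # Shapes are derived from the inputs; no data is allocated here.
        n = inputs.grid_size
        return cls(shapes={"lowres_density": (n, n, n)})

    def set(self, name: str, value) -> None:
        if name not in self.shapes:
            raise KeyError(f"unknown array {name!r}")
        self._values[name] = list(value)

    def get(self, name: str) -> list:
        # Fall back to "disk" transparently if the array is not in memory.
        if name not in self._values:
            if name not in self._disk:
                raise ValueError(f"{name!r} has not been computed or cached")
            self._values[name] = list(self._disk[name])
        return self._values[name]


ics = DemoStruct.new(DemoInputs(grid_size=2))
ics.set("lowres_density", [0.5] * 8)
print(mean(ics.get("lowres_density")))
```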

Caching / IO of single-fields (OutputStruct)

The new io.caching module implements classes/functions for dealing with the cache. I think this is a bit more intuitive than in previous versions.
The OutputCache object has methods for introspecting a particular cache (defined by some directory the user gives at runtime) and reading/writing OutputStructs to it.
The RunCache manages full runs (i.e. all boxes belonging to a full redshift-evolved simulation), allowing simple determination of which cache files are present, and which haven't yet been run (useful for checkpointing).
The CacheConfig class simply defines a namespace for defining which boxes to write to cache during a larger run (coeval/lightcone).
The cache_tools module has been removed as it is redundant with the above module.
All the reading/writing of HDF5 boxes has moved to io/h5.py, and so is separated from the OutputStruct class definitions themselves. This might facilitate implementing different cache formats in the future. The file format is also slightly different (I think it's slightly better now -- the format is specified in the docstring of the module, so you can check).
There is also a mechanism now for being able to read files written by older versions of the code, so we can maintain explicit backwards compatibility with older outputs.
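
The split between a directory-level cache and a run-level view can be sketched as follows. This is an illustration only: DemoCache, DemoRun, the file-naming scheme and the JSON payload are all assumptions made for the example (the actual module writes HDF5 files in the format documented in io/h5.py).

```python
import hashlib
import json
import tempfile
from pathlib import Path


class DemoCache:
    """Reads and writes small result records under one user-chosen directory."""

    def __init__(self, direc):
        self.direc = Path(direc)
        self.direc.mkdir(parents=True, exist_ok=True)

    def path_for(self, kind: str, inputs: dict, redshift: float) -> Path:
        # Hash the inputs so files from different parameter sets never collide.
        key = json.dumps(inputs, sort_keys=True).encode()
        digest = hashlib.sha256(key).hexdigest()[:12]
        return self.direc / f"{kind}_z{redshift:.4f}_{digest}.json"

    def write(self, kind, inputs, redshift, data) -> Path:
        path = self.path_for(kind, inputs, redshift)
        path.write_text(json.dumps(data))
        return path

    def read(self, kind, inputs, redshift):
        return json.loads(self.path_for(kind, inputs, redshift).read_text())


class DemoRun:
    """Tracks which boxes of a multi-redshift run already exist in the cache."""

    def __init__(self, cache, kind, inputs, redshifts):
        self.cache, self.kind, self.inputs = cache, kind, inputs
        self.redshifts = list(redshifts)

    def missing(self):
        return [
            z for z in self.redshifts
            if not self.cache.path_for(self.kind, self.inputs, z).exists()
        ]


with tempfile.TemporaryDirectory() as tmp:
    cache = DemoCache(tmp)
    run = DemoRun(cache, "PerturbedField", {"seed": 1}, [10.0, 8.0, 6.0])
    cache.write("PerturbedField", {"seed": 1}, 10.0, {"mean": 0.0})
    print(run.missing())  # redshifts still to be computed after a restart
```

Asking the run-level object which redshifts are missing is what makes checkpointing straightforward: a restarted run only needs to compute the entries that are returned.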

Single-Field Computations

The single_field module is a lot simpler. I have moved most of the boiler-plate logic to a class-style decorator in _param_config.
This new decorator checks redshift consistency, input parameter consistency, manages the cache and sets the current redshift appropriately given all inputs.
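
A class-style decorator of this kind could be structured roughly as below. This is a sketch under assumptions, not the code in _param_config: the name single_field_func, the keyword names inputs, redshift and previous, and the in-memory dict used as a cache are all hypothetical.

```python
import functools


class single_field_func:
    """Wrap a compute function with consistency checks and caching."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        functools.update_wrapper(self, func)

    def __call__(self, *, inputs: dict, redshift: float, previous=None):
        self._check_consistency(inputs, redshift, previous)
        key = (self.func.__name__, tuple(sorted(inputs.items())), redshift)
        if key not in self.cache:
            self.cache[key] = self.func(
                inputs=inputs, redshift=redshift, previous=previous
            )
        return self.cache[key]

    @staticmethod
    def _check_consistency(inputs, redshift, previous):
        if previous is None:
            return
        if previous["inputs"] != inputs:
            raise ValueError("input parameters differ from the previous box")
        if previous["redshift"] <= redshift:
            raise ValueError("previous box must be at a higher redshift")


@single_field_func
def compute_box(*, inputs, redshift, previous=None):
    # Placeholder computation: just record what was asked for.
    return {"inputs": inputs, "redshift": redshift}


first = compute_box(inputs={"seed": 1}, redshift=10.0)
second = compute_box(inputs={"seed": 1}, redshift=9.0, previous=first)
print(second["redshift"])
```

Using a class rather than a nested closure keeps the shared checks in named methods, so the wrapped functions themselves contain only the computation.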

Lightcone / Coeval

I refactored some re-used code in run_coeval and run_lightcone into a set of external functions: evolve_perturb_halos and _redshift_loop_generator.
The Coeval and Lightcone objects are much slimmer now. I removed the ability to "gather" the cached files associated with a coeval/lc, instead relying on the improved caching module to let people deal with their full-run caches.
Also, to read a Coeval/Lightcone you now use Coeval.from_file instead of Coeval.read(), which I think is more intuitive.

Configuration

I actually think we should generally move away from package-wide configuration, because it always causes trouble. I haven't removed the module itself here because it's slightly outside the scope of the PR, but I did remove the "regenerate" and "write" configuration options, and removed all places where the config was used.
We will have to think about how to re-implement all the functionality we had in the config (e.g. number of sigfigs for the cache). Probably most of this can be put directly into new objects (like the CacheConfig).
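
One way to replace a package-wide option with an explicit object is sketched here. The field names and the round_for_cache helper are hypothetical and are not taken from the actual CacheConfig; the point is only that options such as the number of significant figures travel with the object passed to a run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoCacheConfig:
    """Explicit, immutable options passed to a run instead of global state."""
    write_initial_conditions: bool = True
    write_perturbed_field: bool = True
    sigfigs: int = 4  # significant figures used when naming cache entries

    @classmethod
    def off(cls) -> "DemoCacheConfig":
        """Convenience constructor: write nothing to the cache."""
        return cls(write_initial_conditions=False, write_perturbed_field=False)


def round_for_cache(value: float, cfg: DemoCacheConfig) -> str:
    """Format a number to the configured number of significant figures."""
    return f"{value:.{cfg.sigfigs}g}"


cfg = DemoCacheConfig(sigfigs=3)
print(round_for_cache(7.123456, cfg))
```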

Other Stuff

I've removed any documentation or caching references to "global params". These are now to be treated as almost purely read-only (and we should move towards them being completely removed soon).
I moved the definition of InputParameters from param_config to inputs just because I was getting circular imports.

Meta-info:

These changes break strict backwards-compatibility

Issues Solved


review-notebook-app · 2024-12-24T00:49:35Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov · 2024-12-24T01:10:25Z

Codecov Report

Attention: Patch coverage is 78.69393% with 323 lines in your changes missing coverage. Please review.

Project coverage is 76.88%. Comparing base (5930245) to head (6b894c8).
Report is 3 commits behind head on v4-prep.

Files with missing lines	Patch %	Lines
src/py21cmfast/io/caching.py	54.43%	74 Missing and 3 partials ⚠️
src/py21cmfast/wrapper/outputs.py	82.63%	47 Missing and 19 partials ⚠️
src/py21cmfast/io/h5.py	71.52%	31 Missing and 12 partials ⚠️
src/py21cmfast/drivers/_param_config.py	82.94%	19 Missing and 10 partials ⚠️
src/py21cmfast/drivers/coeval.py	79.85%	21 Missing and 6 partials ⚠️
src/py21cmfast/wrapper/inputs.py	84.96%	15 Missing and 8 partials ⚠️
src/py21cmfast/wrapper/arrays.py	75.00%	9 Missing and 7 partials ⚠️
src/py21cmfast/drivers/lightcone.py	90.35%	5 Missing and 6 partials ⚠️
src/py21cmfast/drivers/single_field.py	85.33%	7 Missing and 4 partials ⚠️
src/py21cmfast/cli.py	55.55%	4 Missing ⚠️
... and 6 more

Additional details and impacted files

@@             Coverage Diff             @@
##           v4-prep     #445      +/-   ##
===========================================
- Coverage    79.56%   76.88%   -2.69%     
===========================================
  Files           24       27       +3     
  Lines         3803     3747      -56     
  Branches       647      611      -36     
===========================================
- Hits          3026     2881     -145     
- Misses         558      648      +90     
+ Partials       219      218       -1


steven-murray and others added 2 commits December 4, 2024 14:01

feat: half-way there

ca20357

[pre-commit.ci] auto fixes from pre-commit.com hooks

4addee1

for more information, see https://pre-commit.ci

steven-murray requested a review from daviesje December 14, 2024 00:29

feat: refactor of output structs

28bee8b

steven-murray marked this pull request as ready for review December 14, 2024 00:51

steven-murray added 5 commits December 14, 2024 01:51

merge main

adb74e2

test: make all tests work again

5d0b065

Merge branch 'v4-prep' into output-struct-overhaul

faf8494

fix: can't import lightcones because of circ dep

3b7f045

fix: typo in method update

5fd5117

docs: update coeval and lightcone tutorials

6b894c8
