Try to avoid copying data models. #774
romancal/linearity/linearity_step.py
Outdated
@@ -42,14 +42,16 @@ def process(self, input):

        # copy poly coeffs from linearity model so NaN's can be updated
        lin_coeffs = lin_model.coeffs.copy()
        # I feel like we should be able to do this in place, but I can't figure
        # out how. asdf internal arrays want to be immutable.
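As a hedged sketch of what the copied-coefficients pattern above enables (the NaN-replacement policy here is illustrative, not necessarily romancal's actual rule):

```python
import numpy as np

def clean_linearity_coeffs(coeffs):
    """Return a writable copy of `coeffs` with NaN polynomial coefficients
    replaced by the identity polynomial (0 + 1*x)."""
    out = coeffs.copy()              # detach from the read-only backing array
    bad = np.isnan(out).any(axis=0)  # pixels with any NaN coefficient
    out[:, bad] = 0.0
    out[1, bad] = 1.0                # linear term = 1 -> identity correction
    return out

# Tiny made-up example: 3 polynomial terms over a 4x4 detector,
# with one pixel's coefficients set to NaN.
coeffs = np.zeros((3, 4, 4))
coeffs[1] = 1.0
coeffs[:, 0, 0] = np.nan
cleaned = clean_linearity_coeffs(coeffs)
```

The copy is what makes the in-place assignment legal when the source array is write-protected.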
I'm fairly certain this is because line 41 opened the LinearityRefModel as read only and memory mapped (I'm running the test locally now but it's very slow).
If this is the case, the write protections will prevent the array from being modified.
I don't believe any kwargs passed to the model (via init when called with a filename) make it to asdf:
https://github.com/spacetelescope/roman_datamodels/blob/c0a084d990f5904f577d9c6c29391fcb44718112/src/roman_datamodels/datamodels/_core.py#L172
so asdf will use the defaults (read only, copy_arrays=False).
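The write-protection behavior described here can be reproduced with numpy alone (a stand-in for asdf's default read-only, memory-mapped arrays; the file and values are invented):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "coeffs.npy")
np.save(path, np.ones((2, 2)))

arr = np.load(path, mmap_mode="r")   # read-only memory map, like asdf's default
try:
    arr[0, 0] = 5.0
    writable = True
except ValueError:
    writable = False                 # write-protected, as described above

fixed = np.asarray(arr).copy()       # the workaround: copy first, then modify
fixed[0, 0] = 5.0
```

With asdf this corresponds to the defaults mentioned above (read only, copy_arrays=False); copying in memory, as the linearity step does, sidesteps the protection.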
*laugh* I was just thinking about your talk and copy_arrays=True, which indeed makes this work in place without a copy. I haven't decided yet whether that's actually a win.
Excellent!
Let me know if there's anything I can do to help. My attempts to run the regtest locally (on the VPN) failed with permission errors for `/grp/crds/cache/mappings/roman` (after running for a while). I'm not sure what's going on there, so it's good to hear you were able to test out the change.
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #774      +/-   ##
==========================================
- Coverage   76.90%   76.28%   -0.63%
==========================================
  Files          90       90
  Lines        5465     5418      -47
==========================================
- Hits         4203     4133      -70
- Misses       1262     1285      +23

This pull request uses carry forward flags.
☔ View full report in Codecov by Sentry.
Okay, I'm feeling better about this PR. Regression tests are passing.

The overall goal here is to make the pipeline modify everything in place, passing a single datamodel through the steps to the extent possible; that model gets updated until it reaches its final form and is returned. Most of the changes in this PR are related to that goal. The new version uses 2 GB through dark subtraction on my test file, while the old version used 10 GB. With planned updates to ramp fitting and jump detection I do not expect those steps to add meaningfully to the total memory usage. However, I haven't touched those steps yet, and they currently use more memory than the earlier steps I have addressed.

There are some issues with the stpipe design and the desire to have the steps be able to process either file names or datamodels. In the first case, we need to open the file, and we are then responsible for closing it when we're done. If we use features like lazy loading and memory mapping, returning an asdf object with a closed backing file causes problems. So I've changed things so that steps turn lazy loading off, essentially detaching the datamodel objects from their backing files and allowing the files to be closed and the models returned.

This is unfortunate: lazy loading makes sense in the pipeline, as does memory mapping. But the previous approach was to make a deep copy of each object, which also ended up removing the benefits of these features. I think if we want to do better here we need to change the pipeline so that it only ever deals with open datamodels, and callers are responsible for opening and closing files (following @WilliamJamieson's suggestions). Then we would remove the context management on the input objects from the pipeline. In the current implementation I've kept the context managers to keep changes to a minimum. It's still the case that if someone sends us an open datamodel with a backing file, it would be bad form for us to close it.
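The ownership rule described here can be sketched as follows (all names are made up; `fake_open` and `FakeModel` stand in for roman_datamodels' open and datamodel classes):

```python
from contextlib import nullcontext

class FakeModel(dict):
    """Stands in for a datamodel with a backing file."""
    closed = False
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.closed = True           # exiting the context closes the file

def fake_open(path):
    model = FakeModel()
    model["source"] = path
    return model

def process(init):
    # If we were given a path, we opened the file, so we must close it;
    # if we were given a live model, the caller owns it and we must not.
    ctx = fake_open(init) if isinstance(init, str) else nullcontext(init)
    with ctx as model:
        model["meta"] = "processed"  # in-place update, no deep copy
        return model
```

Note that in the filename case the returned model comes back "closed", which is exactly why the real steps must turn lazy loading off (detach the arrays) before the file goes away.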
The associated PR spacetelescope/roman_datamodels#232 does a shallow copy when opening an already-open datamodel, which prevents exiting the context manager for the proxy copy from closing the open file. It feels surprising and un-idiomatic that opening an already-open file returns a shallow copy of that file, and we're doing work to use a context manager when we don't have any reason to use one in this case. But doing better there seems to me like it would require some redesign of the pipeline / stpipe to only work from datamodels and not from files.

@braingram notes that lazy loading is not available for files with quantities. This is unfortunate: it means that blocks like this spacetelescope/stpipe#97 end up reading the whole file when passed a file name as an argument. If possible, we should try to re-enable that behavior. However, in the pipeline steps we end up turning lazy loading off anyway, since we want to return full, functional, detached objects.

There are warnings in the regression tests related to files not being closed. Those are also present in main; they stem from this issue spacetelescope/stpipe#97 and are ultimately unrelated to this PR.

There are minor unrelated changes to group_time and the unit test here https://github.com/spacetelescope/romancal/pull/774/files#diff-0ba58d845e38535dbb7496d66fccd7af20bdb15d3c5baa4fda2f863c90d99ea1R48 which I needed to pass tests. I found that this was the root cause of the error in https://github.com/spacetelescope/romancal/pull/725/files .
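A toy version of the shallow-copy-on-reopen behavior described here (class names invented; `rdm_open` stands in for the real open function):

```python
import copy

class DataModel:
    def __init__(self, data):
        self.data = data             # shared (not copied) by a shallow copy
        self.closed = False
    def close(self):
        self.closed = True
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close()

def rdm_open(init):
    # Opening an already-open model returns a shallow proxy, so that
    # exiting the proxy's context manager cannot close the caller's file.
    if isinstance(init, DataModel):
        return copy.copy(init)       # proxy: independent `closed` flag, shared data
    raise TypeError("filename handling elided in this sketch")

original = DataModel([1, 2, 3])
with rdm_open(original) as m:
    m.data.append(4)                 # mutations are visible to the original
```

The surprise noted above is visible here: the proxy is "closed" on exit while the original stays open and sees every mutation.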
Can you provide the code you are using to generate this plot?
I'll look more at this tomorrow, but can you highlight the beginning of the jump detection step here? Jump detection and ramp fitting are still very inefficient and are by far the high-water mark. But with our planned changes to these and this PR, those will go down too. My claims about memory usage focused on the point immediately before jump detection.
The jump step starts at about the 50s mark. If you look at
https://innerspace.stsci.edu/pages/viewpage.action?spaceKey=SCSB&title=RCAL+pipeline+Timing+and+Memory
you can see that the early steps use 2-4x less memory, which is a significant improvement.

I assume that the ramp fitting is still the old version for this PR and you only refer to the improved version in romanisim? Do you have a number for the romanisim ramp fitting memory usage?
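For completeness, a stdlib-only way to get per-step peak-memory numbers like the ones being compared here (tracemalloc as a rough stand-in for memray; the step and data sizes are invented):

```python
import tracemalloc

def measure_peak(step, data):
    """Run step(data) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = step(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def copying_step(buf):
    return bytes(buf)                # allocates a full second copy of the input

data = bytearray(2_000_000)          # ~2 MB stand-in for a datamodel array
_, peak = measure_peak(copying_step, data)
```

A step that copies its input shows a peak at least as large as the input itself, which is the signature the in-place rewrites in this PR are trying to eliminate.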
LGTM in principle. The open-file issue seems to be handled differently in roman than in jwst; jwst uses a reference-count method which mitigates many of the issues, if I am reading all this right. Also note that I agree with the thoughts on future modification of stpipe, but such mods may have an impact on jwst also.
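A rough sketch of the reference-count approach mentioned here (an assumption about the idea, not jwst's actual code; all names are made up): the backing file is only truly closed when the last handle sharing it is closed.

```python
class SharedFile:
    def __init__(self):
        self.refs = 1
        self.closed = False
    def addref(self):
        self.refs += 1
    def close(self):
        self.refs -= 1
        if self.refs == 0:           # last handle gone -> really close
            self.closed = True

class ModelHandle:
    def __init__(self, shared):
        self._file = shared
    @classmethod
    def reopen(cls, other):
        other._file.addref()         # "opening" an open model shares the file
        return cls(other._file)
    def close(self):
        self._file.close()

f = SharedFile()
a = ModelHandle(f)
b = ModelHandle.reopen(a)
b.close()                            # file stays open: `a` still holds a reference
still_open = not f.closed
a.close()                            # last reference -> actually closed
```

This avoids both the deep copy and the shallow-copy proxy: every handle can safely use a context manager without clobbering its siblings.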
Jonathan has the new version of ramp fitting (related to the simulator version) in his stcal / romancal PRs. That code doesn't make any copies of the ramp structure. It does need to make an ImageModel, which obviously has a few 4k x 4k arrays of floats. Probably a GB or two. Jump detection will be in the same boat.
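For scale, each 4096 x 4096 plane works out to 64 MiB in float32 (128 MiB in float64), so the total for the ImageModel depends mostly on how many such arrays (data, error, dq, variances, ...) it carries:

```python
# Back-of-the-envelope cost of one 4096 x 4096 detector plane.
n_pix = 4096 * 4096                  # pixels per array
mib_float32 = n_pix * 4 / 2**20      # bytes -> MiB for float32
mib_float64 = n_pix * 8 / 2**20      # bytes -> MiB for float64
```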
@ddavis-stsci, can you approve? It looks like I can't approve my own PR. Thanks!
This PR tries to reduce memory usage in romancal. My approach has been pretty aggressive so far and should be reined in. I have focused on the steps preceding jump detection and ramp fitting, which we are replacing. See
spacetelescope/romanisim#64
for a memory efficient ramp fitting approach. A jump detection approach following plans for Roman based around that routine should also be very memory efficient.
I have been using this in conjunction with
spacetelescope/roman_datamodels#232
to reduce memory usage. On my memray runs today, up until jump detection the new version uses 3.5 GB, compared to 10 GB for the romancal default. But the datamodel size is O(1 GB), so I should be able to get closer to 2 GB. To do better than that would require not reading the complete dark model into memory and instead subtracting it plane-by-plane or something, which is certainly an option. I'll poke a bit further tomorrow, but I wanted other people to be able to see what damage I'm doing.
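A hedged sketch of that plane-by-plane option (shapes and file handling are invented; a numpy `.npy` memmap stands in for an asdf-backed dark model):

```python
import os
import tempfile
import numpy as np

# Write a small stand-in dark cube to disk (6 resultants of an 8x8 detector).
path = os.path.join(tempfile.mkdtemp(), "dark.npy")
np.save(path, np.full((6, 8, 8), 2.0, dtype=np.float32))

science = np.full((6, 8, 8), 5.0, dtype=np.float32)

dark = np.load(path, mmap_mode="r")  # planes are paged in on demand
for i in range(science.shape[0]):
    science[i] -= dark[i]            # one dark plane resident at a time
```

The peak working set is then one science cube plus one dark plane, instead of two full cubes.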
I re-implemented dark subtraction because I didn't want to have to visit stcal's dark subtraction, and there really isn't a lot going on there. I have some ridiculous bits like

out_data = input_model

that are guaranteed to cause confusion but more closely follow the original flow; were we to actually want to go in this direction I would clean that up. Some tests needed to change to reflect that the steps now operate in place rather than making copies (due to my change to roman_datamodels, spacetelescope/roman_datamodels#232).
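A toy illustration (hypothetical step and dict-based "model") of why `out_data = input_model` changes the tests: the step's output is the same object as its input, so tests must assert identity and mutation rather than compare two distinct models.

```python
import numpy as np

def dark_sub_step(input_model, dark):
    out_data = input_model           # alias, not a copy (the "ridiculous bit")
    out_data["data"] = out_data["data"] - dark
    return out_data

model = {"data": np.ones(4)}
result = dark_sub_step(model, np.full(4, 0.25))
```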
Checklist
- CHANGES.rst entry under the corresponding subsection