Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to avoid copying data models. #774

Merged
merged 14 commits into from
Jul 24, 2023
Merged

Conversation

schlafly
Copy link
Collaborator

@schlafly schlafly commented Jul 5, 2023

This PR tries to reduce memory usage in romancal. My approach has been pretty aggressive so far and should be reined in. I have focused on the steps preceding jump detection and ramp fitting, which we are replacing. See
spacetelescope/romanisim#64
for a memory efficient ramp fitting approach. A jump detection approach following plans for Roman based around that routine should also be very memory efficient.

I have been using this is conjunction with
spacetelescope/roman_datamodels#232
to reduce memory usage. On my memray runs today, up until jump detection the new version uses 3.5 GB in comparison to 10 GB for romancal default. But the datamodel size is O(1 GB) so I should be able to get closer to 2 GB. To do better than that would require not reading the complete dark model into memory and instead removing it plane-by-plane or something, which is certainly an option. I'll poke a bit further tomorrow, but I wanted other people to be able to see what damage I'm doing.

I re-implemented dark subtraction because I didn't want to have to visit stcal's dark subtraction and there really isn't a lot going on there. I have some ridiculous bits like out_data = input_model that are guaranteed to cause confusion but more closely follow the original flow; were we to actually want to go in this direction I would clean that up. Some tests needed to change to reflect that now the steps operate in place rather than making copies (due to my change to roman_datamodels spacetelescope/roman_datamodels#232).

Checklist

  • added entry in CHANGES.rst under the corresponding subsection
  • updated relevant tests
  • updated relevant documentation
  • updated relevant milestone(s)
  • added relevant label(s)

@@ -42,14 +42,16 @@ def process(self, input):

# copy poly coeffs from linearity model so Nan's can be updated
lin_coeffs = lin_model.coeffs.copy()
# I feel like we should be able to do this in place, but I can't figure
# out how. asdf internal arrays want to be immutable.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly certain this is because line 41 opened the LinearityRefModel as read only and memory mapped (I'm running the test locally now but it's very slow).

If this is the case, the write protections will prevent the array from being modified.

I don't believe any kwargs passed to the model (via init when called with a filename) make it to asdf:
https://github.com/spacetelescope/roman_datamodels/blob/c0a084d990f5904f577d9c6c29391fcb44718112/src/roman_datamodels/datamodels/_core.py#L172
so asdf will use the defaults (read only, copy_arrays=False).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

laugh, I was just thinking about your talk and copy_arrays=True, which indeed makes this work in place without a copy. I haven't decided yet if that's actually a win.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

Let me know if there's anything I can do to help. My attempts to run the regtest locally (on the vpn) failed with permission errors for `/grp/crds/cache/mappings/roman' (after running for a while). I'm not sure what's going on there so it's good to hear you were able to test out the change.

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Jul 12, 2023
@codecov
Copy link

codecov bot commented Jul 12, 2023

Codecov Report

Patch coverage: 26.08% and project coverage change: -0.63 ⚠️

Comparison is base (f7eb942) 76.90% compared to head (df626a0) 76.28%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #774      +/-   ##
==========================================
- Coverage   76.90%   76.28%   -0.63%     
==========================================
  Files          90       90              
  Lines        5465     5418      -47     
==========================================
- Hits         4203     4133      -70     
- Misses       1262     1285      +23     
Flag Coverage Δ *Carryforward flag
nightly 64.15% <ø> (-0.56%) ⬇️ Carriedforward from 3592cfe

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
romancal/dark_current/dark_current_step.py 53.33% <0.00%> (-35.01%) ⬇️
romancal/dq_init/dq_init_step.py 70.37% <0.00%> (-8.21%) ⬇️
romancal/flatfield/flat_field.py 84.09% <0.00%> (-4.80%) ⬇️
romancal/jump/jump_step.py 94.73% <0.00%> (-1.32%) ⬇️
romancal/linearity/linearity_step.py 58.33% <0.00%> (-21.16%) ⬇️
romancal/photom/photom_step.py 77.77% <0.00%> (+0.50%) ⬆️
romancal/ramp_fitting/ramp_fit_step.py 96.20% <0.00%> (-1.27%) ⬇️
romancal/saturation/saturation_step.py 66.66% <0.00%> (-3.71%) ⬇️
romancal/assign_wcs/assign_wcs_step.py 85.00% <50.00%> (-1.25%) ⬇️
romancal/flatfield/flat_field_step.py 86.66% <50.00%> (-3.34%) ⬇️
... and 4 more

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@schlafly
Copy link
Collaborator Author

Okay, I'm feeling better about this PR. Regression tests are passing, here:
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/281/

The overall goal here is to make the pipeline modify everything in place and pass to the extent possible a single datamodel through that gets updated until it reaches its final form and is returned. Most of the changes in this PR are related to that goal. The new version uses 2 GB through dark subtraction on my test file, while the old version used 10 GB. With planned updates to ramp fitting and jump detection I do not expect those steps to add meaningfully to the total memory usage. However, I haven't touched those steps yet and they currently use more memory than the earlier steps I have addressed.

There are some issues with the stpipe design and the desire to have the steps be able to process either file names or data models. In the first case, we need to open the file name, and then are responsible for closing the file when we're done with it. If we use features like lazy loading and memory mapping, returning an asdf object with a closed file causes problems. So I've changed things so that steps turn lazy loading off, essentially detaching the datamodel objects from their backing files and allowing the files to be closed and the models returned. This is kind of sad---lazy loading makes sense in the pipeline, as does memory mapping. But the previous approach was to make a deep copy of each object, which also ended up removing the benefits of these features. I think if we want to do better here we need to change the pipeline so that it only ever deals with open datamodels and callers are responsible for opening and closing files (following @WilliamJamieson's suggestions). Then we would remove the context management on the input objects from the pipeline.

In the current implementation I've kept the context managers to keep changes to a minimum. It's still the case that if someone sends us an open datamodel with a backing file, it would be bad form for us to close it. The associated PR spacetelescope/roman_datamodels#232 does a shallow copy when opening an already open datamodel that prevents exiting the context manager for the proxy copy from closing the open file. It feels surprising and un-idiomatic that opening an already open file returns a shallow copy of that file, and we're doing work to use a context manager when we don't have any reason to use one for this case. But doing better there seems to me like it would require some redesign of the pipeline / stpipe to only work from datamodels and not from files.

@braingram notes that lazy loading is not available for files with quantities. This is unfortunate. This means that blocks like this spacetelescope/stpipe#97 end up reading the whole file when passed a file as an argument. If possible, we should try to re-enable that behavior. However, in the pipeline steps we end up turning lazy loading off anyway since we want to return full, functional, detached objects.

There are warnings in the regression tests related to files not being closed in the regression tests. Those are also present in main. They stem from this issue spacetelescope/stpipe#97 and are ultimately unrelated to this PR.

There are minor unrelated changes to group_time and the unit test here https://github.com/spacetelescope/romancal/pull/774/files#diff-0ba58d845e38535dbb7496d66fccd7af20bdb15d3c5baa4fda2f863c90d99ea1R48 which I needed to pass tests. I found that this was the root cause of the error in https://github.com/spacetelescope/romancal/pull/725/files .

@schlafly schlafly marked this pull request as ready for review July 12, 2023 17:38
@schlafly schlafly requested a review from a team as a code owner July 12, 2023 17:38
@ddavis-stsci
Copy link
Collaborator

I'm not sure I'm seeing much improvement in the memory usage. I'm attaching the plot using your PR branch
git status
On branch reduce-copies
memprof_pipeline_July19_1

@WilliamJamieson
Copy link
Collaborator

Can you provide the code you are using to generate this plot?

@stscieisenhamer stscieisenhamer self-requested a review July 19, 2023 21:38
@schlafly
Copy link
Collaborator Author

I'll look more at this tomorrow, but can you highlight the beginning of the jump detection step here? jump detection and ramp fitting are still very inefficient and are by far the high water mark. But with our planned changes to these and this PR those will go down too. My claims about memory usage focused on three point immediately before jump detection.

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Jul 20, 2023 via email

Copy link
Collaborator

@stscieisenhamer stscieisenhamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in principle. The open file issue seems to be handled different in roman than jwst; jwst does a reference count method which mitigates many of the issues, if I am reading all this right.

Also note that I agree with the thoughts on future modification of stpipe, but such mods may have impact on jwst also.

@schlafly
Copy link
Collaborator Author

Jonathan has the new version of ramp fitting (related to the simulator version) in his stcal / romancal PRs. That code doesn't make any copies of the ramp structure. It does need to make an ImageModel which obviously has a few 4k x 4k arrays of floats. Probably a GB or two. Jump detection will be in the same boat.

@github-actions github-actions bot removed the dependencies Pull requests that update a dependency file label Jul 21, 2023
@schlafly
Copy link
Collaborator Author

@ddavis-stsci , can you approve? It looks like I can't approve my own PR. Thanks!

@schlafly schlafly merged commit 9c31f0d into spacetelescope:main Jul 24, 2023
@schlafly schlafly deleted the reduce-copies branch July 24, 2023 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants