Try to avoid copying data models. #774
romancal/linearity/linearity_step.py
Outdated
@@ -42,14 +42,16 @@ def process(self, input):

        # copy poly coeffs from linearity model so NaN's can be updated
        lin_coeffs = lin_model.coeffs.copy()
        # I feel like we should be able to do this in place, but I can't figure
        # out how. asdf internal arrays want to be immutable.
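As a hedged sketch of what the copied-coefficients pattern above enables (the NaN-replacement policy here is illustrative, not necessarily romancal's actual rule):

```python
import numpy as np

def clean_linearity_coeffs(coeffs):
    """Return a writable copy of `coeffs` with NaN polynomial coefficients
    replaced by the identity polynomial (0 + 1*x)."""
    out = coeffs.copy()              # detach from the read-only backing array
    bad = np.isnan(out).any(axis=0)  # pixels with any NaN coefficient
    out[:, bad] = 0.0
    out[1, bad] = 1.0                # linear term = 1 -> identity correction
    return out

# Tiny made-up example: 3 polynomial terms over a 4x4 detector,
# with one pixel's coefficients set to NaN.
coeffs = np.zeros((3, 4, 4))
coeffs[1] = 1.0
coeffs[:, 0, 0] = np.nan
cleaned = clean_linearity_coeffs(coeffs)
```

The copy is what makes the in-place assignment legal when the source array is write-protected.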
I'm fairly certain this is because line 41 opened the LinearityRefModel as read only and memory mapped (I'm running the test locally now but it's very slow).
If this is the case, the write protections will prevent the array from being modified.
I don't believe any kwargs passed to the model (via init when called with a filename) make it to asdf:
https://github.com/spacetelescope/roman_datamodels/blob/c0a084d990f5904f577d9c6c29391fcb44718112/src/roman_datamodels/datamodels/_core.py#L172
so asdf will use the defaults (read only, copy_arrays=False).
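The write-protection behavior described here can be reproduced with numpy alone (a stand-in for asdf's default read-only, memory-mapped arrays; the file and values are invented):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "coeffs.npy")
np.save(path, np.ones((2, 2)))

arr = np.load(path, mmap_mode="r")   # read-only memory map, like asdf's default
try:
    arr[0, 0] = 5.0
    writable = True
except ValueError:
    writable = False                 # write-protected, as described above

fixed = np.asarray(arr).copy()       # the workaround: copy first, then modify
fixed[0, 0] = 5.0
```

With asdf this corresponds to the defaults mentioned above (read only, copy_arrays=False); copying in memory, as the linearity step does, sidesteps the protection.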
*laugh* I was just thinking about your talk and copy_arrays=True, which indeed makes this work in place without a copy. I haven't decided yet whether that's actually a win.
Excellent!
Let me know if there's anything I can do to help. My attempts to run the regtest locally (on the VPN) failed with permission errors for `/grp/crds/cache/mappings/roman` (after running for a while). I'm not sure what's going on there, so it's good to hear you were able to test out the change.
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #774      +/-   ##
==========================================
- Coverage   76.90%   76.28%   -0.63%
==========================================
  Files          90       90
  Lines        5465     5418      -47
==========================================
- Hits         4203     4133      -70
- Misses       1262     1285      +23

This pull request uses carry forward flags.
☔ View full report in Codecov by Sentry.
Okay, I'm feeling better about this PR. Regression tests are passing.

The overall goal here is to make the pipeline modify everything in place, passing a single datamodel through the steps to the extent possible; that model gets updated until it reaches its final form and is returned. Most of the changes in this PR are related to that goal. The new version uses 2 GB through dark subtraction on my test file, while the old version used 10 GB. With planned updates to ramp fitting and jump detection I do not expect those steps to add meaningfully to the total memory usage. However, I haven't touched those steps yet, and they currently use more memory than the earlier steps I have addressed.

There are some issues with the stpipe design and the desire to have the steps be able to process either file names or datamodels. In the first case, we need to open the file, and we are then responsible for closing it when we're done. If we use features like lazy loading and memory mapping, returning an asdf object with a closed backing file causes problems. So I've changed things so that steps turn lazy loading off, essentially detaching the datamodel objects from their backing files and allowing the files to be closed and the models returned.

This is unfortunate: lazy loading makes sense in the pipeline, as does memory mapping. But the previous approach was to make a deep copy of each object, which also ended up removing the benefits of these features. I think if we want to do better here we need to change the pipeline so that it only ever deals with open datamodels, and callers are responsible for opening and closing files (following @WilliamJamieson's suggestions). Then we would remove the context management on the input objects from the pipeline. In the current implementation I've kept the context managers to keep changes to a minimum. It's still the case that if someone sends us an open datamodel with a backing file, it would be bad form for us to close it.
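The ownership rule described here can be sketched as follows (all names are made up; `fake_open` and `FakeModel` stand in for roman_datamodels' open and datamodel classes):

```python
from contextlib import nullcontext

class FakeModel(dict):
    """Stands in for a datamodel with a backing file."""
    closed = False
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.closed = True           # exiting the context closes the file

def fake_open(path):
    model = FakeModel()
    model["source"] = path
    return model

def process(init):
    # If we were given a path, we opened the file, so we must close it;
    # if we were given a live model, the caller owns it and we must not.
    ctx = fake_open(init) if isinstance(init, str) else nullcontext(init)
    with ctx as model:
        model["meta"] = "processed"  # in-place update, no deep copy
        return model
```

Note that in the filename case the returned model comes back "closed", which is exactly why the real steps must turn lazy loading off (detach the arrays) before the file goes away.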
The associated PR spacetelescope/roman_datamodels#232 does a shallow copy when opening an already-open datamodel, which prevents exiting the context manager for the proxy copy from closing the open file. It feels surprising and un-idiomatic that opening an already-open file returns a shallow copy of that file, and we're doing work to use a context manager when we don't have any reason to use one in this case. But doing better there seems to me like it would require some redesign of the pipeline / stpipe to only work from datamodels and not from files.

@braingram notes that lazy loading is not available for files with quantities. This is unfortunate: it means that blocks like this spacetelescope/stpipe#97 end up reading the whole file when passed a file name as an argument. If possible, we should try to re-enable that behavior. However, in the pipeline steps we end up turning lazy loading off anyway, since we want to return full, functional, detached objects.

There are warnings in the regression tests related to files not being closed. Those are also present in main; they stem from this issue spacetelescope/stpipe#97 and are ultimately unrelated to this PR.

There are minor unrelated changes to group_time and the unit test here https://github.com/spacetelescope/romancal/pull/774/files#diff-0ba58d845e38535dbb7496d66fccd7af20bdb15d3c5baa4fda2f863c90d99ea1R48 which I needed to pass tests. I found that this was the root cause of the error in https://github.com/spacetelescope/romancal/pull/725/files .
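A toy version of the shallow-copy-on-reopen behavior described here (class names invented; `rdm_open` stands in for the real open function):

```python
import copy

class DataModel:
    def __init__(self, data):
        self.data = data             # shared (not copied) by a shallow copy
        self.closed = False
    def close(self):
        self.closed = True
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close()

def rdm_open(init):
    # Opening an already-open model returns a shallow proxy, so that
    # exiting the proxy's context manager cannot close the caller's file.
    if isinstance(init, DataModel):
        return copy.copy(init)       # proxy: independent `closed` flag, shared data
    raise TypeError("filename handling elided in this sketch")

original = DataModel([1, 2, 3])
with rdm_open(original) as m:
    m.data.append(4)                 # mutations are visible to the original
```

The surprise noted above is visible here: the proxy is "closed" on exit while the original stays open and sees every mutation.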
Can you provide the code you are using to generate this plot?
I'll look more at this tomorrow, but can you highlight the beginning of the jump detection step here? Jump detection and ramp fitting are still very inefficient and are by far the high-water mark. But with our planned changes to these and this PR, those will go down too. My claims about memory usage focused on the point immediately before jump detection.
The jump step starts at about the 50s mark. If you look at
https://innerspace.stsci.edu/pages/viewpage.action?spaceKey=SCSB&title=RCAL+pipeline+Timing+and+Memory
you can see that the early steps use 2-4x less memory, which is a significant improvement.

I assume that the ramp fitting is still the old version for this PR and you only refer to the improved version in romanisim? Do you have a number for the romanisim ramp fitting memory usage?
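For completeness, a stdlib-only way to get per-step peak-memory numbers like the ones being compared here (tracemalloc as a rough stand-in for memray; the step and data sizes are invented):

```python
import tracemalloc

def measure_peak(step, data):
    """Run step(data) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = step(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def copying_step(buf):
    return bytes(buf)                # allocates a full second copy of the input

data = bytearray(2_000_000)          # ~2 MB stand-in for a datamodel array
_, peak = measure_peak(copying_step, data)
```

A step that copies its input shows a peak at least as large as the input itself, which is the signature the in-place rewrites in this PR are trying to eliminate.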
LGTM in principle. The open-file issue seems to be handled differently in roman than in jwst; jwst uses a reference-count method which mitigates many of the issues, if I am reading all this right. Also note that I agree with the thoughts on future modification of stpipe, but such mods may have an impact on jwst also.
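A rough sketch of the reference-count approach mentioned here (an assumption about the idea, not jwst's actual code; all names are made up): the backing file is only truly closed when the last handle sharing it is closed.

```python
class SharedFile:
    def __init__(self):
        self.refs = 1
        self.closed = False
    def addref(self):
        self.refs += 1
    def close(self):
        self.refs -= 1
        if self.refs == 0:           # last handle gone -> really close
            self.closed = True

class ModelHandle:
    def __init__(self, shared):
        self._file = shared
    @classmethod
    def reopen(cls, other):
        other._file.addref()         # "opening" an open model shares the file
        return cls(other._file)
    def close(self):
        self._file.close()

f = SharedFile()
a = ModelHandle(f)
b = ModelHandle.reopen(a)
b.close()                            # file stays open: `a` still holds a reference
still_open = not f.closed
a.close()                            # last reference -> actually closed
```

This avoids both the deep copy and the shallow-copy proxy: every handle can safely use a context manager without clobbering its siblings.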
Jonathan has the new version of ramp fitting (related to the simulator version) in his stcal / romancal PRs. That code doesn't make any copies of the ramp structure. It does need to make an ImageModel, which obviously has a few 4k x 4k arrays of floats. Probably a GB or two. Jump detection will be in the same boat.
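For scale, each 4096 x 4096 plane works out to 64 MiB in float32 (128 MiB in float64), so the total for the ImageModel depends mostly on how many such arrays (data, error, dq, variances, ...) it carries:

```python
# Back-of-the-envelope cost of one 4096 x 4096 detector plane.
n_pix = 4096 * 4096                  # pixels per array
mib_float32 = n_pix * 4 / 2**20      # bytes -> MiB for float32
mib_float64 = n_pix * 8 / 2**20      # bytes -> MiB for float64
```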
@ddavis-stsci, can you approve? It looks like I can't approve my own PR. Thanks!
This PR tries to reduce memory usage in romancal. My approach has been pretty aggressive so far and should be reined in. I have focused on the steps preceding jump detection and ramp fitting, which we are replacing. See
spacetelescope/romanisim#64
for a memory efficient ramp fitting approach. A jump detection approach following plans for Roman based around that routine should also be very memory efficient.
I have been using this in conjunction with
spacetelescope/roman_datamodels#232
to reduce memory usage. On my memray runs today, up until jump detection the new version uses 3.5 GB, compared to 10 GB for the romancal default. But the datamodel size is O(1 GB), so I should be able to get closer to 2 GB. To do better than that would require not reading the complete dark model into memory and instead subtracting it plane-by-plane or something, which is certainly an option. I'll poke a bit further tomorrow, but I wanted other people to be able to see what damage I'm doing.
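A hedged sketch of that plane-by-plane option (shapes and file handling are invented; a numpy `.npy` memmap stands in for an asdf-backed dark model):

```python
import os
import tempfile
import numpy as np

# Write a small stand-in dark cube to disk (6 resultants of an 8x8 detector).
path = os.path.join(tempfile.mkdtemp(), "dark.npy")
np.save(path, np.full((6, 8, 8), 2.0, dtype=np.float32))

science = np.full((6, 8, 8), 5.0, dtype=np.float32)

dark = np.load(path, mmap_mode="r")  # planes are paged in on demand
for i in range(science.shape[0]):
    science[i] -= dark[i]            # one dark plane resident at a time
```

The peak working set is then one science cube plus one dark plane, instead of two full cubes.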
I re-implemented dark subtraction because I didn't want to have to visit stcal's dark subtraction, and there really isn't a lot going on there. I have some ridiculous bits like

out_data = input_model

that are guaranteed to cause confusion but more closely follow the original flow; were we to actually want to go in this direction I would clean that up. Some tests needed to change to reflect that the steps now operate in place rather than making copies (due to my change to roman_datamodels, spacetelescope/roman_datamodels#232).
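A toy illustration (hypothetical step and dict-based "model") of why `out_data = input_model` changes the tests: the step's output is the same object as its input, so tests must assert identity and mutation rather than compare two distinct models.

```python
import numpy as np

def dark_sub_step(input_model, dark):
    out_data = input_model           # alias, not a copy (the "ridiculous bit")
    out_data["data"] = out_data["data"] - dark
    return out_data

model = {"data": np.ones(4)}
result = dark_sub_step(model, np.full(4, 0.25))
```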
Checklist
- CHANGES.rst entry under the corresponding subsection