Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not copy datamodels when opening an already open datamodel. #232

Merged
merged 15 commits into from
Jul 21, 2023

Conversation

schlafly
Copy link
Collaborator

@schlafly schlafly commented Jul 5, 2023

This PR starts to try to reduce memory usage in romancal by changing the default behavior of rdm.open(datamodel) to just return the existing datamodel rather than to make a copy. This likely has important downstream consequences that I haven't fully sussed out, though!

@codecov
Copy link

codecov bot commented Jul 5, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.13 🎉

Comparison is base (4e6db1e) 96.74% compared to head (2ec418c) 96.88%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #232      +/-   ##
==========================================
+ Coverage   96.74%   96.88%   +0.13%     
==========================================
  Files          28       29       +1     
  Lines        2399     2406       +7     
==========================================
+ Hits         2321     2331      +10     
+ Misses         78       75       -3     
Impacted Files Coverage Δ
src/roman_datamodels/datamodels/_core.py 91.57% <100.00%> (+1.57%) ⬆️
src/roman_datamodels/datamodels/_utils.py 93.75% <100.00%> (ø)

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@WilliamJamieson
Copy link
Collaborator

WilliamJamieson commented Jul 5, 2023

We should be a bit careful here, the changes currently fail this check:

# It's essential that we get a new instance so that the original
# model can be closed without impacting the new model.
assert reopened_model is not original_model

The note is absolutely correct, in that if we don't do the copy being removed, then calling close on some upstream reference to the model object here will cause downstream problems because then inadvertently one can end up trying to access information from a closed model. Indeed, this behavior stretches all the way back to PR #1.

Currently, we are a bit caviler about sometimes calling close and sometimes not calling close on models. Especially since rdm_open is used extensively as a context manager. This cavalierness may cause a couple of issues for us:

  1. We accidentally close a model which was "opened" and then modified by a step when the step exits. This is east to cause when using rdm_open as a context manager. More generally this can be caused via directly calling close somewhere on the model (which will be annoying to debug).
  2. We leave the backing files to a model open and loose reference to those files due to loosing reference to the model, which can cause all sorts of subtle problems. In theory, the __del__ method on the model should clean up the open files for us as it closes the model when the reference count drops to zero, but as the python documentation for that method indicates __del__ is not guaranteed to always be called; moreover, the data model documentation
    states:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

In effect this means we cannot guarantee that our models will be properly cleaned up by __del__. Indeed, that is what was causing #150 (see #231 fix).

These two issues are both solved by the copy being removed here because it disconnects models "opened" during a given step from models passed in. So we both never accidentally close a model we need open, and can properly close the files for models when needed.

I personally agree with the changes here as it is silly to copy things for no real reason. However, we need to approach this very carefully or otherwise we might cause ourselves very difficult to solve bugs.

@schlafly
Copy link
Collaborator Author

schlafly commented Jul 6, 2023

Yes. The way I think of this, if we open a file, we're responsible for cleaning it up. If we accept an object, the person who made the object is responsible for cleaning it up (or we wait for it to be garbage collected).

The current pipeline practice of using
with rdm.open(in) as data
as a generic "open a file or ~do nothing" seems problematic to me. In the former case we often want to clean up files. In the latter case we don't really want to do anything, but we don't want to clean up the object, since it was sent to us by someone else. We currently have a uniform interface by making a copy and then cleaning that copy up (unless opening with the target argument, which I doubt we handle well?), so that the caller's copy isn't cleaned up. But that's an unacceptable overhead here.

Maybe a better solution would be for rdm.open(in) to return a clone of in. That looks like it's by default a shallow copy that will ~turn off context handling and might be exactly what we want?

@schlafly schlafly marked this pull request as ready for review July 12, 2023 17:39
@schlafly schlafly requested review from a team and WilliamJamieson as code owners July 12, 2023 17:39
@schlafly
Copy link
Collaborator Author

I think this is ready for review. This changes the behavior of rdm.open(...) to return a shallow rather than full copy. rdm.open(...) is used throughout romancal and stcal to open already opened data models, and the previous behavior of making copies for this purpose wasted a lot of memory. This has some implications in terms of what we're allowed to return from the pipeline, which leads to me making lazy_loading the default in the romancal pipeline. That's discussed in more detail in the related romancal PR here: spacetelescope/romancal#774

@PaulHuwe
Copy link
Collaborator

Can you add a plwishmaster link (presumably with the RCAL-774 ticket) to show this passing regression tests?

@schlafly
Copy link
Collaborator Author

romancal regression tests with this PR here: https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/281/
I don't think I ran separate datamodels regression tests and would have to look at this again next week.

@schlafly
Copy link
Collaborator Author

(and note that I don't remember but that I don't think regression tests on romancal would pass with only this PR. One needs the other work on romancal too.)

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Jul 17, 2023 via email

@schlafly
Copy link
Collaborator Author

I have two PRs. This one on roman_datamodels and spacetelescope/romancal#774 on romancal. Both are required. I was worried by Paul's questions that I had missed some roman_datamodels specific regression tests, but it looks like there are only the romancal regression tests, for which I have linked a run here:
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/281/

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Jul 17, 2023 via email

@schlafly
Copy link
Collaborator Author

I did this in
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/281/
by making romancal depend on this branch of roman_datamodels.
https://github.com/spacetelescope/romancal/pull/774/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R28
That should be adequate? I agree we'd want to take that out after merging the datamodels PR.

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Jul 17, 2023 via email

Copy link
Collaborator

@PaulHuwe PaulHuwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@schlafly
Copy link
Collaborator Author

Merged in updates from main, reran regression tests; merging.
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/285/

@schlafly schlafly merged commit 357a364 into spacetelescope:main Jul 21, 2023
@schlafly schlafly deleted the reduce-copies branch July 21, 2023 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants