Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 214 lru cache mem leak #240

Merged

Conversation

maxschloegel
Copy link
Contributor

In this MR we refactor all functools.lru_cache decorators on instance methods, since they introduce memory-leak.

Currently the only class that still contains lru_cache-decorators for instance methods is DCORTraceItem in dclab.rtdc_dataset/fmt_dcor/events.py. There are going to be future refactoring steps that will remove these.

@codecov-commenter
Copy link

codecov-commenter commented Oct 25, 2023

Codecov Report

Attention: 48 lines in your changes are missing coverage. Please review.

Comparison is base (8eb38a0) 75.54% compared to head (bd8c699) 73.01%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #240      +/-   ##
==========================================
- Coverage   75.54%   73.01%   -2.53%     
==========================================
  Files         109      109              
  Lines        7586     7598      +12     
==========================================
- Hits         5731     5548     -183     
- Misses       1855     2050     +195     
Files Coverage Δ
dclab/rtdc_dataset/core.py 89.23% <100.00%> (+1.59%) ⬆️
dclab/rtdc_dataset/fmt_hdf5/base.py 92.00% <ø> (+0.41%) ⬆️
dclab/rtdc_dataset/fmt_hdf5/events.py 98.66% <100.00%> (+0.69%) ⬆️
dclab/rtdc_dataset/fmt_hdf5/logs.py 100.00% <100.00%> (ø)
dclab/rtdc_dataset/fmt_hdf5/tables.py 95.65% <100.00%> (+0.19%) ⬆️
dclab/rtdc_dataset/fmt_hierarchy.py 98.03% <100.00%> (+<0.01%) ⬆️
dclab/rtdc_dataset/fmt_dcor/logs.py 47.05% <0.00%> (-15.45%) ⬇️
dclab/rtdc_dataset/fmt_tdms/event_image.py 0.00% <0.00%> (ø)
dclab/rtdc_dataset/fmt_dcor/tables.py 34.61% <0.00%> (-9.39%) ⬇️
dclab/rtdc_dataset/fmt_tdms/event_contour.py 0.00% <0.00%> (ø)
... and 1 more

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maxschloegel
Copy link
Contributor Author

The errors thrown above are the DCOR-issues you, @paulmueller , were talking about, right?

Copy link
Member

@paulmueller paulmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor changes ❤️

return length
# Try to get the length from the feature sizes
keys = list(self._events.keys())
keys = list(self._events.keys()) + self.features_basin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a separate case for self.features_basin here?
I fear that accessing this property might take very long and might not be required.

i.e.

keys = list(self._events.keys())
for kk in keys:
    ...

keysb = self.features_basin
if keysb:
    return len(self[keysb][0])


raise ValueError...

The iteration over the self._events is just a precaution due to some internal features not implementing len properly. But for basin features this should not happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a dataset is created without any own features, but only referencing features of its basin, then there are no key-value pairs in self._events, so this would return a length of 0, despite the basin having multiple entries.
So only if self._events.keys() is empty I need to look at the self.features_basin-list.

So similar to what you suggest I could first check

keys = list(self._events.keys())
if not keys:
    keys = self.features_basin
for kk in keys:
    . . .

Would that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be a good solution 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a one-liner I could also do:

keys = list(self._events.keys()) or self.features_basin

Here the part after or would only be executed if self._events.keys() is empty. I think it looks a bit nicer and shorter.
What do you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the or looks nicer.

CHANGELOG Outdated
@@ -1,4 +1,5 @@
0.54.3
- ref: replace lru_cache on instance methods with cache attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a "fix". Please reference the issue number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@paulmueller
Copy link
Member

Tests failing is not related to this PR.

@paulmueller
Copy link
Member

Looks good to me. Can I merge?

@maxschloegel
Copy link
Contributor Author

Yes!

@paulmueller paulmueller merged commit f160a49 into DC-analysis:master Oct 25, 2023
9 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants