C implementation of @implementer leaks objects #216
|
Thanks for the report. I’ll look into it tomorrow unless someone sees the problem first. |
This patch seems to make the problem go away. I'm not really familiar enough with either zope.interfaces or the Python C API to say for certain if it is correct (or complete):
Suffice it to say that |
Thanks, that does indeed look like the problem. @richvdh I'm curious. My group has been running applications with uptimes measured in weeks and hasn't noticed this issue; they heavily use zope.interface/zope.component/zope.site and Pyramid. Would you say your application is a typical user of zope.interface, or is it somewhat unusual in the amount of dynamic specifications it creates? |
…bases Previously they did not, leading to a reference leak of a tuple. Fixes #216
I have a Plone 5.2.2 site using the latest zope.interface that shows some unexpected memory behaviour: it grows slowly over several days from an initial 400-500MB to 3GB (my limit, at which the process restarts). There are so many possible causes that it is difficult to tell whether this could be the reason. Also, I need to improve my memory-leak debugging skills... But once this is fixed I can update the service immediately using a branch and report back (it will take some days). |
@jamadden good stuff, seems like an easy fix. Our application is based on top of Twisted, which uses zope.interface internally. I couldn't really say exactly what it is doing that is causing the problem here; a couple of things came to mind:
Again, I'm not sure if those are the actual problem here, because I struggled to reproduce the problem outside our application. But they do look like dynamic behaviour that Zope might not expect, but that all Twisted-based applications are likely to hit. |
@jensens An interesting artifact of our experience was that, because the retained objects were tiny (1- or 2-tuples), the increase in memory usage was barely perceptible. We (or rather, our users) noticed it via increased GC time despite a relatively flat memory consumption. |
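The symptom described here (many small retained tuples, visible mainly as GC time rather than memory) can be looked for with a rough harness like the one below. This is only a sketch using the standard library: `count_tracked_growth`, `leaky` and `clean` are hypothetical names, and the deliberately leaking function merely stands in for whatever call is under suspicion. It does not exercise the zope.interface code path.

```python
import gc


def count_tracked_growth(func, rounds=5, per_round=1000):
    """Call func repeatedly; record how many objects the GC tracks after each round."""
    counts = []
    for _ in range(rounds):
        for _ in range(per_round):
            func()
        gc.collect()  # drop anything that is merely cyclic garbage
        counts.append(len(gc.get_objects()))
    return counts


_retained = []  # hypothetical stand-in for a reference that is never released


def leaky():
    # Each call keeps a small 1-tuple alive, similar in size to the tuples mentioned above.
    _retained.append(([],))


def clean():
    # The tuple is released as soon as the call returns.
    _ = ([],)


leaky_counts = count_tracked_growth(leaky)
clean_counts = count_tracked_growth(clean)
print("leaky:", leaky_counts)
print("clean:", clean_counts)
```

A steadily rising count for one function and a flat count for the other is a hint, not proof; tools such as `tracemalloc` or `objgraph` would be needed to find what actually holds the references.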
Jason Madden wrote at 2020-9-28 03:48 -0700:
...
@richvdh I'm curious. My group has been running applications with uptimes measured in weeks and hasn't noticed this issue; they heavily use zope.interface/zope.component/zope.site and Pyramid. Would you say your application is a typical user of zope.interface, or is it somewhat unusual in the amount of dynamic specifications it creates?
`z3c.form` seems to allow for instance (rather than class/interface) specific
adaptation. I do not know how this is achieved (as adaptation
primarily is based on interface specifications). It might be that
interface specification are dynamically created from instances; in this
case, a high number of specifications would be possible.
|
Thanks everyone, this all makes sense. Given a scenario like this:

    from zope.interface import Interface, alsoProvides

    # Static interface definition, static class definition
    class IFoo(Interface):
        pass

    class Bar(object):
        pass

    # Something done per-request/connection/etc
    bar = Bar()
    alsoProvides(bar, IFoo)

The The contents of the leaking It's possible our application is subject to this but we've tuned GC to run rarely, accepting the occasional large pause. So we may not have noticed a slow growth in something that was already large. It's also possible our monitoring or metrics alerting just weren't capturing this; I haven't double-checked [/cc @sajones @cutz]. I think #217 should be the fix. If reviewed and accepted I can immediately release the patched version to PyPI. |
In Plone we use z3c.form and other mechanisms to dynamically assign (even per request) interfaces. Also our tuples might be heavier than usual. So, this could be the reason for my observed memory leaks. I started a Plone |
Jens W. Klein wrote at 2020-9-28 12:10 -0700:
In Plone we use z3c.form and other mechanisms to dynamically assign (even per request) interfaces. Also our tuples might be heavier than usual. So, this could be the reason for my observed memory leaks. I started a Plone `buildout coredev` test-run with the fixes. If this goes green I would consider the patch at least w/o side effects. My next check will be using the branch in my live site with the memory leaking problems.
In addition, `alsoProvides` is applied to any request
(to support the "browserlayer" functionality).
|
Indeed, and this together with lineage subsites (additional own browserlayers) and plone.subrequest (Mosaic). That is a lot |
Indeed, each intermediate |
One test fails now. Python 3.7 only (3.6 and 3.8 tests are passing): |
I'm not sure what that report is telling us, the stack trace seems to have left out the actual error. But guessing from Note that setting |
Either I'm not sure what it tells me. Maybe a wrong implementation on the Plone side, maybe test specific. This is the pain code, where zope.interface is used in a very dynamic way. But strange, it only appears in Python 3.7? |
Jens W. Klein wrote at 2020-9-30 01:30 -0700:
Either I'm not sure what it tells me. Maybe a wrong implementation on the Plone side, maybe test specific. This is the pain code, where zope.interface is used in a very dynamic way. But strange, it only appears in Python 3.7?
Maybe, the error is related to some kind of non-determinism
(e.g. a race condition). In this case, you may see
the error only occasionally - and the next test runs might show
3.7 successful and another Python version test may show a problem.
|
Update:
Jenkins uses 3.7.6, so I installed this version on my machine.
Maybe the tests are failing only in combination with other tests, as Jenkins on Python 3.7.6 does:
So I have still no clue why, but it is a problem on Python 3.7 only running the tests in combination. |
Forgive my naivety, but this sounds like a problem specific to plone which is being uncovered by this fix in zope.interface. Shouldn't it be tracked in an issue on that repository? |
@richvdh I don't think this is Plone specific. Before the fix all worked w/o any problems. Other projects can be affected as well. |
Full traceback
|
Perhaps adding a |
@jensens fair enough. that does look odd. |
What's really interesting is that it's self that seems to have been collected!

    def unsubscribe(self, dependent):
        try:
            n = self._dependents[dependent]  # Line 389
        except TypeError:
            raise KeyError(dependent)

    def __setBases(self, bases):
        # Remove ourselves as a dependent of our old bases
        for b in self.__bases__:
            b.unsubscribe(self)  # Line 402

So Nominally, self could not have been collected while we're running methods of self and passing self as an argument... |
This is complicated by the fact that many functions in the Interfaces are hashed based on just their name and module. So every one of these local If one of these It's highly dependent on when GC runs, and assumes the existence of a cycle and that some base object persists between test functions so that GC can clear out the living entry. I'm working on a simple reproducer, but I suspect that the problem could be solved by either (a) adding Why is this issue showing up only now? Before the PR, the contents of |
Ok, this demonstrates it pretty easily:

    import gc
    from zope.interface import Interface

    class IRoot(Interface):
        pass

    class IBlank(Interface):
        pass

    IBlank.__bases__ = (IRoot,)
    # IRoot now has IBlank in its _dependents
    other_ref = IBlank  # Keep alive

    class IBlank(Interface):  # Replace the original IBlank; this will hash identically
        pass

    # Register the new IBlank as a dependent of IRoot.
    IBlank.__bases__ = (IRoot,)
    # Really, there should be two
    print(list(IRoot._dependents.keys()))  # -> [IBlank]
    # Delete the original IBlank, which should remove it
    # from _dependents. It takes a gc.collect() to do this,
    # so there's a cycle somewhere.
    del other_ref
    gc.collect()
    print(list(IRoot._dependents.keys()))  # -> []
    IBlank.__bases__ = (IRoot,)  # KeyError

    $ python foo.py
    [<InterfaceClass __main__.IBlank>]
    []
    Traceback (most recent call last):
      File "/tmp/foo.py", line 29, in <module>
        IBlank.__bases__ = (IRoot,)
      File "//src/zope/interface/interface.py", line 402, in __setBases
        b.unsubscribe(self)
      File "//src/zope/interface/interface.py", line 389, in unsubscribe
        n = self._dependents[dependent]
      File "//lib/python3.8/weakref.py", line 383, in __getitem__
        return self.data[ref(key)]
    KeyError: <weakref at 0x10ce7ed10; to 'InterfaceClass' at 0x10ce71dd0 (IBlank)>
|
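The "really, there should be two" comment in the reproducer can be illustrated without zope.interface. The sketch below uses a hypothetical `Named` class whose equality and hash depend only on a name, as a stand-in for an interface; it is an assumption-laden analogy, not the library's implementation. Inserting an equal second object into a `WeakKeyDictionary` updates the value but keeps the weak reference to the first object as the key, so when the first object dies the entry disappears for both.

```python
import gc
import weakref


class Named:
    """Hypothetical stand-in for an interface: equality and hash use only the name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Named) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


dependents = weakref.WeakKeyDictionary()
old = Named("IBlank")
new = Named("IBlank")

dependents[old] = 1
dependents[new] = 1  # same hash and equal: the existing key (a weakref to old) is kept
entries_after_both = len(dependents)

keys = list(dependents.keys())
key_is_old = keys[0] is old
del keys  # do not keep old alive through this temporary list

del old       # on CPython the refcount drops to zero here
gc.collect()  # for implementations without reference counting
new_still_registered = new in dependents

print(entries_after_both, key_is_old, new_still_registered)
```

Under these assumptions the live `new` object is no longer found after `old` is collected, which is the same shape of failure as the `KeyError` in the traceback above.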
Well, since we hit a very specific edge case, I can grep through our code and rename those occurrences. This is probably in tests only. I would prefer to have it fixed, but at least it needs a line in the changelog and some lines in the documentation. |
I recall running into this exact problem before, with older versions of zope.interface, so it's not new; it just took a brief hiatus in zope.interface 5 because of this bug. (It was more frequent on PyPy because of its non-refcount GC. I seem to recall adding fixes for that to some public zopefoundation repositories before...) It's for that reason that I learned to be careful about using local interface classes and especially how they're named 😄 |
Given that we can't change equality/sorting/hashing of these objects without major BWC and persistence problems, the only way to fix this with minimum compatibility issues that I've been able to think of is to make That has its own BWC concerns though. All of (I don't fully understand why |
I renamed the interfaces to have unique names in plone/plone.dexterity#135 and tests are passing. |
5.1.1 is on PyPI (it may be awhile before binary wheels get built as Travis is busy.) |
@jamadden @sajones I'm late to the party but I wanted to shed some light on why we may not have noticed this. Our application has had large organic usage growth this year and when we couple that with the large zodb caches we run, we've been attributing any memory growth towards caches filling up. Combine that with what is likely slower growth in our, relatively speaking, large memory footprint application and I think it just went unnoticed. That said, we had been starting to key in on some hot processes that were showing growth which didn't make sense to attribute to ZODB caches given our database sizes. We just hadn't tracked down any details yet. Looking forward to seeing what sort of impact this has. Thanks All. |
Jason Madden wrote at 2020-9-30 04:13 -0700:
This is complicated by the fact that many functions in the `test_fti` module create an interface called `IBlank`.
Interfaces are hashed based on just their name and module. So every one of these local `IBlank` interfaces will hash the same way, and be treated the same for purposes of `_dependents`.
If one of these `IBlank` was involved in a cycle that delayed its collection (and removal from `_dependents`) then later when GC ran, an *active* `IBlank` could be removed from `_dependents`. When we attempt to change the bases and remove `IBlank`, because the dead `IBlank` has just been collected and removed from `_dependents`, we would get this `KeyError`.
Does this not sound like a conceptual bug in the treatment
of `_dependents`? An application can name its interfaces as it likes
and this should not lead to `KeyError`s.
|
Sort of? But `zope.interface` is hardly unique in this. Using weakrefs with objects that define non-identity hash functions can easily lead to this problem in many places in an application. Here's an example from the standard library:

    >>> from uuid import UUID
    >>> from weakref import WeakKeyDictionary
    >>> UUID('{12345678-1234-5678-1234-567812345678}')
    UUID('12345678-1234-5678-1234-567812345678')
    >>> uuid1 = UUID('{12345678-1234-5678-1234-567812345678}')
    >>> uuid2 = UUID('{12345678-1234-5678-1234-567812345678}')
    >>> uuid1 == uuid2
    True
    >>> hash(uuid1) == hash(uuid2)
    True
    >>> d = WeakKeyDictionary()
    >>> d[uuid1] = 42
    >>> d[uuid2]
    42
    >>> del uuid1
    >>> d[uuid2]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "//python3.8/weakref.py", line 383, in __getitem__
        return self.data[ref(key)]
    KeyError: <weakref at 0x10ecca470; to 'UUID' at 0x10ec6f0f0>

I outlined some possible solutions and their drawbacks above. As rarely as this is actually a problem, I'm not sure it's worth the costs though.

It's worth noting that more than just hashing depends on interface names. Pickling is done by name too. No matter where in a module, or how many times, you define

    >>> import pickle
    >>> from zope.interface import Interface
    >>> class IFoo(Interface):
    ...     pass
    ...
    >>> s = pickle.dumps(IFoo)
    >>> IFoo = 42
    >>> pickle.loads(s)
    42
|
Jason Madden wrote at 2020-9-30 10:44 -0700:
Sort of? But `zope.interface` is hardly unique in this. Using weakrefs with objects that define non-identity hash functions can easily lead to this problem in many places in an application. Here's an example from the standard library:
```pycon
>>> from uuid import UUID
>>> from weakref import WeakKeyDictionary
>>> UUID('{12345678-1234-5678-1234-567812345678}')
UUID('12345678-1234-5678-1234-567812345678')
>>> uuid1 = UUID('{12345678-1234-5678-1234-567812345678}')
>>> uuid2 = UUID('{12345678-1234-5678-1234-567812345678}')
>>> uuid1 == uuid2
True
>>> hash(uuid1) == hash(uuid2)
True
>>> d = WeakKeyDictionary()
>>> d[uuid1] = 42
>>> d[uuid2]
42
>>> del uuid1
>>> d[uuid2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "//python3.8/weakref.py", line 383, in __getitem__
    return self.data[ref(key)]
KeyError: <weakref at 0x10ecca470; to 'UUID' at 0x10ec6f0f0>
```
This is a different case. Here, we have `uuid1 == uuid2`
and therefore, after we deleted "uuid1" (and thereby removed it
implicitly from `d`) we expect a `KeyError` on `d[uuid2]`.
But, two definitions of an interface named *I*, should not be equal
-- strangely enough they are, however. For me, this looks like a bug:
```
>>> from zope.interface import Interface
>>> class I(Interface):
...     def f(): pass
...
>>> I1 = I
>>> class I(Interface):
...     def g(): pass
...
>>> I2 = I
>>> I1 == I2
True
>>> list(I1)
['f']
>>> list(I2)
['g']
```
Thus, we have two interfaces defining different methods
and nevertheless, they are considered equal.
|
That's correct and expected. Only `__name__` and `__module__` are considered in comparisons. One reason for interfaces to be comparable is so that they can be used as BTree keys (identity comparisons don't work in that case). In fact, they must be totally orderable, independent of process or current definition (because definitions change; methods and attributes are added or removed, but the comparison must remain valid). |
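The ordering requirement can be sketched with the standard library alone. In the example below, `NamedSpec` is a hypothetical class (not part of zope.interface) that compares, sorts and hashes on name and module only, and a sorted list maintained with `bisect` stands in for a BTree. The exact sort key is an assumption made for illustration.

```python
import bisect
from functools import total_ordering


@total_ordering
class NamedSpec:
    """Hypothetical interface-like object ordered by (name, module) only."""

    def __init__(self, module, name, methods=()):
        self.module = module
        self.name = name
        self.methods = tuple(methods)  # deliberately ignored by comparisons

    def _key(self):
        return (self.name, self.module)

    def __eq__(self, other):
        if not isinstance(other, NamedSpec):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, NamedSpec):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())


# A sorted list stands in for the keys of a persistent BTree.
keys = []
for spec in (
    NamedSpec("app.interfaces", "IFoo", ["f"]),
    NamedSpec("app.interfaces", "IBar"),
    NamedSpec("lib.interfaces", "IFoo"),
):
    bisect.insort(keys, spec)

# A later "definition" with different methods still locates the stored key.
redefined = NamedSpec("app.interfaces", "IFoo", ["g"])
index = bisect.bisect_left(keys, redefined)
found = index < len(keys) and keys[index] == redefined
print(index, found, keys[index].methods)
```

The lookup succeeds even though the method lists differ, which is the property a stored key needs if definitions are allowed to change between processes; it is also why two same-named definitions cannot be told apart.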
Branch: refs/heads/master
Date: 2020-09-30T16:45:33+02:00
Author: Jens W. Klein (jensens) <jk@kleinundpartner.at>
Commit: plone/plone.dexterity@03da1ef
fix to use with fixed zope.interface
see zopefoundation/zope.interface#216 (comment)
Files changed:
A news/135.bugfix
M plone/dexterity/tests/test_fti.py

Repository: plone.dexterity
Branch: refs/heads/master
Date: 2020-09-30T21:05:28+02:00
Author: Jens W. Klein (jensens) <jk@kleinundpartner.at>
Commit: plone/plone.dexterity@b9385b7
Merge pull request #135 from plone/fix-tests-ziface
fix to use with fixed zope.interface
Files changed:
A news/135.bugfix
M plone/dexterity/tests/test_fti.py
Jason Madden wrote at 2020-9-30 12:01 -0700:
That's correct and expected. Only `__name__` and `__module__` are considered in comparisons.
That is a very strange notion of "equality":
How could interfaces be considered equal which define completely
different sets of methods?
It should (at least) be documented with big warning signs as it makes
dynamic interface definitions highly unreliable.
The test failure discussed in the thread is one instance of this
unreliability.
As I wrote earlier, `z3c.form` provides some mechanism to register
adapters based on individual instances (rather than classes and interfaces).
I suspect dynamic interface definitions to be involved in this functionality
and with the above equality this might lead to big surprises.
|
I think a big part of the reasoning goes to persistence. The only sane way to pickle an interface is as a global object, just like that's the only sane way to pickle a class. That's the only way that an instance of a class that This comparison behaviour has been the same all the way back to at least 3.4.0. That early version didn't correctly define Because this has been the behaviour for more than ten years, I suspect that this is all handled correctly by packages doing dynamic interfaces (IIRC, the "menu" support works by creating dynamic interfaces and making them global objects so they can be unpickled; it understands that names matter). It's only rare corner cases like the one in plone.dexterity (which only showed up because of a quirk of the garbage collector on a particular machine) where it becomes a problem. |
Jason Madden wrote at 2020-9-30 21:23 +0000:
I think a big part of the reasoning goes to persistence....
Conceptually, it is wrong to consider two interfaces equal
if they define different sets of methods -- such interfaces
are obviously not equal (in the sense that they can be used
interchangeably).
Apparently, a specific requirement (maintaining interfaces in
a persistent `BTree`) has caused the adoption of such a strange
equality definition. This might be acceptable provided it is
well documented. Up to now, I have not yet consciously seen this
documentation.
|
I believe that the correct perspective is that conceptually, an interface is a global object. By definition, there is only one global object at a time that can be referred to by Trying to define two global objects with the same module and name and use them at the same time as if they were distinct is conceptually quite wrong in this framework. Just as you can't define two At any rate, I feel like I've veered very far off what this issue was about and so I should stop annoying those getting notifications for this issue with unrelated chatter. |
Jason Madden wrote at 2020-9-30 15:36 -0700:
I believe that the correct perspective is that conceptually, an interface is a global object.
Really? Are you really using interfaces based on their property to
be "global object"s? I do not: I use them based on
what methods (with what signatures)
and attributes they define. For me, the "global object" property is at most
secondary.
...
At any rate, I feel like I've veered very far off what this issue was about and so I should stop annoying those getting notifications for this issue with unrelated chatter.
I will check whether the documentation clearly states
that the interface semantics is primarily based on being a
"global object" (with the ramification for `equality`)
and if not file a new issue.
|
5.1.2 (2020-10-01)
==================

- Make sure to call each invariant only once when validating invariants.
  Previously, invariants could be called multiple times because when an
  invariant is defined in an interface, it's found in all interfaces
  inheriting from that interface. See `pull request 215
  <https://github.com/zopefoundation/zope.interface/pull/215/>`_.

5.1.1 (2020-09-30)
==================

- Fix the method definitions of ``IAdapterRegistry.subscribe``,
  ``subscriptions`` and ``subscribers``. Previously, they all were
  defined to accept a ``name`` keyword argument, but subscribers have
  no names and the implementation of that interface did not accept
  that argument. See `issue 208
  <https://github.com/zopefoundation/zope.interface/issues/208>`_.

- Fix a potential reference leak in the C optimizations. Previously,
  applications that dynamically created unique ``Specification``
  objects (e.g., used ``@implementer`` on dynamic classes) could
  notice a growth of small objects over time leading to increased
  garbage collection times. See `issue 216
  <https://github.com/zopefoundation/zope.interface/issues/216>`_.

  .. caution::

     This leak could prevent interfaces used as the bases of
     other interfaces from being garbage collected. Those interfaces
     will now be collected.

     One way in which this would manifest was that ``weakref.ref``
     objects (and things built upon them, like
     ``Weak[Key|Value]Dictionary``) would continue to have access to
     the original object even if there were no other visible
     references to Python and the original object *should* have been
     collected. This could be especially problematic for the
     ``WeakKeyDictionary`` when combined with dynamic or local
     (created in the scope of a function) interfaces, since interfaces
     are hashed based just on their name and module name. See the
     linked issue for an example of a resulting ``KeyError``.

     Note that such potential errors are not new, they are just once
     again a possibility.
It appears that the C implementation of @implementer has a reference leak. This leads to a gradual increase in GC time in a long-running application.
The following test script:
produces the following output:
In other words, each invocation of @implementer leads to new objects which cannot be garbage-collected.
Disabling the C implementations (by moving aside the _zope_interface_coptimizations.cpython-36m-x86_64-linux-gnu.so file) resolves the problem, as does downgrading to zope.interface 4.7.2.