
Object allocation overallocates by 8 bytes for instances of tuple subclasses #100659

Closed
mdickinson opened this issue Jan 1, 2023 · 4 comments

Labels: performance (Performance or resource usage)

mdickinson commented Jan 1, 2023

Given a tuple subclass defined by class mytuple(tuple): pass, memory allocation for mytuple objects currently goes through this code:

cpython/Objects/typeobject.c, lines 1292 to 1293 at e83f88a:

    const size_t size = _PyObject_VAR_SIZE(type, nitems+1);
    /* note that we need to add one, for the sentinel */

That +1 in nitems+1 has a couple of consequences for mytuple instances; the first is a possible performance opportunity, while the second is a minor bug.

  • on average (taking into account adjustments for alignment), we allocate 8 more bytes than necessary for each instance
  • sys.getsizeof consistently under-reports for these instances: the value reported is 8 bytes smaller than the value that was passed to PyObject_Malloc during object allocation

(Note: byte counts above assume a 64-bit platform.)

I did some hacky experimentation and was able to remove the +1 specifically for tuple subclasses with no apparent ill effects. But the +1 definitely is needed for some varobjects (including PyHeapTypeObject, I think), and there may also be third-party extension code that either deliberately or inadvertently relies on it.

I haven't investigated whether there are other varobjects besides tuple subclasses for which the +1 could be removed.
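As a sanity check on the size arithmetic, the under-report can be seen from pure Python (a minimal sketch; the concrete byte counts assume a 64-bit build):

```python
import sys

class mytuple(tuple):
    pass

t = mytuple((1, 2, 3))

# object.__sizeof__ reports basicsize + itemsize * nitems, with no
# allowance for the sentinel slot the allocator actually reserves.
expected = mytuple.__basicsize__ + mytuple.__itemsize__ * len(t)
assert t.__sizeof__() == expected  # 24 + 8*3 == 48 on a 64-bit build

# sys.getsizeof adds the GC header on top of __sizeof__; neither figure
# includes the extra sentinel item that was passed to the allocator.
print(sys.getsizeof(t))  # typically 64 on a 64-bit build
```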


mdickinson commented Jan 1, 2023

Here's an ad-hoc reproducer (not exactly a proof of the overallocation, but at least strong evidence), on main. Let's start with a 3-element namedtuple called Point:

Python 3.12.0a3+ (heads/main:d52d4942cf, Jan  1 2023, 14:45:48) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import namedtuple, Counter
>>> Point = namedtuple("Point", "x y z")

sys.getsizeof reports 64 bytes per Point instance, which sounds about right to me: 8 fields of 8 bytes each (2 for gc-tracking, 1 for the type pointer, 1 for the refcount, 1 for the tuple size, 3 for the three item pointers, no __dict__ (managed or otherwise), no __weakref__):

>>> import sys
>>> sys.getsizeof(Point(1, 2, 3))
64
>>> Point.__basicsize__, Point.__itemsize__
(24, 8)

But if we create a large number of Point instances (with shared-integer items, so that we're not also allocating memory for the items), their ids never differ by less than 80:

>>> points = [Point(1, 2, 3) for _ in range(10**6)]
>>> ids = sorted(map(id, points))
>>> Counter(id1 - id0 for id1, id0 in zip(ids[1:], ids))
Counter({80: 995084, 144: 4823, 16528: 77, 480: 2, 160: 1, 184848: 1, 880: 1, 960: 1, 1360: 1, 800: 1, 2720: 1, 720: 1, 904640: 1, 207072: 1, 6640: 1, 180368: 1, 82064: 1})
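That 80-byte spacing is consistent with the sentinel overallocation. Here's a back-of-the-envelope model of the request size (my own sketch, not CPython code; it assumes a 64-bit build, a 16-byte PyGC_Head, and pymalloc rounding requests up to multiples of 16):

```python
def modeled_alloc_size(basicsize, itemsize, nitems, sentinel):
    # _PyObject_VAR_SIZE-style request for a gc-tracked varobject,
    # rounded up to pymalloc's 16-byte granularity.
    request = basicsize + (nitems + sentinel) * itemsize + 16
    return -(-request // 16) * 16

print(modeled_alloc_size(24, 8, 3, sentinel=1))  # 80: the observed id spacing
print(modeled_alloc_size(24, 8, 3, sentinel=0))  # 64: what sys.getsizeof reports
```

Under these assumptions, the 56-byte object request becomes 72 bytes with the GC header and lands in an 80-byte size class; without the sentinel, the request fits in 64.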


ionite34 commented Jan 2, 2023

Similar results here on main and 3.11.1 (macOS, M1):

namedtuple
sys.getsizeof: 64
[(80, 995094), (144, 4900), (160, 1), (3200, 1), (82064, 1), (136896, 1), (7585936, 1)]

Using eval("(1, 2, 3)") to create a bunch of new tuples shows the expected 64-byte allocation:

tuple
sys.getsizeof: 64
[(64, 996067), (128, 3918), (192, 2), (576, 2), (768, 1), (832, 1), (1024, 1), (1536, 1), (6144, 1), (7808, 1), (16832, 1), (49280, 1), (82048, 1), (7585920, 1)]
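(The eval is doing real work here: a literal (1, 2, 3) in a comprehension is constant-folded into a single cached tuple, so every iteration would yield the same object; compiling fresh source per call forces a new allocation each time. A quick check:)

```python
# The constant-folded literal is shared across iterations:
shared = [(1, 2, 3) for _ in range(5)]
print(len(set(map(id, shared))))  # 1

# eval compiles new source on each call, yielding a fresh tuple every time:
fresh = [eval("(1, 2, 3)") for _ in range(5)]
print(len(set(map(id, fresh))))  # 5
```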

Subclasses of int and dict do not appear to be affected:

class UserInt(int):
    __slots__ = ()
   
UserInt(10**50)
user-int
sys.getsizeof: 64
[(64, 996055), (128, 3924), (192, 3), (256, 1), (320, 1), (384, 1), (448, 2), (896, 1), (1088, 1), (1344, 1), (1664, 1), (2240, 1), (2880, 1), (7360, 1), (16512, 1), (51264, 1), (66304, 1), (1127040, 1), (7700608, 1)]
class UserDict(dict):
    __slots__ = ()
    
UserDict(x=10)
user-dict
basicsize: 48
basicsize (+ gc): 64
[(64, 996055), (128, 8), (192, 1), (256, 1), (320, 1), (448, 2), (640, 1), (896, 1), (1664, 1), (1728, 1), (2240, 1), (3328, 1), (7360, 1), (32896, 3888), (49280, 32), (51264, 1), (66304, 1), (1127040, 1), (7733376, 1)]
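The measurements above can be reproduced with a small helper (my own sketch; it relies on CPython's id() being the object's memory address, so gaps between neighbouring sorted ids give a lower bound on the per-object allocation size):

```python
from collections import Counter

def id_gaps(factory, n=10**6):
    # Keep every object alive so addresses stay distinct, then count the
    # differences between neighbouring sorted addresses.
    objs = [factory() for _ in range(n)]
    ids = sorted(map(id, objs))
    return Counter(b - a for a, b in zip(ids, ids[1:]))

# e.g. id_gaps(lambda: eval("(1, 2, 3)")).most_common(3)
```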

corona10 (Member) commented:

#81381

This issue looks relevant; we might need to investigate in which cases the allocation could be reduced.

mdickinson (Author) commented:

@corona10 Thanks; that looks like the exact same issue. I'll close this as a duplicate.

mdickinson closed this as not planned (duplicate) on Jan 10, 2023.