Skip to content

Latest commit

 

History

History
214 lines (131 loc) · 7.26 KB

File metadata and controls

214 lines (131 loc) · 7.26 KB

intimage title

Thanks @MambaWong for pointing out the errors #22 of this article

contents

related file

  • cpython/Objects/longobject.c
  • cpython/Include/longobject.h
  • cpython/Include/longintrepr.h

memory layout

memory layout

after python3, there's only type named int, the long type in python2.x is int type in python3.x

the structure of long object looks like the structure of tuple object, obviously, there's only one field to store the real int value, that's ob_digit

But how does CPython represent the variable size int in byte level? Let's see

how is element stored inside

ingeter 0

notice, when the value is 0, the ob_digit field doesn't store anything, the value 0 in ob_size indicate that long object represent integer 0

i = 0

0

ingeter 1

there are two different types of ob_digit depends on your system.

#if PYLONG_BITS_IN_DIGIT == 30
typedef uint32_t digit;
typedef int32_t sdigit;
typedef uint64_t twodigits;
typedef int64_t stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT    30
#define _PyLong_DECIMAL_SHIFT   9 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE    ((digit)1000000000) /* 10 ** DECIMAL_SHIFT */
#elif PYLONG_BITS_IN_DIGIT == 15
typedef unsigned short digit;
typedef short sdigit; /* signed variant of digit */
typedef unsigned long twodigits;
typedef long stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT    15
#define _PyLong_DECIMAL_SHIFT   4 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE    ((digit)10000) /* 10 ** DECIMAL_SHIFT */

I've modified the source code to change the value of PYLONG_BITS_IN_DIGIT to 15

but when it's going to represent integer 1, ob_size becomes 1 and field in ob_digit represent the value 1 with type unsigned short

i = 1

1

ingeter -1

when i becomes -1, the only difference from the integer 1 is the value in ob_size field, CPython interpret ob_size as a signed type to differ the positive and negative sign

i = -1

-1

ingeter 1023

the basic unit is type digit, which provide 2 bytes(16bits) for storage. And 1023 takes the rightmost 10 bits, so the value ob_size field is still 1.

1023

ingeter 32767

the integer 32767 represent in the same way as usual

32767

ingeter 32768

CPython don't use all the 16 bits in digit field, the first bit of every digit is reserved, we will see why later

32768

little endian and big endian

notice, because the digit is the smallest unit in the CPython abstract level, The order between bytes inside a single ob_digit is the same as your machine order(mine is little endian)

Order between digit in the ob_digit array are represent as most-significant-digit-in-the-right-most order(little endian order)

we can have a better understanding with the integer value -262143

the minus sign is stored in the ob_size field

the interger 262143(2^18 = 262144) in binary representation is 00000011 11111111 11111111

262143

reserved bit

why the left-most bit in digit is reserved? Why order between digit in the ob_digit array are represented as little-endian?

let's try to add two integer value

i = 1073741824 - 1 # 1 << 30 == 1073741824
j = 1

before_add

k = i + j

first, initialize a temporary PyLongObject with size = max(size(i), size(j)) + 1

temp_add

step1, sum the firt digit in each ob_digit array to a variable named carray

step_1

step2, set the value in temp[0] to (carry & PyLong_MASK)

step_2

step3, right shift the carray up to the leftmost bit

step_3_rshift

step4, add the second digit in each ob_digit array to the result of carray

step_4

step5, set the value in temp[1] to (carry & PyLong_MASK)

step_4

step6, right shift the carray again

step_3_rshift

go to step4 and repeat until no more digit left, set the final carray to the last index of temp

step_final

the variable temp contains the sum, now, you see the reserved bit is used for the carry or borrow when you add/sub an integer, the digit in ob_digit array are stored in little-endian order so that the add/sub operation can process each digit from left to right

the sub operation is similar to the add operation, so you can read the source code directly

k

small ints

CPython also use a buffer pool to store the frequently used integer

#define NSMALLPOSINTS           257
#define NSMALLNEGINTS           5
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

let's see

c = 0
d = 0
e = 0
print(id(c), id(d), id(e)) # 4480940400 4480940400 4480940400
a = -5
b = -5
print(id(a), id(b)) # 4480940240 4480940240
f = 1313131313131313
g = 1313131313131313
print(id(f), id(g)) # 4484604176 4484604016

small_ints