
Pickling is slow wrt int's #404

Open
skirpichev opened this issue Mar 27, 2023 · 3 comments

@skirpichev (Contributor)

An example (~4-5x difference):

$ python -m timeit -r10 -s 'from math import factorial as f;from pickle import dumps as d' \
     -s 'a=f(1000)' 'd(a)'
50000 loops, best of 10: 5.7 usec per loop
$ python -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z;from pickle import dumps as d' \
    -s 'a=f(z(1000))' 'd(a)'
10000 loops, best of 10: 24.4 usec per loop

I wonder whether the pure Python function used for pickling support accounts for a significant part of this slowdown.
(Inspired by mpmath/mpmath#667)
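For context, gmpy2 wires pickling through copyreg rather than through a method on the type itself (as the discussion below confirms). A stdlib-only sketch of that mechanism, using a toy `BigNum` class (hypothetical, standing in for `mpz`) so it runs without gmpy2:

```python
import copyreg
import pickle

class BigNum:
    """Toy stand-in for mpz: wraps an int, no pickle support of its own."""
    def __init__(self, value):
        self.value = int(value)
    def __eq__(self, other):
        return isinstance(other, BigNum) and self.value == other.value

def _rebuild(data):
    # Reconstruct a BigNum from its serialized byte string.
    return BigNum(int.from_bytes(data, "little", signed=True))

def _reduce(x):
    # Return the (callable, args) pair that pickle will store.
    nbytes = (x.value.bit_length() + 8) // 8 + 1  # enough for sign bit, with slack
    return _rebuild, (x.value.to_bytes(nbytes, "little", signed=True),)

# Register the reducer globally -- the copyreg route gmpy2 takes for mpz.
copyreg.pickle(BigNum, _reduce)

n = BigNum(10**100)
assert pickle.loads(pickle.dumps(n)) == n
```

Every `pickle.dumps` call goes through a `copyreg.dispatch_table` lookup and a Python-level reducer call, which is one plausible source of per-object overhead compared to a C-level path.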

@casevh (Collaborator) commented Mar 28, 2023

I tried three tests.

$ py311 -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z;from pickle import dumps as d' -s 'a=f(z(1000))' 'd(a)'
50000 loops, best of 10: 4.92 usec per loop
$ py311 -m timeit -r10 -s 'from math import factorial as f;from pickle import dumps as d' -s 'a=f(1000)' 'd(a)'
200000 loops, best of 10: 1.3 usec per loop
$ py311 -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z, to_binary as d' -s 'a=f(z(1000))' 'd(a)'
100000 loops, best of 10: 3.23 usec per loop

The third test directly calls gmpy2.to_binary which creates the pickled data. to_binary calls mpz_export for all the data manipulation.

The Python code inside gmpy2 is ancient. I would accept removing it, since we no longer need to support old versions of Python and mpz is now a real type rather than a factory function.

I don't know whether it is worth trying to improve on mpz_export/mpz_import. The code in mpz_pylong.c may help.
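For readers unfamiliar with the format: the serialized form is essentially a tag plus the little-endian magnitude produced by `mpz_export`. A rough pure-Python approximation of that idea (my guess at the general shape, not gmpy2's exact byte layout):

```python
def int_to_binary(n: int) -> bytes:
    """Serialize: one sign byte + little-endian magnitude (approximation)."""
    sign = 0 if n >= 0 else 1
    mag = abs(n)
    nbytes = (mag.bit_length() + 7) // 8  # bytes needed for the magnitude
    return bytes([sign]) + mag.to_bytes(nbytes, "little")

def int_from_binary(data: bytes) -> int:
    """Inverse of int_to_binary."""
    mag = int.from_bytes(data[1:], "little")
    return -mag if data[0] else mag

assert int_from_binary(int_to_binary(-(10**50))) == -(10**50)
```

In gmpy2 itself this magnitude dump is done in C by `mpz_export`, so the cost being discussed is in the surrounding machinery and the export routine, not in Python-level byte shuffling like this.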

@skirpichev (Contributor, author) commented May 28, 2023

BTW, mpz.to_bytes() (and to_binary()) is systematically slower than int.to_bytes(). Here is a simple benchmark:
[benchmark plot: to_bytes-100000.png — int.to_bytes() vs mpz.to_bytes() timing, log scale]

Code:
# a.py -- compare int.to_bytes() vs gmpy2.mpz.to_bytes() timings
from gmpy2 import mpz, to_binary  # to_binary used by the commented-out variant
import sys
import random
import time
import matplotlib.pyplot as plt

int_time = []
mpz_time = []
#bin_time = []

times = 15
nbits = int(float(sys.argv[1]))  # accepts e.g. "1e5"

random.seed(1)

# 500 random bit sizes in [1, nbits)
r = sorted(random.sample(range(1, nbits), 500))

for k in r:
    # `times` random ~k-bit integers, each paired with a byte length
    # large enough to hold it
    ns = [random.randint(2**k//2, 2**k) for _ in range(times)]
    nl = [(n, (n.bit_length() + 7)//8 + 2) for n in ns]

    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    int_time.append((time.perf_counter_ns() - start) / times)

    nl = [(mpz(n), l) for n, l in nl]
    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    mpz_time.append((time.perf_counter_ns() - start) / times)

#   start = time.perf_counter_ns()
#   for n, l in nl:
#       to_binary(n)
#   bin_time.append((time.perf_counter_ns() - start) / times)


fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r, int_time, label='int.to_bytes()')
ax.plot(r, mpz_time, label='mpz.to_bytes()')
#ax.plot(r, bin_time, label='to_binary(mpz)')
ax.set_yscale('log')
ax.set_xlabel('bits')
ax.set_ylabel('time (ns)')
ax.legend()
plt.title('Benchmark for to_bytes with ' + str(nbits) + ' bits.')
plt.show()
fig.savefig('to_bytes-' + str(nbits) + '.png')

UPD:

$ python
Python 3.11.3+ (heads/3.11:9fbb614c4e, Apr 29 2023, 14:18:05) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gmpy2, sys
>>> gmpy2.mp_limbsize()
64
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4, default_max_str_digits=4300, str_digits_check_threshold=640)
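The mismatch above is relevant to conversion cost: GMP packs 64 bits per limb while this CPython build packs 30 bits per 4-byte digit, so the two representations never align on byte boundaries. A quick back-of-the-envelope word count for an n-bit number (illustrative only):

```python
def storage(nbits, bits_per_word):
    """Number of machine words needed for an nbits-bit magnitude."""
    return -(-nbits // bits_per_word)  # ceiling division

nbits = 100_000
limbs = storage(nbits, 64)    # GMP: 64-bit limbs
digits = storage(nbits, 30)   # this CPython build: 30-bit digits
print(limbs, digits)          # 1563 limbs vs 3334 digits
```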

@skirpichev (Contributor, author)

Ah, it seems mpz_export() is less efficient than mpn_get_str() for this task. I used the latter to implement the mpz.to_bytes() method in my python-gmp package (draft PR: diofant/python-gmp#65). It seems to be faster than both CPython's int (v3.13.1) and gmpy2's mpz:

[benchmark plot: to_bytes-100000.png — int.to_bytes() vs gmp.mpz.to_bytes() vs gmpy2.mpz.to_bytes() timing, log scale]

Code:
# a.py -- compare int, gmp.mpz and gmpy2.mpz to_bytes() timings
from gmp import mpz
from gmpy2 import mpz as mpz2
import sys
import random
import time
import matplotlib.pyplot as plt

int_time = []
gmp_mpz_time = []
gmpy2_mpz_time = []

times = 15
nbits = int(float(sys.argv[1]))  # accepts e.g. "1e5"

random.seed(1)

# 500 random bit sizes in [1, nbits)
r = sorted(random.sample(range(1, nbits), 500))

for k in r:
    ns = [random.randint(2**k//2, 2**k) for _ in range(times)]
    nl = [(n, (n.bit_length() + 7)//8 + 2) for n in ns]

    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    int_time.append((time.perf_counter_ns() - start) / times)

    # convert both variants from the original ints, so neither
    # timing includes cross-type conversion
    nl1 = [(mpz(n), l) for n, l in nl]
    nl2 = [(mpz2(n), l) for n, l in nl]

    start = time.perf_counter_ns()
    for n, l in nl1:
        n.to_bytes(l)
    gmp_mpz_time.append((time.perf_counter_ns() - start) / times)

    start = time.perf_counter_ns()
    for n, l in nl2:
        n.to_bytes(l)
    gmpy2_mpz_time.append((time.perf_counter_ns() - start) / times)


fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r, int_time, label='int.to_bytes()')
ax.plot(r, gmp_mpz_time, label='gmp.mpz.to_bytes()')
ax.plot(r, gmpy2_mpz_time, label='gmpy2.mpz.to_bytes()')
ax.set_yscale('log')
ax.set_xlabel('bits')
ax.set_ylabel('time (ns)')
ax.legend()
plt.title('Benchmark for to_bytes with ' + str(nbits) + ' bits.')
plt.show()
fig.savefig('to_bytes-' + str(nbits) + '.png')

If that makes sense to you, I'll eventually make a PR against gmpy2.

skirpichev added a commit to skirpichev/gmpy that referenced this issue Dec 23, 2024
It seems that using mpn_get_str() is more efficient than the generic
mpz_export().

Some benchmarks are here:
aleaxit#404 (comment)

Not sure what else we can do for aleaxit#404.  In python-gmp I've also
added the `__reduce__` dunder method.  This seems slightly better than
relying on copyreg to support pickling:

| Benchmark      | ref     | patch                 | gmp                   |
|----------------|:-------:|:---------------------:|:---------------------:|
| dumps(1<<7)    | 23.9 us | 23.8 us: 1.01x faster | 22.6 us: 1.06x faster |
| dumps(1<<38)   | 24.0 us | 23.9 us: 1.01x faster | 22.7 us: 1.06x faster |
| dumps(1<<300)  | 24.1 us | 23.8 us: 1.01x faster | 22.9 us: 1.05x faster |
| dumps(1<<3000) | 26.8 us | 25.2 us: 1.07x faster | 23.8 us: 1.13x faster |
| Geometric mean | (ref)   | 1.02x faster          | 1.07x faster          |

Can we add pickling to gmpy2 with even less overhead?  I don't know.

But if we avoid the pickle machinery, there is a noticeable performance
boost for small numbers too:

| Benchmark      | to_binary-ref | to_binary-patch       |
|----------------|:-------------:|:---------------------:|
| dumps(1<<7)    | 323 ns        | 300 ns: 1.08x faster  |
| dumps(1<<38)   | 352 ns        | 315 ns: 1.12x faster  |
| dumps(1<<300)  | 603 ns        | 436 ns: 1.39x faster  |
| dumps(1<<3000) | 3.17 us       | 1.57 us: 2.02x faster |
| Geometric mean | (ref)         | 1.35x faster          |

The new code seems faster than int.to_bytes() for numbers roughly above
500 bits on my system.
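The `__reduce__` approach mentioned in the commit message can be sketched with a toy wrapper class (hypothetical, standing in for `mpz`); unlike the copyreg route, pickle finds the reducer directly on the type:

```python
import pickle

class Num:
    """Toy mpz-like wrapper defining __reduce__ directly on the type."""
    def __init__(self, value):
        self.value = int(value)
    def __eq__(self, other):
        return isinstance(other, Num) and self.value == other.value
    def __reduce__(self):
        # pickle calls this method directly; no dispatch_table lookup.
        nbytes = (self.value.bit_length() + 8) // 8 + 1  # room for sign bit
        data = self.value.to_bytes(nbytes, "little", signed=True)
        return (_from_bytes, (data,))

def _from_bytes(data):
    # Reconstruction helper referenced by the pickle stream.
    return Num(int.from_bytes(data, "little", signed=True))

assert pickle.loads(pickle.dumps(Num(1 << 3000))) == Num(1 << 3000)
```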