
Pickling is slow wrt int's #404

Open
skirpichev opened this issue Mar 27, 2023 · 3 comments

@skirpichev (Contributor)

An example (~4-5x difference):

$ python -m timeit -r10 -s 'from math import factorial as f;from pickle import dumps as d' \
     -s 'a=f(1000)' 'd(a)'
50000 loops, best of 10: 5.7 usec per loop
$ python -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z;from pickle import dumps as d' \
    -s 'a=f(z(1000))' 'd(a)'
10000 loops, best of 10: 24.4 usec per loop

I wonder whether the pure Python function used for pickling support accounts for a significant part of this slowdown.
(Inspired by mpmath/mpmath#667)
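For context, gmpy2 wires pickling through copyreg rather than through a method on the type itself (as the discussion below confirms). A stdlib-only sketch of that mechanism, using a toy `BigNum` class (hypothetical, standing in for `mpz`) so it runs without gmpy2:

```python
import copyreg
import pickle

class BigNum:
    """Toy stand-in for mpz: wraps an int, no pickle support of its own."""
    def __init__(self, value):
        self.value = int(value)
    def __eq__(self, other):
        return isinstance(other, BigNum) and self.value == other.value

def _rebuild(data):
    # Reconstruct a BigNum from its serialized byte string.
    return BigNum(int.from_bytes(data, "little", signed=True))

def _reduce(x):
    # Return the (callable, args) pair that pickle will store.
    nbytes = (x.value.bit_length() + 8) // 8 + 1  # enough for sign bit, with slack
    return _rebuild, (x.value.to_bytes(nbytes, "little", signed=True),)

# Register the reducer globally -- the copyreg route gmpy2 takes for mpz.
copyreg.pickle(BigNum, _reduce)

n = BigNum(10**100)
assert pickle.loads(pickle.dumps(n)) == n
```

Every `pickle.dumps` call goes through a `copyreg.dispatch_table` lookup and a Python-level reducer call, which is one plausible source of per-object overhead compared to a C-level path.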

@casevh (Collaborator) commented Mar 28, 2023

I tried three tests.

$ py311 -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z;from pickle import dumps as d' -s 'a=f(z(1000))' 'd(a)'
50000 loops, best of 10: 4.92 usec per loop
$ py311 -m timeit -r10 -s 'from math import factorial as f;from pickle import dumps as d' -s 'a=f(1000)' 'd(a)'
200000 loops, best of 10: 1.3 usec per loop
$ py311 -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z, to_binary as d' -s 'a=f(z(1000))' 'd(a)'
100000 loops, best of 10: 3.23 usec per loop

The third test directly calls gmpy2.to_binary which creates the pickled data. to_binary calls mpz_export for all the data manipulation.

The Python code inside gmpy2 is ancient. I would accept removing it, since we no longer need to support old versions of Python and mpz is now a real type rather than a factory function.

I don't know whether it is worth trying to improve on mpz_export/mpz_import. The code in mpz_pylong.c may help.
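For readers unfamiliar with the format: the serialized form is essentially a tag plus the little-endian magnitude produced by `mpz_export`. A rough pure-Python approximation of that idea (my guess at the general shape, not gmpy2's exact byte layout):

```python
def int_to_binary(n: int) -> bytes:
    """Serialize: one sign byte + little-endian magnitude (approximation)."""
    sign = 0 if n >= 0 else 1
    mag = abs(n)
    nbytes = (mag.bit_length() + 7) // 8  # bytes needed for the magnitude
    return bytes([sign]) + mag.to_bytes(nbytes, "little")

def int_from_binary(data: bytes) -> int:
    """Inverse of int_to_binary."""
    mag = int.from_bytes(data[1:], "little")
    return -mag if data[0] else mag

assert int_from_binary(int_to_binary(-(10**50))) == -(10**50)
```

In gmpy2 itself this magnitude dump is done in C by `mpz_export`, so the cost being discussed is in the surrounding machinery and the export routine, not in Python-level byte shuffling like this.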

@skirpichev (Contributor, author) commented May 28, 2023

BTW, mpz.to_bytes() (and to_binary()) is systematically slower than int.to_bytes(). Here is a simple benchmark:
[benchmark plot: to_bytes-100000.png — int.to_bytes() vs mpz.to_bytes() timing, log scale]

Code:
# a.py -- compare int.to_bytes() vs gmpy2.mpz.to_bytes() timings
from gmpy2 import mpz, to_binary  # to_binary used by the commented-out variant
import sys
import random
import time
import matplotlib.pyplot as plt

int_time = []
mpz_time = []
#bin_time = []

times = 15
nbits = int(float(sys.argv[1]))  # accepts e.g. "1e5"

random.seed(1)

# 500 random bit sizes in [1, nbits)
r = sorted(random.sample(range(1, nbits), 500))

for k in r:
    # `times` random ~k-bit integers, each paired with a byte length
    # large enough to hold it
    ns = [random.randint(2**k//2, 2**k) for _ in range(times)]
    nl = [(n, (n.bit_length() + 7)//8 + 2) for n in ns]

    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    int_time.append((time.perf_counter_ns() - start) / times)

    nl = [(mpz(n), l) for n, l in nl]
    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    mpz_time.append((time.perf_counter_ns() - start) / times)

#   start = time.perf_counter_ns()
#   for n, l in nl:
#       to_binary(n)
#   bin_time.append((time.perf_counter_ns() - start) / times)


fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r, int_time, label='int.to_bytes()')
ax.plot(r, mpz_time, label='mpz.to_bytes()')
#ax.plot(r, bin_time, label='to_binary(mpz)')
ax.set_yscale('log')
ax.set_xlabel('bits')
ax.set_ylabel('time (ns)')
ax.legend()
plt.title('Benchmark for to_bytes with ' + str(nbits) + ' bits.')
plt.show()
fig.savefig('to_bytes-' + str(nbits) + '.png')

UPD:

$ python
Python 3.11.3+ (heads/3.11:9fbb614c4e, Apr 29 2023, 14:18:05) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gmpy2, sys
>>> gmpy2.mp_limbsize()
64
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4, default_max_str_digits=4300, str_digits_check_threshold=640)
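The mismatch above is relevant to conversion cost: GMP packs 64 bits per limb while this CPython build packs 30 bits per 4-byte digit, so the two representations never align on byte boundaries. A quick back-of-the-envelope word count for an n-bit number (illustrative only):

```python
def storage(nbits, bits_per_word):
    """Number of machine words needed for an nbits-bit magnitude."""
    return -(-nbits // bits_per_word)  # ceiling division

nbits = 100_000
limbs = storage(nbits, 64)    # GMP: 64-bit limbs
digits = storage(nbits, 30)   # this CPython build: 30-bit digits
print(limbs, digits)          # 1563 limbs vs 3334 digits
```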

@skirpichev (Contributor, author)

Ah, it seems mpz_export() is less efficient than mpn_get_str() for this task. I used the latter to implement the mpz.to_bytes() method in my python-gmp package (draft PR: diofant/python-gmp#65). It seems to be faster than both CPython's int (v3.13.1) and gmpy2's mpz:

[benchmark plot: to_bytes-100000.png — int.to_bytes() vs gmp.mpz.to_bytes() vs gmpy2.mpz.to_bytes() timing, log scale]

Code:
# a.py -- compare int, gmp.mpz and gmpy2.mpz to_bytes() timings
from gmp import mpz
from gmpy2 import mpz as mpz2
import sys
import random
import time
import matplotlib.pyplot as plt

int_time = []
gmp_mpz_time = []
gmpy2_mpz_time = []

times = 15
nbits = int(float(sys.argv[1]))  # accepts e.g. "1e5"

random.seed(1)

# 500 random bit sizes in [1, nbits)
r = sorted(random.sample(range(1, nbits), 500))

for k in r:
    ns = [random.randint(2**k//2, 2**k) for _ in range(times)]
    nl = [(n, (n.bit_length() + 7)//8 + 2) for n in ns]

    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    int_time.append((time.perf_counter_ns() - start) / times)

    # convert both variants from the original ints, so neither
    # timing includes cross-type conversion
    nl1 = [(mpz(n), l) for n, l in nl]
    nl2 = [(mpz2(n), l) for n, l in nl]

    start = time.perf_counter_ns()
    for n, l in nl1:
        n.to_bytes(l)
    gmp_mpz_time.append((time.perf_counter_ns() - start) / times)

    start = time.perf_counter_ns()
    for n, l in nl2:
        n.to_bytes(l)
    gmpy2_mpz_time.append((time.perf_counter_ns() - start) / times)


fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r, int_time, label='int.to_bytes()')
ax.plot(r, gmp_mpz_time, label='gmp.mpz.to_bytes()')
ax.plot(r, gmpy2_mpz_time, label='gmpy2.mpz.to_bytes()')
ax.set_yscale('log')
ax.set_xlabel('bits')
ax.set_ylabel('time (ns)')
ax.legend()
plt.title('Benchmark for to_bytes with ' + str(nbits) + ' bits.')
plt.show()
fig.savefig('to_bytes-' + str(nbits) + '.png')

If that makes sense to you, I'll eventually make a PR against gmpy2.

skirpichev added a commit to skirpichev/gmpy that referenced this issue Dec 23, 2024
It seems that using mpn_get_str() is more efficient than the generic
mpz_export().

Some benchmarks are here:
aleaxit#404 (comment)

Not sure what else we can do for aleaxit#404.  In python-gmp I've also
added the `__reduce__` dunder method.  This seems slightly better than
relying on copyreg to support pickling:

| Benchmark      | ref     | patch                 | gmp                   |
|----------------|:-------:|:---------------------:|:---------------------:|
| dumps(1<<7)    | 23.9 us | 23.8 us: 1.01x faster | 22.6 us: 1.06x faster |
| dumps(1<<38)   | 24.0 us | 23.9 us: 1.01x faster | 22.7 us: 1.06x faster |
| dumps(1<<300)  | 24.1 us | 23.8 us: 1.01x faster | 22.9 us: 1.05x faster |
| dumps(1<<3000) | 26.8 us | 25.2 us: 1.07x faster | 23.8 us: 1.13x faster |
| Geometric mean | (ref)   | 1.02x faster          | 1.07x faster          |

Can we add pickling to gmpy2 with even less overhead?  I don't know.

But if we avoid the pickle machinery, there is a noticeable performance
boost for small numbers too:

| Benchmark      | to_binary-ref | to_binary-patch       |
|----------------|:-------------:|:---------------------:|
| dumps(1<<7)    | 323 ns        | 300 ns: 1.08x faster  |
| dumps(1<<38)   | 352 ns        | 315 ns: 1.12x faster  |
| dumps(1<<300)  | 603 ns        | 436 ns: 1.39x faster  |
| dumps(1<<3000) | 3.17 us       | 1.57 us: 2.02x faster |
| Geometric mean | (ref)         | 1.35x faster          |

The new code seems faster than int.to_bytes() for numbers roughly above
500 bits on my system.
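The `__reduce__` approach mentioned in the commit message can be sketched with a toy wrapper class (hypothetical, standing in for `mpz`); unlike the copyreg route, pickle finds the reducer directly on the type:

```python
import pickle

class Num:
    """Toy mpz-like wrapper defining __reduce__ directly on the type."""
    def __init__(self, value):
        self.value = int(value)
    def __eq__(self, other):
        return isinstance(other, Num) and self.value == other.value
    def __reduce__(self):
        # pickle calls this method directly; no dispatch_table lookup.
        nbytes = (self.value.bit_length() + 8) // 8 + 1  # room for sign bit
        data = self.value.to_bytes(nbytes, "little", signed=True)
        return (_from_bytes, (data,))

def _from_bytes(data):
    # Reconstruction helper referenced by the pickle stream.
    return Num(int.from_bytes(data, "little", signed=True))

assert pickle.loads(pickle.dumps(Num(1 << 3000))) == Num(1 << 3000)
```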