Pickling is slow wrt int's #404
Comments
I tried three tests.
The third test directly calls
The Python code inside gmpy2 is ancient. I would accept removing it since we no longer need to support old versions of Python and I don't know if it is worth trying to improve on
Ah, it seems
Code:

    # a.py
    from gmp import mpz
    from gmpy2 import mpz as mpz2
    import sys
    import random
    import time
    import platform
    import matplotlib.pyplot as plt

    int_time = []
    gmp_mpz_time = []
    gmpy2_mpz_time = []
    times = 15
    # Maximum bit size, e.g. "1e5"; float() allows exponent notation.
    nbits = int(float(sys.argv[1]))
    random.seed(1)
    r = sorted(random.sample(range(1, nbits), 500))
    for k in r:
        ns = [random.randint(2**k//2, 2**k) for _ in range(times)]
        # Pair each number with a buffer length that is large enough for it.
        nl = [(n, (n.bit_length() + 7)//8 + 2) for n in ns]
        # Built-in int (to_bytes() without byteorder needs Python 3.11+).
        start = time.perf_counter_ns()
        for n, l in nl:
            n.to_bytes(l)
        int_time.append((time.perf_counter_ns() - start) / times)
        # gmp.mpz
        nl = [(mpz(n), l) for n, l in nl]
        start = time.perf_counter_ns()
        for n, l in nl:
            n.to_bytes(l)
        gmp_mpz_time.append((time.perf_counter_ns() - start) / times)
        # gmpy2.mpz
        nl2 = [(mpz2(n), l) for n, l in nl]
        start = time.perf_counter_ns()
        for n, l in nl2:
            n.to_bytes(l)
        gmpy2_mpz_time.append((time.perf_counter_ns() - start) / times)

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(r, int_time, label='int.to_bytes()')
    ax.plot(r, gmp_mpz_time, label='gmp.mpz.to_bytes()')
    ax.plot(r, gmpy2_mpz_time, label='gmpy2.mpz.to_bytes()')
    ax.set_yscale('log')
    ax.set_xlabel('bits')
    ax.set_ylabel('time (ns)')
    ax.legend()
    plt.title('Benchmark for to_bytes with ' + str(nbits) + ' bits.')
    plt.show()
    fig.savefig('to_bytes-' + str(nbits) + '.png')

If that makes sense to you, I'll eventually make a PR against gmpy2.
It seems that using mpn_get_str() is more efficient than the generic mpz_export(). Some benchmarks are here: aleaxit#404 (comment)

Not sure what else we can do for aleaxit#404. In python-gmp I've also added the `__reduce__` dunder method. This seems slightly better than relying on copyreg to support pickling:

| Benchmark      | ref     | patch                 | gmp                   |
|----------------|:-------:|:---------------------:|:---------------------:|
| dumps(1<<7)    | 23.9 us | 23.8 us: 1.01x faster | 22.6 us: 1.06x faster |
| dumps(1<<38)   | 24.0 us | 23.9 us: 1.01x faster | 22.7 us: 1.06x faster |
| dumps(1<<300)  | 24.1 us | 23.8 us: 1.01x faster | 22.9 us: 1.05x faster |
| dumps(1<<3000) | 26.8 us | 25.2 us: 1.07x faster | 23.8 us: 1.13x faster |
| Geometric mean | (ref)   | 1.02x faster          | 1.07x faster          |

Can we add pickling to gmpy2 with even less overhead? I don't know. But if we avoid the pickle machinery, there is a noticeable performance boost for small numbers too:

| Benchmark      | to_binary-ref | to_binary-patch       |
|----------------|:-------------:|:---------------------:|
| dumps(1<<7)    | 323 ns        | 300 ns: 1.08x faster  |
| dumps(1<<38)   | 352 ns        | 315 ns: 1.12x faster  |
| dumps(1<<300)  | 603 ns        | 436 ns: 1.39x faster  |
| dumps(1<<3000) | 3.17 us       | 1.57 us: 2.02x faster |
| Geometric mean | (ref)         | 1.35x faster          |

The new code seems faster than int.to_bytes() starting from roughly 500-bit numbers on my system.
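As an illustration of the two pickling routes compared above (a reducer registered with copyreg versus a `__reduce__` method on the type), here is a minimal sketch using only the standard library. `ViaCopyreg`, `ViaReduce`, `_to_binary` and `_from_binary` are hypothetical stand-ins built on a plain int wrapper. They are not the gmpy2 or python-gmp implementation, where the byte conversion happens in C.

```python
import copyreg
import pickle


class _Wrapped:
    """Hypothetical integer wrapper standing in for an mpz-like type."""

    def __init__(self, value):
        self.value = int(value)

    def __eq__(self, other):
        return type(other) is type(self) and other.value == self.value


def _to_binary(value):
    # Assumed stand-in for a native to_binary(): signed big-endian bytes.
    return value.to_bytes(value.bit_length() // 8 + 1, "big", signed=True)


def _from_binary(cls, data):
    # Rebuild an instance of cls from the bytes produced by _to_binary().
    return cls(int.from_bytes(data, "big", signed=True))


class ViaCopyreg(_Wrapped):
    """Pickled through a function registered in copyreg.dispatch_table."""


def _reduce_via_copyreg(obj):
    # pickle finds this reducer by looking up type(obj) in the dispatch table.
    return _from_binary, (ViaCopyreg, _to_binary(obj.value))


copyreg.pickle(ViaCopyreg, _reduce_via_copyreg)


class ViaReduce(_Wrapped):
    """Pickled through a method defined on the type itself."""

    def __reduce__(self):
        return _from_binary, (type(self), _to_binary(self.value))


if __name__ == "__main__":
    for cls in (ViaCopyreg, ViaReduce):
        original = cls(1 << 300)
        restored = pickle.loads(pickle.dumps(original))
        print(cls.__name__, restored == original)
```

Both classes produce the same kind of pickle payload; only the lookup path inside pickle differs, which is consistent with the small differences in the first table.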
An example (~4-5x difference):
I wonder if using the pure-Python function for pickling support could be a significant part of this speed loss.
(Inspired by mpmath/mpmath#667)
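One way to see how much of the cost comes from the pickle machinery rather than from the byte conversion is to time both paths separately. The following is a rough sketch that uses only the built-in int and the standard library, so it says nothing about gmpy2's actual numbers. The helper names `time_per_call_ns` and `compare` are made up for this example.

```python
import pickle
import timeit


def time_per_call_ns(func, number=20000):
    """Return the best-of-5 average time of func() in nanoseconds."""
    best = min(timeit.repeat(func, number=number, repeat=5))
    return best / number * 1e9


def compare(n, number=20000):
    """Time pickle.dumps(n) against a bare n.to_bytes() for an int n."""
    length = (n.bit_length() + 7) // 8 + 1
    return {
        "pickle.dumps": time_per_call_ns(lambda: pickle.dumps(n), number),
        "to_bytes": time_per_call_ns(lambda: n.to_bytes(length, "big"), number),
    }


if __name__ == "__main__":
    # Same operand sizes as in the dumps(1<<k) benchmarks.
    for shift in (7, 38, 300, 3000):
        result = compare(1 << shift)
        print(f"1<<{shift}: " + ", ".join(
            f"{name} {ns:.0f} ns" for name, ns in result.items()))
```

Replacing the int with a gmpy2 mpz (and `to_bytes` with the library's own binary export) would give the comparison the issue asks about, assuming gmpy2 is installed.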