Clean up and include discrete logarithm functions into fmpz_mod #85
Conversation
This is significantly slower than sage right now...

```python
>>> from flint import fmpz_mod_ctx
>>> import timeit
>>>
>>> F = fmpz_mod_ctx(15126189511369606177)
>>> g = F(5)
>>> x = 8693773790212585448
>>> a = g**x
>>> x == g.discrete_log(a)
True
>>>
>>> timeit.timeit("g.discrete_log(a)", globals={"a":a,"g":g}, number=1000)
8.544251491999603
```

```python
>>> from sage.all import GF, discrete_log
>>> import timeit
>>>
>>> F = GF(15126189511369606177)
>>> g = F(5)
>>> x = 8693773790212585448
>>> a = g**x
>>> x == discrete_log(a, g)
True
>>>
>>> timeit.timeit("discrete_log(a, g)", globals={"a":a,"g":g,"discrete_log":discrete_log}, number=1000)
3.307495746001223
```

I suppose a 2x speed-up is expected, as Flint makes you use a primitive root as the base, so we have to do twice the number of dlog calls that you would with pari?
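(To make the 2x concrete: solving to an arbitrary base via a fixed primitive root costs two primitive-root dlogs. Here is a rough pure-Python sketch of that reduction; sympy is used purely as a stand-in for the backend's primitive-root dlog, and this is not python-flint's actual code.)

```python
from math import gcd
from sympy import discrete_log, primitive_root

def dlog_any_base(a, g, p):
    """Solve g^x = a (mod p) using only logs to a primitive root r."""
    r = primitive_root(p)
    z = discrete_log(p, a, r)   # first internal dlog call:  a = r^z
    y = discrete_log(p, g, r)   # second internal dlog call: g = r^y
    # a = g^x  =>  z = x*y (mod p-1); solve the linear congruence
    d = gcd(y, p - 1)
    if z % d:
        raise ValueError("a is not in the subgroup generated by g")
    e = (p - 1) // d            # e = ord(g)
    return ((z // d) * pow(y // d, -1, e)) % e

x = dlog_any_base(pow(5, 123, 1009), 5, 1009)
assert pow(5, x, 1009) == pow(5, 123, 1009)
```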
I also think maybe I messed up with general stuff, as Flint seems quite a lot slower than sagemath:

```python
Python 3.11.4 (main, Jul 25 2023, 17:07:07) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from flint import fmpz_mod_ctx as GF
>>> import timeit
>>>
>>> p = 2**255 - 19
>>> F = GF(p)
>>> a = 56453160318328368479028265579673628992621083854069162893669294104711133348344
>>> b = 22540897019420235855175446449184775485001034014122189096905866994063997055779
>>> a = F(a)
>>> b = F(b)
>>>
>>> timeit.timeit("a*a", globals={"a":a, "b":b}, number=1_000_000)
0.3558710659999633
>>> timeit.timeit("a*b", globals={"a":a, "b":b}, number=1_000_000)
0.3549386609993235
```

```python
sage: from sage.all import *
....: import timeit
....:
....: p = 2**255 - 19
....: F = GF(p)
....: a = 56453160318328368479028265579673628992621083854069162893669294104711133348344
....: b = 22540897019420235855175446449184775485001034014122189096905866994063997055779
....: a = F(a)
....: b = F(b)
....:
....: timeit.timeit("a*a", globals={"a":a, "b":b}, number=1_000_000)
....: timeit.timeit("a*b", globals={"a":a, "b":b}, number=1_000_000)
....:
0.2524493979981344
0.2857475809978496
```

I think the first thing to change is my …
Rewriting …:

```python
Python 3.11.4 (main, Jul 25 2023, 17:07:07) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from flint import fmpz_mod_ctx as GF
>>> import timeit
>>>
>>> p = 2**255 - 19
>>> F = GF(p)
>>> a = 56453160318328368479028265579673628992621083854069162893669294104711133348344
>>> b = 22540897019420235855175446449184775485001034014122189096905866994063997055779
>>> a = F(a)
>>> b = F(b)
>>>
>>> timeit.timeit("a*a", globals={"a":a, "b":b}, number=1_000_000)
0.22983049499816843
>>> timeit.timeit("a*b", globals={"a":a, "b":b}, number=1_000_000)
0.2311925540016091
```
```python
if typecheck(obj, fmpz_mod):
    if self != (<fmpz_mod>obj).ctx:
        raise ValueError("moduli must match")
    return obj
```
I'm not sure exactly how the `!=` is implemented at the C level here, but if there were something to ensure that all contexts for a given modulus are unique in memory, then this could be like a pointer comparison.
Ahh, maybe this isn't doing the Pythonic `__eq__` here; I get easily confused about when objects are in the C or Python world.
But I also think your previous suggestion to have a weak dictionary would be good...
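(For concreteness, the weak dictionary idea could look something like the sketch below; the factory function and the stand-in class are hypothetical, not python-flint's actual code.)

```python
import weakref

# Values are held weakly: once user code drops all references to a
# context, it disappears from the cache instead of leaking forever.
_ctx_cache = weakref.WeakValueDictionary()

class _Ctx:
    """Stand-in for fmpz_mod_ctx."""
    def __init__(self, n):
        self.n = n

def get_ctx(n):
    ctx = _ctx_cache.get(n)
    if ctx is None:
        ctx = _Ctx(n)
        _ctx_cache[n] = ctx
    return ctx

assert get_ctx(17) is get_ctx(17)
```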
src/flint/types/fmpz_mod.pyx:
```python
# TODO: This could instead be initialised as a class from a
# given base and the precomputations could be stored to allow
# faster computations for many discrete logs with the same base.
```
We already have a `.ctx` context object. Maybe this should be attached to the context object?
It could be initialised at first use but stored on the context object after.
I don't know how much memory this uses compared to the size of the context object itself, but if the base size of the struct is not large it could be stored inline as part of the ctx object; otherwise the ctx object could just have a pointer.
We need the first discrete log to be on some base g, which is an fmpz_mod object, but certain parts of L only depend on the modulus, so yes, this could be on the context for sure.
Oh, I see what you mean: there are two internal log calculations, but if we stored the log of `g` we could reduce that to one. So what you could add would be an object like:

```python
log_g = fmpz_mod_log(g, ctx)
log_g(12)
```

I don't know how these things are used in practice, and I don't have much sense of what the overhead would be of just always calling the internal log calculation twice (do these functions take a long time?).
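(A rough pure-Python sketch of what such a reusable object might look like; sympy again stands in for Flint's internal primitive-root log, the class name is made up, and it assumes a prime modulus with `g` a primitive root so that log of `g` is invertible.)

```python
from sympy import discrete_log, primitive_root

class dlog_base:
    """Hypothetical reusable dlog object: pay for log(g) once up
    front, then each later query costs one internal log, not two."""

    def __init__(self, g, p):
        self.p = p
        self.r = primitive_root(p)
        # One-off cost; requires g to be a primitive root so that
        # log_r(g) is invertible mod p-1
        self.log_g_inv = pow(discrete_log(p, g, self.r), -1, p - 1)

    def __call__(self, a):
        # A single internal log per query instead of two
        return (discrete_log(self.p, a, self.r) * self.log_g_inv) % (self.p - 1)

log_g = dlog_base(2, 1019)   # mirrors the proposed fmpz_mod_log(g, ctx)
assert pow(2, log_g(12), 1019) == 12
```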
Yeah, this was my "other" idea, and it would optimise computing many discrete logs. I guess lots of people have different use cases. One idea could be to allow an `fmpz_mod` to have the `discrete_log` as I present it here, but also to have a specialised class which allows people to speed things up if they want?

In sage I would expect to be able to compute the discrete log simply with a function, `discrete_log(a, b)`, where I give the base explicitly. The fact that Flint picks a base for the user is more like cado-nfs, where it's more natural to precompute on a given primitive root, as the rest of the dlog is then fast for repeated uses. Pohlig-Hellman with BSGS or Pollard's rho makes less sense for storing and reusing things like this, IMO.
Looking at Flint, it seems to do Pohlig-Hellman with BSGS (baby-step giant-step) for each factor. So the complexity (both time and space) is O(sqrt(ell)), where ell is the largest prime factor of (p-1).
It's also doing way too much work, as it solves the discrete log for primitive roots, but if I had some element of order 2, it would be much, much faster to do Pohlig-Hellman in that subgroup.
TL;DR: I think we shouldn't worry about optimising this in python-flint, as it's not state of the art in Flint itself. E.g. Sage calls a sub-exponential algorithm via Pari for Z/NZ, which is (partially) where all the time saving is here.
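(For readers following along, here is a minimal pure-Python sketch of the Pohlig-Hellman + BSGS combination being described. It illustrates the algorithm and its O(sqrt(q)) per-prime cost; it is not Flint's implementation, and sympy's `factorint` is only used to factor the group order.)

```python
from math import isqrt
from sympy import factorint

def bsgs(g, h, n, p):
    """Baby-step giant-step: solve g^x = h (mod p) with ord(g) | n.
    O(sqrt(n)) time *and* space -- the bottleneck for large factors."""
    m = isqrt(n) + 1
    baby = {pow(g, j, p): j for j in range(m)}       # baby steps
    gm_inv = pow(g, -m, p)                           # g^(-m)
    y = h
    for i in range(m):                               # giant steps
        if y in baby:
            return i * m + baby[y]
        y = (y * gm_inv) % p
    raise ValueError("no discrete log exists")

def prime_power_dlog(g, h, q, e, p):
    """Solve g^x = h (mod p) where ord(g) = q^e, digit by digit in
    base q, so only O(e*sqrt(q)) work rather than O(sqrt(q^e))."""
    gamma = pow(g, q**(e - 1), p)    # generator of the order-q subgroup
    x = 0
    for k in range(e):
        # Strip the digits found so far, then project down to order q
        hk = pow(pow(g, -x, p) * h % p, q**(e - 1 - k), p)
        d = bsgs(gamma, hk, q, p)    # k-th base-q digit of x
        x += d * q**k
    return x

def pohlig_hellman(g, h, p):
    """Discrete log of h to a primitive root g modulo a prime p."""
    n = p - 1
    x, m = 0, 1
    for q, e in factorint(n).items():
        qe = q**e
        gq = pow(g, n // qe, p)      # element of order q^e
        hq = pow(h, n // qe, p)
        r = prime_power_dlog(gq, hq, q, e, p)
        # Incremental CRT: combine x mod m with r mod q^e
        x += m * ((r - x) * pow(m, -1, qe) % qe)
        m *= qe
    return x

assert pohlig_hellman(5, pow(5, 13, 23), 23) == 13
```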
Not sure what's causing macOS arm64 to crash; I can't reproduce it locally.
Do you have an arm64 CPU?
No, sorry. What I mean is that I don't think it's the Python layer, but maybe a bug around the discrete log routine running on arm64, so I can't easily dig into it.
You could push some print statements up to see where exactly it crashes. Debugging via CI is not great, but sometimes it is the easiest way.
Oh great... it worked this time. That's even more annoying lmao.
That's what "undefined behaviour" means: you can't even rely on it to crash properly! |
Addressing the above, let's see if it works again :)
I've been experimenting with trying to implement the cache, but I haven't got something that works with Cython. I came up with a naive idea:

```python
cache = {}

class C:
    def __new__(cls, n):
        instance = cache.get(n)
        if instance is not None:
            return instance
        instance = super().__new__(cls)
        cache[n] = instance
        return instance

    def __init__(self, n):
        self.n = n

assert C(1) is C(1)
```

and, talking with some people on the CryptoHack Discord:

```python
class CachedMeta(type):
    def __call__(cls, a):
        if a not in cls._cache:
            obj = cls._cache[a] = cls.__new__(cls, a)
            obj.__init__(a)
        return cls._cache[a]

class C(metaclass=CachedMeta):
    _cache = {}

    def __init__(self, a):
        print("init", self, a)
```

which has the benefit of only calling `__init__` once. One option I guess would be writing a python … Sagemath has …
Potentially the right thing to do is to use something very similar to the following: https://github.com/sagemath/sage/blob/develop/src/sage/misc/fast_methods.pyx
I would definitely not use metaclasses. The simple way to do this is just to have a function rather than messing around with different methods (see the sketch below). The naive approach looks okay as well, though. You can remove the …
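(Something like the following sketch of the function approach; the names are illustrative, not actual python-flint code.)

```python
_cache = {}

class _C:
    """The actual class; users never construct this directly."""
    def __init__(self, n):
        self.n = n

def C(n):
    """Factory that hands out one cached _C instance per modulus."""
    try:
        return _cache[n]
    except KeyError:
        instance = _cache[n] = _C(n)
        return instance

assert C(1) is C(1)
```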
The issue I had with the naive method was that the Cython compiler shouted at me for `__new__` and told me to use `__cinit__`, but it didn't feel like I could do the above inside `__cinit__`.
Oh, right, I missed that. This same issue arises for mpoly contexts. I don't remember how it is handled in that PR.
Also a …
A classmethod was my first idea, but I wanted to be able to make the class using `fmpz_mod_ctx(n)`. I guess the easiest method is to do all of the caching with the function so the user never directly calls the class, but this seems ugly. Will look at the mpoly PR later :)
Another thing to think about is how nmod works differently (just two arguments to nmod), whereas fmpz_mod asks for an integer and a context. The natural solution to me is to make a pseudo `nmod_ctx` which is callable in the same way `fmpz_mod_ctx` is handled, so we can have `nmod_ctx(x)` behave as `nmod(x, n)` (roughly as sketched below).
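(A plain-Python sketch of what that interface could look like; `nmod_ctx` does not exist in python-flint at the time of writing, so this is purely an assumption about the proposed API.)

```python
from flint import nmod

class nmod_ctx:
    """Hypothetical pseudo-context so nmod mirrors fmpz_mod_ctx usage."""

    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        # Forward to the existing two-argument constructor
        return nmod(x, self.n)

F = nmod_ctx(17)
assert F(5) == nmod(5, 17)
```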
I see what you mean, but I wonder if this is all just something to do at a higher level. It would probably be quite easy to wrap up a lot of Flint's functionality in a relatively thin wrapper like the current interface, where everything has the same name as the corresponding Flint types and functions. It would be nice to have a higher level, but we can also define that on top like:

```python
class ZZ_n:
    """Integers mod n."""

    def __init__(self, n):
        self.n = n
        if n < threshold:
            self.dtype = lambda x: nmod(x, n)
        else:
            self.dtype = fmpz_mod_ctx(n)

    def __repr__(self):
        return f'ZZ_n({self.n})'

    def __call__(self, val):
        # This is the only performance sensitive part.
        # Could optimise in Cython
        return self.dtype(val)
```
For the caching of contexts, the PR for fmpz_mpoly (#59) also has some TODOs, and my feeling is this PR could be merged and, at a later date, potentially with input from deinst, we can find a standard method for context caching which we apply for all type contexts.
It might not be necessary to cache these contexts if they are not very complicated. The main thing is just how fast the context comparison used in arithmetic is, e.g.:

src/flint/types/fmpz_mod.pxd, lines 6 to 10 (at 7394b1d)

The question then is how exactly the `!=` check here is implemented at the C level:

src/flint/types/fmpz_mod.pyx, lines 77 to 80 (at 1da5e63)

In the common case, regardless of caching, both `fmpz_mod` have the exact same context object, and these are just equal pointers which we want to compare as fast as possible. It is not clear that Cython can just generate a pointer comparison, though, because it needs to call the `fmpz_mod_ctx.__eq__` method:

src/flint/types/fmpz_mod.pyx, lines 97 to 100 (at 1da5e63)

Perhaps there could be a shortcut somewhere like `if self.ctx is other.ctx` or something.
The one thing that I would probably change here is to make the … There could be a cdef method like … Alternatively, the struct could be inline in the …
So one option is we could allow both: we first check `self.context is other.context` and return True when this is the case; when this fails, we can compare the moduli themselves. This means that if someone does decide to have multiple instances, equality works as expected, but for all the standard cases the same modulus would be reused and the fast `is` equality check would pass.

Going further, we could simply force people to reuse the object and implement something similar to "equality with ID" as below, where equality is at the memory level and therefore hash needs updating to reflect this: https://github.com/sagemath/sage/blob/develop/src/sage/misc/fast_methods.pyx
We could even have as the equality check:

```python
def __eq__(self, other):
    # do type checking
    return self is other or fmpz_equal(self.val.n, other.val.n)
```

which could still be hashed by the modulus?
I agree with the point about the dlog context. I'll try and rework this to something which feels sensible. One option would be to have something like fmpz_mpoly, where there's some `self._init_dlog` which is generally False, and when a discrete log is needed it updates this to True and does the init and prime precomputation on the context?
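(In plain Python the lazy pattern would be something like the sketch below; `_precompute_dlog` is a made-up stand-in for the Flint `fmpz_mod_discrete_log_pohlig_hellman` init/precompute calls.)

```python
def _precompute_dlog(n):
    # Stand-in for fmpz_mod_discrete_log_pohlig_hellman_init +
    # ..._precompute_prime on the modulus n
    return {"n": n}

class Ctx:
    def __init__(self, n):
        self.n = n
        self._init_dlog = False      # no precomputation up front

    def _ensure_dlog(self):
        # The first discrete log pays the precomputation cost; every
        # later dlog on this context reuses it for free.
        if not self._init_dlog:
            self._L = _precompute_dlog(self.n)
            self._init_dlog = True
        return self._L
```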
We could force reuse of the context object. Do you mean that this would fail: …

Maybe I misunderstand something, but that particular class seems pointless, although perhaps harmless. It reimplements the methods that …
So a (rough) benchmark for various moduli: …
If we patch context equality to check equality in memory before anything else:

```python
def __eq__(self, other):
    # Most often, we expect both `fmpz_mod` to be pointing to the
    # same ctx, so this seems the fastest way to check
    if self is other:
        return True

    # If they're not the same object in memory, they may have the
    # same modulus, which is good enough
    if typecheck(other, fmpz_mod_ctx):
        return fmpz_equal(self.val.n, (<fmpz_mod_ctx>other).val.n)

    return False
```
Moving precomputation from the …
src/flint/types/fmpz_mod.pxd:

```python
cdef class fmpz_mod_ctx:
    cdef fmpz_mod_ctx_t val
    cdef fmpz_mod_discrete_log_pohlig_hellman_t L
    cdef bint _dlog_precomputed
    cdef any_as_fmpz_mod(self, obj)
    cdef _precompute_dlog_prime(self)
```
In memory terms, the first element of `L` is an `fmpz_mod_ctx_t`, which would end up being a copy of `val`. If we include the struct `L` inline here then we don't need `val`.

I think it is better, though, to just have a pointer like:

```python
cdef fmpz_mod_discrete_log_pohlig_hellman_t *L
```

If the pointer is set to NULL in `__cinit__` then there is no need for a separate `_dlog_precomputed` variable. The `_precompute_dlog_prime` method can just check for NULL and then call malloc etc.
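(A sketch of that NULL-pointer pattern in Cython; it assumes the Flint `fmpz_mod_discrete_log_pohlig_hellman_*` declarations are already cimported, and it is not the exact python-flint code.)

```cython
from libc.stdlib cimport malloc, free

cdef class fmpz_mod_ctx:
    cdef fmpz_mod_ctx_t val
    cdef fmpz_mod_discrete_log_pohlig_hellman_t *L

    def __cinit__(self):
        self.L = NULL    # NULL doubles as the "not precomputed" flag

    cdef _precompute_dlog_prime(self):
        if self.L != NULL:
            return       # precomputation already done
        self.L = <fmpz_mod_discrete_log_pohlig_hellman_t *>malloc(
            sizeof(fmpz_mod_discrete_log_pohlig_hellman_struct))
        fmpz_mod_discrete_log_pohlig_hellman_init(self.L[0])
        fmpz_mod_discrete_log_pohlig_hellman_precompute_prime(self.L[0], self.val.n)

    def __dealloc__(self):
        if self.L != NULL:
            fmpz_mod_discrete_log_pohlig_hellman_clear(self.L[0])
            free(self.L)
```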
I have not addressed this yet. So you think I should set the pointer to NULL in `__cinit__`, and then when I do the normal discrete log stuff I should work with the pointer `self.ctx.L`? I guess I can also do something similar for `x_g` with a pointer?

EDIT: hmmm, I'm not sure how to dereference the pointer for the dlog call... I'm pretty stupid with cython pointers.
I think it's like `f(L[0])`. You can't use `f(*L)` because that means something else in Python.
Ahh, got it! The real issue was my `malloc`, which was poorly written before.

```python
self.L = <fmpz_mod_discrete_log_pohlig_hellman_t *>libc.stdlib.malloc(cython.sizeof(fmpz_mod_discrete_log_pohlig_hellman_struct))
```

Seems to work.
(There was no `flint_malloc` that I could see, so I did it this way, but we could maybe write a small function to do this?)
For the following prime, in sagemath we find:

```python
sage: p = 2265864780164119884430588711799732781030118760200689436801123468132597600920497412606647
sage: p.nbits()
291
sage: factor(p-1)
2 * 1607 * 8807 * 12037 * 24533 * 25309 * 34871 * 36697 * 38459 * 38653 * 49531 * 53441 * 67867 * 77479 * 79283 * 81919 * 82493 * 86719 * 88873 * 97967
sage:
sage: F = GF(p)
sage: g = F(2)
sage: a = F.random_element()
sage: %timeit discrete_log(g, a)
8.82 ms ± 390 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

For the same parameters with flint, we find:

```python
from flint import fmpz_mod_ctx
import time

p = 2265864780164119884430588711799732781030118760200689436801123468132597600920497412606647
F = fmpz_mod_ctx(p)
g = F(2)
x = 1912688597311598499324749152080691796129556087988551556197726085819727650974177072903102
a = g**x

def time_dlog(g, a):
    t0 = time.process_time_ns()
    x = g.discrete_log(a)
    time_ns = time.process_time_ns() - t0
    assert g**x == a
    return f"{time_ns / 1_000_000} ms"

time_dlog(g, a)
time_dlog(g, a)
```

So for the first call it's much slower, but for repeated calls we actually get something faster, despite using an algorithm with worse complexity. I have also tried adding …
You can't really compare …
Oh!! Very cool, TIL. So using …:

```python
In [1]: from flint import fmpz_mod_ctx
   ...:
   ...: p = 2265864780164119884430588711799732781030118760200689436801123468132597600920497412606647
   ...: F = fmpz_mod_ctx(p)
   ...: g = F(2)
   ...: x = 1912688597311598499324749152080691796129556087988551556197726085819727650974177072903102
   ...: a = g**x

In [2]: %timeit g.discrete_log(a)
3.11 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

and with the new push, which also stores …:

```python
In [2]: from flint import fmpz_mod_ctx
   ...:
   ...: p = 2265864780164119884430588711799732781030118760200689436801123468132597600920497412606647
   ...: F = fmpz_mod_ctx(p)
   ...: g = F(2)
   ...: x = 1912688597311598499324749152080691796129556087988551556197726085819727650974177072903102
   ...: a = g**x

In [3]: %timeit g.discrete_log(a)
1.41 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

As for "when is this useful": much of the time people only want to solve a single discrete log, so this caching is not so helpful, but there are other cases where a single base might need to be solved for several discrete logs, and when (p-1) is not so smooth, the saving by storing …
Just to show the limitation of the discrete log being exponential in space/time:

```python
In [1]: from flint import fmpz_mod_ctx
   ...:
   ...: p = 2079343919
   ...: F = fmpz_mod_ctx(p)
   ...: g = F(7)
   ...: a = F(1507522516)
   ...: x = 191604335
   ...: assert a == g**x
   ...: assert x == g.discrete_log(a)
   ...: %timeit g.discrete_log(a)
395 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
sage: q = 2079343919
sage: factor(q-1)
2 * 1039671959
sage: F = GF(q)
sage: g = F(7)
sage: a = F.random_element()
sage: a
1507522516
sage: pari.znlog(a, g)
191604335
sage: %timeit pari.znlog(a, g)
3.74 ms ± 75.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
I don't understand the latest error (or how I could have introduced it, at least).
Looks like a network error. I've restarted it.
Happy with this now, but also happy to continue work if you have more ideas. I could probably re-read it all and try to speed things up a little more, but now that the ctx comparison is faster there's not a whole bunch of obvious slowdown, imo.
Okay, looks good to me!
Also some other clean-up which should have been done before, but I didn't see it.