forked from radii/msieve
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Readme.qs
157 lines (131 loc) · 7.8 KB
/
Readme.qs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
MSIEVE: A Library for Factoring Large Integers
Quadratic Sieve Module Documentation
Jason Papadopoulos
Introduction
------------
This file describes the implementation of the quadratic sieve that
Msieve uses. Whereas previous versions of Msieve could only use the
quadratic sieve, versions starting with v1.04 have the number field
sieve available, and that implementation is described in Readme.nfs
In factoring jargon, Msieve is an implementation of the self-initializing
multiple polynomial quadratic sieve (MPQS) with double large primes.
Unlike the NFS implementation, which is quite crude at the moment,
the QS implementation is very stable. To my knowledge it's the fastest
QS code available, sometimes by a huge margin. That's not to say that
it can't be better than it is now, but I've essentially run out of
ideas to improve it.
Since the QS implementation is quite good and the NFS implementation
is not, Msieve currently will always use the QS implementation by default.
It's early enough in the NFS development cycle that I have no precise
idea of the crossover point between QS and NFS in the specific case of
the Msieve library; experience with other NFS implementations suggests
that it should eventually be slightly under the 100-digit problem size.
Nevertheless, the QS side is still useful at all problem sizes.
Since the library works essentially automatically once given a number
to factor, the only caveats on using the QS implementation involve
cases where manual labor is required. To whit:
Distributed Computing
---------------------
By default, the demo application will keep going until all integers given to
it have been factored. If you happen to have several computers available and
need to factor a really big integer, it is possible to get all of the
computers to work on factorizing the same number. This process is not automated
and is a little labor-intensive, so don't be surprised if it turns out to be
somewhat inconvienient. That's the price you pay for 1) keeping the code and
the user interface simple but 2) finishing your factorization much faster.
Before giving you recipes for 'going distributed', you have to know in general
terms how the quadratic sieve works. This is not the place to explain that
in detail (the references do it much better than I can) but you'll be a lot
less confused if you grasp the following facts:
1. The quadratic sieve is a fast way of finding 'relations'.
A relation is a statement about the number you want to factor.
2. A single relation is theoretically capable of factoring your
number. In practice that's impossible; the best you can do
is find a lot of different relations that fail. For a really
big number, you'll need millions of these relations. The process
of finding relations is called 'sieving'.
3. Once you have 'enough' of these relations, you can take the
millions of relations and combine them together into
a few dozen super-relations that *are* capable of finishing the
factorization with high probability.
4. You don't know beforehand how many relations will be 'enough'.
If you already have a collection of relations, you can use some
math to determine whether there are 'enough' relations in that
collection.
5. The more sieving you do, the more relations you'll find. A block
of sieving work will produce an approximately constant number of
relations. So every computer you use for sieving will accumulate
relations at a constant rate. Obviously, the faster the computer
the higher the constant rate.
So (at least from low earth orbit) the quadratic sieve is divided into
two phases, the 'sieving' phase and the 'combining' phase. For msieve, the
combining phase takes a few minutes at most, but the sieving phase can take
days or weeks. It's the sieving phase we want to do in parallel. We have to
keep sieving until all of the computers involved have produced a total of
'enough' unique relations.
Now that the preliminaries are out of the way, here are the recipes for
going distributed:
RECIPE 1:
1. Start msieve on each sieving machine
2. Every so often (say, once a day):
- stop msieve on each sieving machine
- combine the savefiles from all the sieving machines into
a single 'super-savefile'. Leave the savefiles from the
sieving machines alone otherwise
- start one copy of msieve from the super-savefile. If it
finds there are 'enough' relations, then it will automatically
start the combining phase.
- If there are not 'enough' relations, the one copy of msieve
will start sieving again. Stop it, delete the super-savefile,
and repeat step 1.
RECIPE 2 (less data transfer):
1. Start msieve on each sieving machine
2. Every so often (say, once a day):
- stop msieve on each sieving machine
- APPEND the savefiles from all the sieving machines into
the existing 'super-savefile', then delete the savefiles
from the sieving machines
- start one copy of msieve from the super-savefile. If it
finds there are 'enough' relations, then it will automatically
start the combining phase.
- If there are not 'enough' relations, the one copy of msieve
will start sieving again. Stop it and repeat step 1. Do not
delete the super-savefile
Some notes on these recipes:
1. The copies of msieve that are sieving do not need to know about each other.
Each one uses random numbers to begin the sieving process, and it's
essentially impossible that two sieving machines will produce the same
output by accident. Even if they did, the combining phase will notice
and will remove the duplicates.
2. Each copy of msieve will give a running progress report of how much of the
factorization it has completed. These reports assume there is only one
sieving machine, so if you're using more than one sieving machine you
should not believe this output. When starting from the super-savefile,
there really *is* one machine running so the output is accurate. If starting
from the super-savefile shows that you only have a few relations left
to go, you don't have to restart all the sieving machines; just let the one
copy of Msieve finish the job.
3. At no point should the super-savefile contain any duplicate relations.
Obviously, finding one relation and copying it a million times will not
get you any closer to finishing the factorization. If you do this, you'll
fool msieve into thinking it has 'enough' relations, the combining phase
will start, all the duplicates will get deleted and the combining phase
will fail. It's just wasted effort. Likewise, sieving machines must all
generate their own relations; DO NOT give relations back to sieving machines
or share relations found across sieving machines. All you'll do is
generate a huge number of duplicates that will just get removed anyway.
4. Msieve uses the same name by default for its savefile. If your sieving
machines write to a network directory, make sure they do not all write to
the same file. There is no synchronization that keeps sieving machines
from stomping on each other's output, and the code needed to lock the
savefile is too machine-specific so I didn't add it. Msieve has a *lot*
of error checking in the combining phase, so it's possible that the
factorization will succeed even if everybody did write to the same savefile.
However, every corrupted relation that must be skipped in the combining
phase represents good work that you've wasted.
5. You will find for a really big factorization that relations start to
accumulate grindingly slowly. This is normal; as more work gets done, the
number of relations shoots up very quickly. By the time you're almost
finished, relations will be collecting at incredible speed.
With the recipes in mind and a little patience, distributed sieving can be
a powerful tool for finishing your factorizations faster.