-
Notifications
You must be signed in to change notification settings - Fork 50
/
Changes
1792 lines (1722 loc) · 87.8 KB
/
Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Version 1.46:
- NFS poly selection changes:
- Added a high-performance root sieve for degree 6 problems
- Allowed stage 1 and stage 2 to be run separately,
with stage 1 output saved to file and read back in stage 2
- For large problems, added a global optimizer that runs
before the conventional size optimization step; this is
currently not used because it destroys the size property
of polynomials that leave stage 1 with a good score
- Performed an overhaul of the size optimization (thanks
Paul Zimmermann)
- When rating poynomials, remember the number of real roots
(thanks Paul Zimmermann)
- Reduced the number of clique removal passes when there is a large
amount of excess (thanks Greg Childers)
- Optimized the power detection code in the main driver (thanks axn)
Version 1.45: 4/21/10
- NFS poly selection changes:
- Merged the CPU and GPU branches more tightly, and centralized
much of the GPU handling code
- Added PTX inline assembly language for a small speedup
- Added specialized routines that make poly selection
for inputs < 135 digits about 35% faster
- Fixed some degree 5 synchronization issues (thanks
Jayson King)
- Tightened up the construction of arithmetic progressions
in stage 1 (thanks Jayson King)
- Added code to increase the size of host arrays when using
more powerful GPUs (thanks Paul Zimmermann)
- Added code to automatically randomize the search for
inputs that are large enough
- Made the cutoff E-value more aggressive for the largest
jobs (thanks Tom Womack / Paul Leyland / Greg Childers)
- Modified the linear algebra to write new checkpoint files first,
then overwrite the old checkpoint only if the write completed
(thanks Greg Childers)
- Cleaned up the makefile a bit
- Cleaned up the wording of the makefile usage
- Added code to delete the largest temporary file generated during
filtering (thanks Greg Childers)
Version 1.44: 2/5/10
- NFS poly selection changes:
- Added a branch that uses Nvidia GPU code. This makes stage 1
run enormously faster (my medium-end GPU is 35x faster than
one core of my medium-end CPU). The GPU branch also contains
root generation code that is critically needed to search for
degree-6 polynomials. Hopefully in future releases the
GPU and CPU code will be more tightly integrated (thanks
to many folks at Mersenneforum for helping to test this)
- Force the bounds checking for degree 4 to be more strict
than degree 5 or 6 (thanks Jayson King)
- Modified the main driver to avoid saving nonexistent
polynomials
- Modified the main driver to use stricter bounds for
accepting polynomials
- Fixed a bug in the relation reading code that caused crashes when
initializing empty dependencies (thanks Jeff Gilchrist)
- Fixed a bug reading factor base files without a trailing newline
(thanks Jayson King)
- Allowed NFS poly selection and sieving for inputs down to 80 digits
(thanks andi47)
Version 1.43: 10/18/09
- Made the GMP library mandatory
- Removed the arbitrary precision math library, replaced with GMP
- Modified the NFS relation handling code to allow 63-bit 'a'
values and 48-bit prime factors. The code now also performs
runlength-encoding on the list of prime factors
- Added 64-bit modmul operations to all builds
- Added checks that the input NFS number corresponds to the input
NFS polynomials (thanks Tom Womack)
- NFS square root changes:
- Optimized the brute force square root code, especially
when dealing with degree > 6 (thanks Serge Batalov)
- Modified the NFS relation reading code to ignore
relations that would not participate in dependencies anyway.
This also makes the NFS square root a bit simpler
- Modified the initialization to be a little more
robust (thanks andi47)
- Add support for degree 8 NFS polynomials (thanks Serge Batalov)
- Deleted the ancient unskewed NFS polynomial selector
- Removed now-unused double-double code
- Reset the number of relations and ideals properly when reading
LP relations with increasing max weight (thanks Greg Childers)
- Increased the input size limit to ~300 digits
- Adjusted the library link order in the makefile
- Added float.h for unix builds (thanks Christian Cornelssen)
Version 1.42: 7/19/09
- NFS polynomial selection changes:
- Added support for generating degree-4 and degree-6
GNFS polynomials, as well as some tuning for the degree-4
case. Degree 6 is not completely done (needs a better
root sieve and parameter selection)
- Added a tweak to the end of stage 1 that makes sure
most polynomials passed to stage 2 have third-to-last
coefficients that are small enough. This allows stage
2 to run on more candidates, especially when the leading
coefficient is large
- Added several internal checks
- Tweaked the stage 1 sieving parameters to make the inner
loops longer during large jobs
- Switched to exponential (not linear) interpolation
when choosing parameters (thanks Tom Womack)
- Added degree-5 parameters for very large jobs, up to
180 digits (thanks Tom Womack)
- Reduced the minimum NFS input size to just under 90 digits
- Reverted back to to using the L2-norm for the initial
multivariable optimization step in stage 2; using Bernstein's
scoring function is not sensitive enough
- Pushed the file for saved NFS polynomials into the skewed
poly selector, instead of making it globally visible in
general poly config structures. This also allows the file
to only be created if polynomial selection happens
- NFS filtering changes:
- Removed most of the NFS-specific singleton removal; instead,
write the large ideals to file and read that in sub-
sequent passes
- Made the disk-based singleton removal into common code
- Retool the filtering driver to make the minimum number of
disk passes, based on the available memory and the number
of relations to deal with
- Made the duplicate removal guess the best size of the
hashtable to use (saves lots of memory for big jobs)
- Performed a complete overhaul of the polynomial rootfinder,
replacing Laguerre's method with a simplified Jenkins-
Traub rootfinder combined with Newton polishing. This
will hopefully always work, even for the surprising
number of SNFS polynomials whose equimodular roots
completely foiled the original rootfinder (thanks to
Brian Gladman for a great deal of help)
- Changed the ECM driver to use the supported public interface
to GMP-ECM; also compiled the demo binary with GMP-4.3.1
and GMP-ECM 6.2.3
- Added an integrity check to the linear algebra, to detect when
the state gets corrupted (thanks Kazumaro Aoki)
- Modified the NFS relation parsing to allow relations with
factor lists that are empty, since the smallest factors
are not printed (thanks Serge Batalov)
- Improved the error handling of the multivariable minimization
and special-cased 1-dimensional minimizations
- Added many patches to the inline asm (thanks Alex Kruppa)
- Made most of the matrix file IO routines into common code
- Got started cleaning up the huge number of type conversion warnings
that gcc 4.4.0 emits
- Fixed a bug in the linear algebra that caused an immediate
write of a checkpoint on startup (thanks Serge Batalov)
- Fixed a potential register allocation bug in the linear
algebra (thanks Emmanuel Thomé)
- Made sure the search for QCB primes does not get too close
to 2^32 (thanks Tom Womack)
- Fixed a small memory leak in the integrator (thanks valgrind)
Version 1.41: 4/3/09
- Added an extra phase after the clique removal in the NFS filtering,
that deletes heavy relations until the specified excess
is achieved. I only expect this to be useful in the case of
extreme oversieving (thanks to Bruce Dodson for showing
how necessary this was, even for very large jobs)
- Added tweaks to the GMP conversion functions to account for
64-bit MSVC (thanks Brian Gladman)
- Added assembly language for 64-bit MSVC, for use with NFS
polynomial selection (thanks Brian Gladman / Jeff Gilchrist)
- Fixed a crash in NFS polynomial selection, that happens when
two products of small primes have very different size
(thanks Mikael Klasson)
- Made the polynomial rootfinder choose initial values away from
the origin (thanks to Al Edwards for a very pathological
SNFS polynomial)
- Set the default skewness to 1.0 in more places (thanks Tom Womack)
- Lowered the minimum size that's allowed to run GNFS
- Recompiled the demo binary to use GMP-ECM v6.2.2
Version 1.40: 3/24/09
- NFS polynomial selection changes:
- Added Murphy's scoring algorithm, expressed as a
numerical integration. The Murphy score is used as the
final measure of polynomial goodness, and is directly
comparable to the scores produced by the GGNFS tools
- Made the numerical integration code adaptive, and greatly
simplified it
- Added major changes to the stage 2 root sieve, which reduce
overhead and allow quick searching of extremely large search
spaces. This is required so that large inputs do not cause
the root sieve to literally take forever
- Made the polynomial rootfinder work in double-double
precision. This is neeeded to compute roots to full
double precision accuracy, preventing numerical instability
in Bernstein's algorithm
- Reduced some of the overhead in stage1 and added 64-bit
assembly language (much more to do here)
- Changed the initial stage 2 numerical optimization to only
select rotations, and to use Bernstein's scoring function
- Fixed a bug in the multivariable optimization and made
the solver into common code
- Added error bailout code to the poly rootfinder
- Changed the format of intermediate saved polynomials to
be compatible with GGNFS; this means an entry from
the ".p" file can be cut-n-pasted into the GGNFS tools
- Made lots of NFS utility functions, and most of the NFS filtering,
into common code, in preparation for an overhaul of the QS code
- Generalized the hashtable code to automatically grow the hash
array and to index arbitrary size structures. This is a
necessary first step for allowing NFS postprocessing to
scale beyond what it can handle now
- Modified the main driver to allow NFS on any size input, no matter
how small, if only the postprocessing is desired
- Added patches from Brian Gladman that allow the Lanczos
inline asm to work with MSVC Express (thanks Ben Buhrow)
- Added more Intel cache codes and better CPU identification
- Made NFS input ranges 64-bit numbers to deal with large leading
coefficients for NFS polynomial selection
- Switched to measuring CPU time when calculating deadlines or
elapsed time (thanks andi47)
- Added printing of elapsed time in each stage of NFS postprocessing,
for compatibility with GGNFS scripts (thanks Jo Yeung Uk)
- Inlined the modular inverse routines and added 64-bit versions of
several functions
- Fixed a typo when conditionally defining HAS_CMOV, and also
when turning on MMX and SSE for the QS code
- Cleaned up the MSVC project files (thanks Jeff Gilchrist)
- Tweaked some asm to compile correctly with gcc 4.x; also changed
the generic code branch of mp_mod{add|sub}_1
- Allowed the NFS filtering to limit the number of relations read in
Version 1.39: 12/1/08
- NFS polynomial selection changes:
- Rewrote stage 1 to use Kleinjung's new algorithm as
described at the CADO workshop. This is much simpler and
potentially can find better polynomials
- Major overhaul of the root sieve; this was badly needed
because Kleinjung's new algorithm generates enormous
search spaces for roots. The result is a lot more modular,
allows 64-bit rotations, uses lattice sieving to cover
large ranges, and searches arbitrarily-shaped intervals
without regard to their size
- Added error aborts to the stage 2 size minimization step
- Added the ability to specify coefficient ranges to search
via command line option in the demo binary, and also
enforced a time limit on polynomial search that is specified
in the driver
- Merged extensive patches from Brian Gladman that fix the use of
inline assembly language for MSVC and Intel's compiler, on
both Windows and Linux
- Corrected many long-standing errors in the x86 assembly language
- Modified the NFS filtering to not preemptively rerun clique
processing unless the merge phase proves that it has the
capacity to see more ideals (thanks Bruce Dodson)
- Allowed NFS polynomials up to degree 7 (thanks Serge Batalov)
- Reduced the number of quadratic characters to 20; this makes
the NFS linear algebra slightly more efficient
Version 1.38: 9/24/08
- Changed the NFS square root to only consider relations that appear
an odd number of times, and then only once. This reduces
the runtime and memory use of the entire square root by up
to a factor of two, though perhaps this depends on the
amount of oversieving (thanks Paul Zimmermann)
- Upgraded the library to expect GMP-ECM v6.2.1, and merged many
fixes by Christian Cornelssen to the GMP-ECM driver
- Fixed a bug enumerating polynomial coefficients in stage 1 of
the skewed NFS polynomial selection (thanks Patrick Stach)
- Changed the NFS filtering to test the density of the dataset
after clique removal and retry the later stages if the
density is too low (thanks Tom Womack)
- Changed the linear algebra to report the estimated time to
completion (thanks Serge Batalov)
- NFS sieving changes:
- Modified the factor base generation to correctly report
progress when the algebraic and rational bounds are different
- Increased the maximum size of primes used by the
batch factoring
- Changed the sieving to not quit in the middle of
a sieve line when interrupted
- Incremented the maximum size of numbers multiplied in the NFS
square root (thanks Paul Zimmermann / Alex Kruppa)
- Removed the extra sanity check added to the NFS square root in
the previous version; Paul Zimmermann rightly points out
that it treats a natural outcome of the square root as an error
- Fixed a bug initializing the savefile memory buffer
Version 1.37: 8/27/08
- Performed a significant overhaul of stage 1 in the NFS polynomial
selection (needs much more work)
- NFS filtering changes:
- Combined the duplicate removal with analysis of free
relations; now successive filtering runs will add more
free relations, and large primes have free relations chosen
for them independent of any factor base bound (thanks
to Paul Zimmermann for showing that extra free relations
make a noticeable difference in filtering quality)
- Overhauled the main filtering driver and the singleton
removal; now the singleton and clique passes are completely
separate and redundant operations are removed. The most
noticeable effect is that there is no clique processing
after the disk-based singleton removal
- Changed the merging to avoid favoring merges that cancel
out relations; this makes merging faster, and the merge
postprocessing phase deletes redundant relations anyway
- Fixed a small bug in the merge phase, and added a
little extra cleanup
- Added a limit on the size of cliques (thanks Hugo Platzer)
- Increased the effort expended to calculate the root score
for NFS polynomials, for p dividing the polynomial
discriminant (thanks Paul Zimmermann)
- Rearranged the main driver to print any factors found if it
is interrupted
- Fixed an inconsistency in the way the windows version of the
savefile reading code emulated end-of-file
- Allow the demo binary to optionally run at idle priority
(thanks Mark Rodenkirch)
- Made GNFS parameters more sensible (thanks Jo Yeong Uk)
- Fixed the win32 version of rint() (thanks Brian Gladman)
- Added a little extra paranoia to the NFS square root
- Increased the maximum input size to 275 digits; anyone
who pushes *this* limit is nuts
Version 1.36: 5/17/08
- NFS polynomial selection changes:
- Complete overhaul of the root sieve code. The result
will potentially work a lot harder, and does not depend
on the approximate measures of polynomial size that were
originally used
- Added many changes and bugfixes to the stage 2 multivariate
optimization
- Changed the objective function in the initial multivariate
optimization pass of stage 2 to match the rounding of
variables at the end. This prunes many more polynomials
- Added buffering of the most promising polynomial rotations
so that Bernstein's rating algorithm is applied much more
often than before (it's much faster than Murphy's algorithm
so we can afford to use it more)
- Linear algebra changes:
- Removed one instruction from one of the Lanczos
vector-vector inner loops
- Moved the allocation of per-thread memory into the
thread loop proper, to hopefully allocate node-local
memory on NUMA systems
- Added extra paranoia when calculating the number of threads
to spawn; also added code to compensate if the actual number
of threads is different
- Add extra safeguards when constructing the post-Lanczos
matrix in the beginning of the linear algebra
- Reduced the minimum matrix size for which blocking and
multithreading is turned on
- Added forgotten 64-bit windows define
- Improved the cache size detection to account for the latest Intel
and AMD processors (thanks Greg Childers)
- Changed the NFS filtering to increase the target matrix density,
since the improved filtering code generates much better
matrices in the presence of significant excess relations
- Added a forgotten patch to the MP library, in the non-assembly-
language branch (thanks Christian Cornelssen)
Version 1.35: 4/14/08
- NFS skewed polynomial selection changes:
- Replaced the rather simple stage 2 polynomial optimization
code with more advanced global and local methods from
Numerical Recipes (needs lots of cleanup)
- Improved the calculation of projective roots after the
initial stage 2 optimization pass
- Substituted Bernstein's algorithm in place of Murphy's
for the final rating of skewed polynomials
- Modified stage 1 to forward all polynomials to the
optimization code in stage 2; this centralizes the
optimization step for a small performance penalty
- Removed large amounts of now-redundant code
- Modified the size score to consider all complex roots
of the polynomial; this is necessary even if the imaginary
parts of the root have large absolute value
- Fixed a buffer overflow bug in the NFS square root, which was
actually a long division bug that only showed up in MSVC
(thanks Chad Davis and Sergey Miklin for debugging help)
- Modified the NFS filtering to reduce the filtering bound
if more than two passes are needed (thanks Chad Davis)
- Updated the MSVC build project to v9 and fixed several
porting errors in the previous release (thanks Brian Gladman,
Chad Davis, Sergey Miklin)
Version 1.34: 3/22/08
- Added a heavily modified version of the Franke/Kleinjung NFS
polynomial selection tools. Massive additional work is
still needed here
- NFS filtering changes:
- Reduce the number of singleton removal passes to two
(on average). The second pass now uses an extremely small
filtering bound, writes to disk, and reads back only the
ideals that occur rarely. This is a lot faster and more
robust, especially in the case of heavy oversieving
- Fine-tuned the choice of initial bound. The new algorithm
is less needlessly conservative
- Changed the hashing of (a,b) pairs in the duplicate removal
to hopefully avoid most false positives
- Changed the QS code to use signed_mp_t functions where appropriate.
This removes a number of silly hacks
- Forced the NFS square root to always use IEEE precision, after
taking a week to find out it was set incorrectly for a large
SNFS run (thanks Richard Wackerbarth)
- Added tuning of the size of reciprocals in the NFS square root,
to avoid wasting time on small inputs (thanks Chad Davis)
- Added initialization for NFS parameters when the input size exceeds
the pre-tabulated list (thanks Greg Childers)
- Modified the mp_mod{add|sub}_1 routines to be more efficient and
avoid potential register allocation errors (thanks Alex Kruppa)
- Linked the win32 demo binary to be large-address aware, allowing
3GB of memory to be used on windows machines (thanks _dc_)
Version 1.33: 1/13/08
- Centralized all the handling of savefiles, in order to abstract
away a bug fix that lets Windows binaries read and write
savefiles over 4GB in size
- Overhauled the computation of root score in the NFS polynomial
selection. The result is more accurate, deterministic,
and much faster
- NFS square root changes
- Modified the FFT library to change the number of bits
in convolution elements dynamically, which halves the
size of the convolution sometimes
- Adjusted the size of reciprocals computed during the
algebraic square root to be as large as possible
for a given size FFT
- Implemented a cap on the number of quadratic characters; this
allows caching of the computed values within the relation
structures, and speeds up building of the initial NFS matrix
(the code is cleaner too)
- NFS filtering changes:
- Changed the duplicate and singleton removal to save the
line numbers of relations to *skip*, not keep, when
performing the initial disk-based passes. This saves having
to write out huge disk files when most relations start
off being valid and not duplicates
- Changed the final disk-based singleton pass to dump
relations to disk and read into memory at the end of the
pass. This greatly reduces the maximum memory use of the
singleton removal
- Generalized the clique removal to be able to handle
really huge amounts of excess relations (thanks Chad Davis)
- Changed the clique removal to avoid bothering to do any
work if the number of cliques to delete is too small
- Updated the comments to reflect the fact that what this
deletes is not really cliques but just connected components
of a graph (thanks Alex Kruppa / Chris Card)
- Modified the line sieve parameter lists to allow asymmetric sieve
lines (thanks Tom Womack)
- Added reporting of memory use to the NFS merge phase and linear
algebra (the former is not very accurate, for some reason)
- Modified the NFS free relation building code to avoid having
to generate a factor base
- Restored the code that allowed separate rational and algebraic
filtering bounds
- Modified the NFS batch factoring to be much more careful checking
the size of factors found (thanks Tom Womack)
- Added logging of the amount of ECM work completed, and improved
the stopping conditions to account for multiple factors found
(thanks henryzz)
- Restored error recovery code that was mistakenly removed from
the linear algebra (thanks Cedric Vonck)
- Fixed a bug in the early-exit part of the new trial factoring code
(thanks Dennis Langdeau)
- Added more special cases for Intel's compiler
- Fine-tuned the checks for corrupt NFS relations (thanks Tom Womack)
- Modified the NFS initialization to not delete the savefile if
no sieving is specified (thanks Philippe Strohl)
- Switched to a consistent buffer size to receive lines from
the savefile; also increase the size to accomodate larger N
- Removed the silly approximations used when reporting the size of
inputs to the QS and NFS routines
- Increased the maximum input size to 255 digits
Version 1.32: 12/15/07
- Merged a patch from 'forzles' that changes the threading
implementation in the linear algebra to work using a
thread pool instead of spawn-and-join. This allows a
significant speedup on multi-core CPUs, sometimes 2 to
4 times the speedup from using extra threads compared to
v1.31
- Split out the main computational linear algebra routines from the
multithread framework that uses them; this cuts out 2/3
of the code that people optimizing the linear algebra need
to see in one file
- Changed the type assigned to uint32, to silence cygwin warnings
- Changed the relation-reading code to reject relations whose
(a,b) pair is zero (thanks Greg Childers)
Version 1.31: 12/13/07
- NFS sieving changes:
- Added experimental code to batch relations containing large
primes together and factor them all at once, using an
algorithm described by D.J. Bernstein
- Changed the sieving to dynamically choose whether relations
will have two or three large primes, with the choice re-
calculated for every sieve block. This item and the
previous one makes it possible to find relations with three
rational and/or algebraic large primes at modest extra
runtime cost, and increases the overall speed of line
sieving by 50-100% when the input is large enough. The
same techniques can apply to a future lattice siever
- Made the choice of cutoff for trial factoring relations
automatic
- Fixed a minor bug in the resieving code
- Modified the sieving to avoid printing factors < 256
- Added the ability to optionally compile and optionally run GMP-ECM
on all inputs, with a little tuning to attempt to minimize
the overall time
- Overhauled the Pollard Rho code; the new version is much more
efficient and configured to do more work in order to find
factors of expected size
- Overhauled the main driver to remove some hacks and some
unnecessary work
- Added a compile-time list of primes that several subsystems within
the library now use
- Removed trial factoring from the MPQS and NFS drivers
- Added early-out optimizations to the trial factoring
- Modified the tiny factoring code to revert to QS if SQUFOF
fails (thanks Dennis Langdeau), and always perform
Pollard Rho
- Added wrappers for all the C memory allocation functions that abort
if memory allocation failure is detected
- Added compile options that allow accessing large files on many
32-bit unix systems (thanks Greg Childers)
Version 1.30: 11/16/07
- NFS filtering changes:
- Modified the merge phase to use pairs of 32-bit words
to represent nonzero matrix elements, instead of a single
linked list structure with pointers. This simplifies
a lot of code, makes merging 30-50% faster for big jobs, and
reduces the memory use on 64-bit machines by over 60%
- Modified the merge phase to concatenate the lists
of relations and ideals within a relation set. This removes
half of the memory management during merges
- Removed legacy #defines from the duplicate removal
- Added checks for the Intel compiler, which can now understand
gcc inline assembly (thanks to Brian Gladman)
- Restored MSVC versions of the Lanczos inline assembly (thanks to
Brian Gladman)
- Added an extra check for zero polynomials in the factor base
root finder (thanks Hallstein Hansen)
- Fixed a typo in the main library driver (thanks Philip Mason)
- Modified the tiny factoring code to revert to Pollard Rho if
the SQUFOF or tiny QS routines fail (thanks to Philip Mason
for a test case)
- Made the numerical rootfinder work harder to find difficult roots
- Turned the breakover point for using the tiny factoring routines
into a library-wide #define
- Restored logging of the initial NFS matrix size; this was mistakenly
removed in the previous version
Version 1.29: 10/28/07
- Linear algebra changes:
- Added checkpoint and restart for large matrix sizes
- Added Montgomery's optimizations to reduce the number
of vector-vector operations
- Modified the matrix multiply assembly code for slightly
higher performance on 64-bit x86
- Pushed the dense matrix rows into the per-thread state,
so that multithreaded runs execute the dense row portion
of the matrix multiply in parallel
- Modified the test for iteration failure *again*, after
Greg Childers encountered it on a very large job
- NFS linear algebra changes:
- Performed an overhaul of the matrix-building code.
This makes the process much more modular and eliminates
unnecessary sorting. It also noticeably decreases memory use
- Modified the matrix building code to avoid needing a
complete factor base
- NFS filtering changes:
- Increased the array sizes used for the 2-way merges,
and added extra sanity checking (thanks Tom Womack)
- Modified 2-way merge phase to delete relation sets
that would never survive the full merge phase
- Allowed the filtering bound and number of relations read
to be passed in from applications. The latter is convenient
for exploring the filtering space and the former can
be used to correct suboptimal bounds that the library chooses
- Modified find_large_ideals() to use a single bound for large
ideals, not separate rational and algebraic bounds
- Changed the file format for NFS cycles. The new format takes up
slightly more disk space, but simplifies nfs_read_cycles and
removes some hacks that were necessary with the old format
- Modified nfs_read_cycles to optionally return just the cycles
and not the relations they need
- Switched the 'size' and 'number' arguments of most calls to
fread and fwrite
- Increased the maximum size of the FFT multiply
- Removed the warning printed for large inputs; most factorization
attempts for numbers > 150 digits are SNFS jobs, not RSA keys
(thanks Sander Hoogendorn)
Version 1.28: 10/3/07
- Fixed logging of factors to avoid trying to free a stack array
- Fixed 64-bit compile warnings
Version 1.27: 10/2/07
- Much more work on NFS polynomial selection (still not runnable)
- Linear algebra changes:
- Added special block-multiply code for the moderately
dense matrix rows. This reduces the memory use of the
linear algebra by about 15%
- Used 32-bit loads to grab pairs of array offsets at a
time within the matrix multiply. Since the number of
memory accesses is a bottleneck, the reduced number of
these provides a 1-2% speedup
- Always track the number of dimensions solved. This prevents
spurious failures when the solver runs on small matrices
- Aligned the vector-vector loops
- Modified the polynomial rootfinder to use different starting points
when looking for roots. This is required for many SNFS
polynomials whose derivatives vanish at the default start point
- Modified the driver to not log any factors if there is only
one composite factor found (i.e. the library was interrupted)
- Updated the default makefile target to state that gcc
on x86 must use the specialized make targets
- Raised the input size limit to 235 digits
Version 1.26: 8/2/07
- Began experiments on improved polynomial selection (nothing
is in a runnable state yet)
- Changed the scratch structure used in the NFS filtering
merge phase to use hardwired array sizes. This removes
a lot of needlessly complex code and prevents a
subtle buffer overflow (thanks Hallstein Hansen)
- Added signed_mp_{add|sub|mul|clear|copy}
Version 1.25: 6/27/07
- Changed the QS sieve core to empty MMX state before attempting
any trial factoring of sieve values. This had been broken
since v1.20, but the last version's changes to mp_iroot
were the first to make the bug fatal (thanks Miroslaw Kwasniak)
- Fixed comments in the numerical integrator
Version 1.24: 6/25/07
- Removed the dependency on the Gnu Scientific Library. The NFS
polynomial selector now is completely self-contained, and
hopefully will be able to handle any input polynomial
(GSL's numerical integrator sometimes fails). Many thanks
to Brian Gladman for lots of help doing this
- Added numerical integration code provided by Brian Gladman
- Added a polynomial rootfinder derived from Numerical Recipes
- Modified the NFS filtering driver to lower the filtering bound
and retry the merge phase if the matrix produced is too
sparse. This seems to be required in order for the linear
algebra to produce nontrivial dependencies sometimes
(thanks to Tom Womack for a very obstinate factorization)
- Changed the NFS filtering to hardwire the maximum number of
relations in a relation set
- Performed a major overhaul of mp_iroot
- Modified an error test in the linear algebra, after a report
from Hallstein Hansen that the error still occurs,
probably spuriously
- Fixed a possible buffer overflow in mp_mul (thanks valgrind)
- Fixed a serious memory leak in the cleanup after the merge
portion of NFS filtering (thanks valgrind)
- Fixed a memory leak in the trial division used by the NFS
driver. Also made trial division mandatory in the NFS
driver, since the factor base bound will always exceed the
trial factoring bound (thanks valgrind)
- Corrected a typo in the NFS siever (thanks Joppe Bos)
Version 1.23: 6/3/07
- Linear algebra changes:
- Made singleton removal alternate with deleting cliques
from the input matrix. This seems to be required when
the initial matrix is large and very sparse, otherwise
only trivial dependencies are found
- Made the iteration skip verifying that all columns are
used in the last iteration. I think that this causes
unnecessary restarts of the linear algebra
- Allowed for combine_cols to find zero dependencies
- NFS filtering changes:
- Allowed the singleton removal to grow the size of
hashtables used during disk-based passes
- Modified the filtering driver to more intelligently tune
the bound on large ideals when there is a very large
amount of excess, and/or when sieving uses 29+ bit
large primes
- Changed the singleton removal cutoffs to avoid doing the
last (pretty useless) disk-based pass most of the time
- Changed the singleton removal to delete the file from the
duplicate removal phase when finished with it
- Fixed a bug reading NFS relations (thanks Greg Childers)
Version 1.22: 5/27/07
- Linear algebra changes:
- Added simple multithreading
- Modified the dense matrix multiply to use blocks of 64
rows instead of 32. This is slightly faster, and allows
reusing the general vector-vector code from elsewhere
- Modified the matrix packing code to pack the matrix in
two passes instead of one; this saves an enormous number
of reallocations and significantly reduces the working
set size of the linear algebra
- Modified the driver to correctly choose when to pack
the matrix, and to correctly pack dense rows when no
post-Lanczos matrix is desired
- Fixed a memory leak freeing the packed matrix
- Cleaned up combine_cols
- Added more fixes for OS X Intel (thanks Romain Muguet)
- Forced the buffer size for ideals in a relation_lp_t to match
that of the buffer in a relation_t. This fixes a fairly
common NFS square root failure (thanks Hallstein Hansen)
- Fixed a bug in the factor base generator change from v1.21
(thanks Tom Womack and Jes Hansen)
- Made the Pentium 2 and 3 QS sieve cores have a 64kB block size
- Modified the expression evaluator to accept integers in octal or
hex format. Dear reverse engineers: it isn't that hard
- Added more sanity checking to NFS matrix construction
- Added more QS tuning from Bill Hart
- Fixed a typo in the tiny QS code (thanks Volturno)
Version 1.21: 5/11/07
- NFS filtering changes:
- Changed the merge phase to count the average weight
of cycles that would appear in the matrix, not the average
weight of all cycles found. This allows for a somewhat
smaller matrix when there is a large amount of excess
- Changed the filtering driver to iteratively rerun the
singleton removal with gradually smaller bounds. This is
much better at dynamically figuring out a bound for large
ideals that is small and can produce a sensible matrix
- Added back the code that buries ideals that are too
heavy. The current merging architecture makes it possible
to apply burying of ideals more intelligently than my
initial experiments, and burying ideals saves a lot of
memory as merging progresses
- Modified the singleton removal to use a larger hashtable
for the initial disk-based pruning passes, and also to
rebuild the hashtable after a pruning pass if it was
saturated going into the pruning pass
- Increased the maximum clique batch size, to save time
when filtering datasets that have a lot of excess
- Increased the number of excess columns in the final matrix;
maybe this will help prevent occaisional bad behavior
- Merged Brian Gladman's MSVC versions of the inline assembly code,
as well as his rearrangement of the unrolled Lanczos loops
- Merged Brian Gladman's latest MSVC build files
- Modified the NFS square root to handle inputs where there are no
q for which (algebraic polynomial) mod q is irreducible
- Changed the NFS linear algebra to assemble the matrix on disk
and read in the columns after relations are freed. This
reduces the memory use of the linear algebra by about 30%
- Fixed yet another bug in the NFS factor base generator, which
would sometimes miss roots of polynomials that would be
degree 1 after reduction by p (thanks to Hallstein Hansen)
- Documented the QS improvements
- Sharpened the detection of older and newer AMD processors
- Allowed for use of AMD's MMX extensions; this allows a slight QS
speedup when the full SSE instruction set is not available
- Added some QS tuning from Bill Hart
Version 1.20: 5/1/07
- Added support throughout the NFS module for relations with
64-bit 'a' values. This makes the postprocessing a little
slower, but finally allows msieve to complete any previously
started GGNFS run
- QS changes:
- Modified the makefile to create a 'fat binary', the same
core sieving routine compiled over and over again with
different optimizations and preprocessor directives. The
QS driver now selects the optimal routine at runtime,
and this makes several CPUs noticeably faster for small
and moderate size factorizations
- Used two different reciprocal values instead of one,
so that remainder operations involving small factor
base primes are faster now
- Modified the sizing up of the factor base into distinct
regions; this gives ranges of factor base primes that
use reciprocals, and also allows a large range where
remainder operations are essentially free (for primes
smaller than the cutoff for using hashtables)
- Increased the unrolling of the sieve scanning loop to 64
- Used multimedia vector instructions to scan the sieve
interval, if they are available. This makes small and
moderate size factorizations up to 5% faster
- When sizing MPQS polynomials, use the rounded up value
of the sieve interval instead of the original. Not doing
this means the sieve interval would be slightly lopsided,
making sieve values slightly larger on average
- Modified generation of reciprocals so that the quotient
from integer division is exact, i.e. doesn't need correcting
- Reduced the number of small primes that are not sieved
as the input size increases. Skipping small primes is less
important in these cases, and the extra relations found
more than make up for the extra time spent sieving
- Removed obsolete code from MPQS polynomial generation
- Reduced the limit on QS factor base primes to 2^26 (need
an extra bit in the factor base structure)
- Cleaned up check_sieve_val
- Completed documenting the NFS filtering
- Modified the cache detection code to handle level 1 cache sizes too
- Added code to detect the (x86) CPU type
- Modified the NFS square root to handle negative leading rational
polynomial coefficients; also added code to check the sign
of the leading algebraic polynomial coefficient
- Fixed a bug sorting relations during the NFS merge postprocessing
- Fixed find_large_ideals to reduce b values modulo p before
calling mp_modinv_1; the latter cannot handle b that
exceed p (thanks to Greg Childers)
- Fixed a typo parsing NFS options in the demo program
- Increased the allowed relation set size during NFS merging; the
limit cannot be too restrictive, or else many cycles will not
be found (a big cycle can turn into a small cycle because
of a fortuitous merge, but not if the cycle is pruned first)
- Made the prefetching of long buffers conditionally compiled; most
modern CPUs can prefetch these automatically so explicit
prefetch instructions are unnecessary
- Modified the NFS square root to not stop unless the largest
remaining composite is very small. This will protect all
the NFS relations from being overwritten by the MPQS code
- Removed the use of assembly code using cmov instructions when
compiling for 64-bit x86
- Changed inline assembly code to not explicitly clobber the EBX
register; apparently OS X on Intel reserves this register
for the PIC base address (thanks Romain Muguet)
- Increased size of the buffer used in the demo to hold input numbers,
to accomodate ridiculously huge SNFS inputs (thanks Tom Womack)
Version 1.19: 4/17/07
- Linear algebra changes:
- Made the block Lanczos code much more modular
- Added cache blocking to the matrix multiply, with
automatic sizing of blocks based on the size of
CPU caches
- Added x86-specific MMX code to the computationally intensive
routines. This item and the previous reduce the
solve time for large problems by a factor of 4-6!
- Pushed the restart of failed linear algebra runs into
the block Lanczos code itself, instead of relying
on the QS and NFS code to do it
- Added progress reporting for larger jobs
- Increased the number of rows that are converted to packed
format, and made the final number of such rows a
multiple of 32
- Modified the file format of the NFS factor base, to allow manual
adjustment of NFS parameters
- Added an extra NFS matrix row to guarantee that the number of
free relations in each dependency is even
- Fixed a rare buffer overflow in the FFT multiplication code
(thanks to Greg Childers / valgrind)
- Added automatic external cache size detection code
- Began documenting the NFS filtering
- In the NFS linear algebra, increased the bound for primes that
are packed into dense rows; this saves about 5% of the
working set during the linear algebra
- Made all header files safe for C++ compilation
Version 1.18: 4/5/07
- NFS filtering changes:
- Added a postprocessing step to the merge phase that removes
relations by rearranging the basis defined by the current
collection of cycles. This makes the final linear system
about 2-4% lighter
- Made the merge phase a lot more modular, since merging and
postprocessing of the merged relations need many of the
same utility functions
- Added the capability to force building of a matrix, even
when there aren't as many excess relations as the filtering
code wants by default
- Modified the filtering driver to not use the factor base
at all; instead, a histogram of all the primes in the
set of relations is built up during duplicate removal,
and the large prime cutoff is chosen based on that
- Modified the clique removal to choose the clique batch size
based on the amount of excess to prune, instead of using
a hardwired size
- Modified the merge phase to choose the maximum relation
set size to keep based on the amount of excess relations.
This can save a lot of memory when there is a large number
of excess relations
- Modified the merge phase to keep producing cycles as long
as possible, instead of stopping when there are more cycles
than skipped ideals. This gives many more cycles to choose
from, and makes it much easier to converge to a sensible
matrix, especially when there aren't many excess relations
- Modified nfs_read_relation to ignore factors of zero that
are read in (thanks to Greg Childers)
- Fixed a buffer overflow in mp_divrem_core that occurs when the
quotient requires an entire mp_t (thanks sp65536)
- Fixed a longstanding bug in the polynomial rootfinder that
generated incorrect roots for nonmonic linear NFS
polynomials (thanks Macz)
- Fixed the NFS square root to work correctly in the case of a
nonmonic linear polynomial and free relations
- Changed the extended precision floating point to not use
ppc_intrinsics.h on PowerPC. This header apparently does not
exist on the Playstation3 (Cell) dev. environment (thanks Macz)
- Updated the Visual Studio build files, courtesy of Brian Gladman
- Added a little more documentation to the NFS square root
Version 1.17: 3/12/07
- NFS filtering changes:
- Peformed a major overhaul of the merging code. The
new version uses two heaps of ideals, and this allows
undoing decisions made about which ideals to try to
merge. The big advantage of this approach is that merging
starts by computing the lightest possible (but too large)
matrix, then continues reducing the matrix dimension
until the result has a specified density
- Added minimum-spanning-tree combining of relation sets,
and performed a major overhaul of the core merging code
- When merging, keep the relations making up a relation set
in sorted order. This allows the relation lists to be merged
just like the ideal lists, and allows repeated relations to
be removed from cycles as merging progresses
- When estimating the weight gain from combining relation
sets, give a bonus to merging relation sets that lead to
cancellations in the combined list of relations. All of
these improvements combine to reduce the weight of the
final matrix by up to 50%
- Removed more compile-time limits in the merge phase
- Fixed a serious and subtle bug in the merge phase, which
caused the weight of merged relation sets to be wrong
- Changed the second singleton removal pass to use the
same large ideal cutoff for rational and algebraic factor
bases, even if they have different sizes
- Added code to give up on relation sets that are too dense,
i.e. that contain too many relations
Many thanks to Greg Childers for providing an NFS factorization
that made me scramble to improve the NFS filtering
- Added the beginning of a skewed polynomial selector (currently
turned off)
- Performed a complete overhaul of the driver and the manager
for factors found. This pushes the recursive use of
factorization into the library and unifies how all
the different algorithms handle factors found
- Added the Pollard-Brent algorithm; this makes inputs containing
6-8 digit factors complete much faster (thanks to Leonardo
Volpi for pointing out that latter-day msieve versions
performed quite badly in this regard)
- Modified the NFS linear algebra to sort lists of ideals by
prime and then by rational/algebraic type. This assures
that the Lanczos code begins with the sparsest possible matrix
- Added code that brute-force divides read-in NFS relations by
small primes, in case some factors in the savefile
are missing. This is required to perform the postprocessing
for a factorization started with GGNFS. Also simplified
nfs_read_relation() and corrected a minor bug there
- Fixed a horrible use-after-free bug in purge_singletons_final_core()
(thanks valgrind)
- Fixed a bug in the sieving during NFS polynomial selection;
factors of 11 were not being used
- Modified the lanczos code to remember if some of the input matrix
was converted into dense rows; this allows the linear algebra
to restart properly after a bad initial solution
- Added a failsafe path for long division by zero
- Made mp_clear a static inline function instead of a macro. The extra
type checking this allows would have prevented a silly bug
in the NFS square root (in ap_poly_mul)
- Fixed the routine that checks the relation product in the NFS
square root to not assume degree 5 algebraic polynomials
- Fixed a stupid bug in mp_is_prime
Version 1.16: 1/17/07
- Made the code to analyze polynomials independent of the
polynomial degree and skewness; also made the interface
to the analysis code much more modular
- Rearranged the sampling code for NFS polynomial root properties
so that samples are only generated one at a time
- Modified the NFS driver to analyze any polynomial when it is
read in at the beginning of an NFS run, even if the
polynomial was not generated by the library
- Fixed a stupid bug in mp_log that caused NFS polynomial search
to be cut short early
- Fixed a bug in the factor base generator that only shows up
for SNFS factorizations (thanks to Greg Childers)
- Allowed 6th degree polynomials, for people with SNFS factorizations
- Reduced the required amount of NFS matrix excess from 200 to 80
Version 1.15: 1/11/07
- Made as much of the NFS linear algebra code as possible into
common code, so that the QS module can take advantage
of the much more sophisticated matrix handling. Also
generalized the code to form dense rows automatically
- Fixed the linear algebra code that performed post-Lanczos
elimination to get true dependencies; it was completely wrong
- Fixed an obscure bug in the FFT multiply code, that was causing
the NFS square root to fail. Also added a verification
step after the product of the NFS relations is computed
- Modified the initial stage of the NFS linear algebra to
dynamically allocate the structures for forming dense rows
- Made common code to accurately count the number of nonzero
entries in QS or NFS matrices, replacing lots of ad hoc
code that didn't do a good job
- More reductions in the memory use of the NFS square root
- Documented the NFS square root
- Changed the method for detecting the floating point precision
at runtime; the old method doesn't work if the compiler
can emit SSE2 instructions
- Use two separate hash functions when indexing 2-word structures.
This is much more robust against data that is prone to hash