-
Notifications
You must be signed in to change notification settings - Fork 22
/
CHANGES
457 lines (295 loc) · 14.1 KB
/
CHANGES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
5.4.0
- New structures for simple selection on LongBigArrayBitVector instances.
- Bulk select methods are now part of the interface (with default
implementations).
- Replaced LongArrayBitVector.word() with LongBigArrayBitVector.word()
in EliasFanoMonotoneBigLongBigList. Previously, big arrays of lower
bits with more than 2^31 elements would have caused an assertion
failure or an out-of-bounds exception.
- To make EliasFanoMonotoneBigLongBigList completely free from
limitations, also its upper bits now are implemented using a
LongBigArrayBitVector. This change required to bump the serial version
UID. Also MappedEliasFanoMonotoneLongBigList had to be adapted, and
its UID bumped.
- Several problem related to "int vs. long" in the new classes were fixed.
5.3.1
- New EliasFanoMonotoneBigLongBigList which uses a LongBigArrayBitVector
for the lower bits and avoids the limitations of
EliasFanoMonotoneLongBigList. Instances can be mapped in memory with
MappedEliasFanoMonotoneLongBigList.
5.3.0
- New MappedEliasFanoMonotoneBigList class makes it possible to
map into memory the lower bits of an Elias-Fano list.
- Completely revamped ZFastTrie implementation.
5.2.3
- Thanks to FileLines*Iterable, all classes can read term lists using
arbitrary decompressors.
- All non-deprecated function classes now support byte arrays.
- Official support starts now at Java 9. By mistake, even if compiled
for Java 8, Sux4J was using Math.multiplyHigh(), which appears only
in the JDK library since Java 9.
5.2.2
- Sux4J is now dually licensed under the Lesser GNU Public License 2.1+ or
the Apache Software License 2.0.
5.2.1
- Removed (almost) unused dependencies.
- Upgraded to newest versions of Apache Commons.
5.2.0
- Massive (unfortunately, sometimes not backward-compatible) overhauling
of the interface specification and of the implementation to reduce the
amount of parameter checks and special cases.
- The default size of the bucket at creation in a BucketedHashStore has
been raised to 256 (it used to be one) to avoid that calls to
checkAndRetry() on a newly created store are too expensive.
- BucketedHashStore now return the bucket index as a long (this is
unfortunately not backward-compatible for some applications). This
change was necessary as buckets can be more than Integer.MAX_VALUE.
All user classes now throw an exception if the number of keys is
so large than the resulting number of buckets would be greater than
Integer.MAX_VALUE.
5.1.5
- Fast forward iterations in
EliasFanoMonotoneLongBigList/EliasFanoPrefixSumBigList and skipping by
value when iterating over an EliasFanoIndexedMonotoneLongBigList.
5.1.4
- Several new methods for EliasFanoIndexedMonotoneLongBigList.
5.1.3
- Released to fix dependency on broken dsiutils 2.1.13.
5.1.2
- New indexed version of EliasFanoMonotoneLongBigList.
- OSGi metadata and default modularization.
- Manual reduction in strength in arithmetic operations everywhere.
5.1.1
- Added automatic module name.
5.1.0
- GOV/GV functions now allow arbitrary data sizes, thanks to the
new LongBigArrayBitVector class in the DSI utilities.
- Fixed SLF4J dependencies (again).
5.0.4
- We now use the classifier in naming artifacts.
5.0.3
- Fixed build.xml: moved to Java 9, more explicit dependency naming.
5.0.2
- Fixed old bug in SimpleSelect. Thanks to Richard Webb for reporting
this bug.
5.0.1
- New EliasFanoMonotoneLongBigList16 class supporting very large
lists with fixed lower-bits width 16 (bypassing the LongArrayBitVector
128Gib limit).
5.0.0
- Major revamp of the it.unimi.dsi.sux4j.mph package.
4.3.0
- Several fixes and improvements.
4.2.0
- Java 8-only.
- New compressed functions (GV3CompressedFunction and
GV4CompressedFunction) which store a mapping key/value using a number of
bits per key close to the empirical entropy of the list of values.
- Parallel, multithreaded construction for GOV3Function, GOV4Function,
GOVMinimalPerfectHashFunction, GV3CompressedFunction and GV4CompressedFunction).
4.1.0
- We now use in all the new structures of the mph package a modulo-free
range reduction based on a integer-simulated floating point computation
(see http://www.drdobbs.com/tools/fast-high-quality-parallel-random-number/231000484).
This has improved lookup performance by 5-10% but it has made necessary
to bump the serial version UIDs.
- Fixed wrong mask in Rank12. It could lead to unpredictable
errors.
4.0.0
- Complete revamp of the mph package. New algorithms provide
better space and faster lookup. Please read carefully the
package documentation.
- Older maps such as MWHCFunction, MinimalPerfectHashFunction
and TwoStepsMWHCFunction have been deprecated, and will be
removed sooner or later.
- All serial version UIDs have been updated.
- We now use SpookyHash.
- We now support, for all functions for which it is possible,
raw (i.e., non-lexicographical) transformation strategies
(as they are faster).
- Fixed a lot of documentations, main methods, etc.
- Several functions can now be built from the command line
using byte-arrays as keys; strings are read without any
decoding.
3.2.2
- Improved speed for selection in a word thanks to an idea of Giuseppe
Ottaviano. Now we refer to the implementation found in the DSI
utilities (it.unimi.dsi.bits.Fast.select()).
3.2.1
- Fixed dependencies.
3.2.0
- New Rank11 and Rank12 implementations with new space/speed tradeoffs.
Taken from Simon Gog and Mathias Petri's paper.
- New ranking structure for MinimalPerfectHashFunction based on Rank11,
which made it necessary to bump the serialVersionUID.
- New HashDisplaceCompressMinimalPerfectHashFunction class implementing a
new technique by Belazzougui, Botelho and Dietzfelbinger, providing
minimal perfect hashing at 2.05 bits per key. Construction time is an
order of magnitude larger than MinimalPerfectHashFunction, and queries
are ~50% slower, so we will keep the other implementation as a reference
for the time being.
3.1.2
- Revised adapter (now renamed SignedFunctionStringMap) to accept
generic Object2Long<? extends CharSequence> functions, discovering
dynamically whether they implement Size64.
3.1.1
- New thin adapter SignedObject2LongFunctionStringMap to adapt the new
Sux4J self-signing functions to the StringMap (big) interface. The
adapter implements also a constructor based on an Iterable that
can be used directly with MG4J's IndexBuilder.
- New bulk get() methods for EliasFanoLongBigList,
EliasFanoMonotoneLongBigList and select() for SimpleSelect. They are
built one upon each other and they are an order of magnitude faster than
picking consecutive elements using getLong()/select().
3.1.0
- MinimalPerfectHashFunction, MWHCFunction,
LcpMonotoneMinimalPerfectHashFunction,
TwoStepsLcpMonotoneMinimalPerfectHashFunction and
ZFastTrieDistributorMonotoneMinimalPerfectHashFunction have now built-in
signing. The effort was relatively small, and the gain in speed is huge,
as we can use part of the hash computed by the ChunkedHashStore as a
signature.
- New "dictionary" option in MWHCFunction which stores the signature as a
value. It gives the fastest and most compact probabilistic dictionary.
- Big switch to builders for MinimalPerfectHashFunction, MWHCFunction,
TwoStepsMWHCFunction, LcpMonotoneMinimalPerfectHashFunction,
TwoStepsLcpMonotoneMinimalPerfectHashFunction and
ZFastTrieDistributorMonotoneMinimalPerfectHashFunction. As a final goal,
all multiple constructors will be eliminated in favour of Guava-style static
nested Builder classes.
- Major speed improvements in
ZFastTrieDistributorMonotoneMinimalPerfectHashFunction.
3.0.11
- Now we use XorShift1024StarRandom everywhere.
3.0.10
- New parameter to MWHCFunction makes it possible to build arbitrary
functions from the command line.
3.0.9
- Added for efficiency a lowerBitsMask field to
EliasFanoMonotoneLongBigList, which made it necessary to bump the
serialVersionUID.
- Aligned the behaviour of EliasFanoMonotoneLongBigList with that of
MG4J's implementation--the upper bound is not necessarily strict.
- Fixed subtle bug in ChunkedHashStore that could cause an out-of-bounds
access to an array in the (very rare) case of a retry.
- The documentation of Jenkins hashes was still stating, erroneously,
that we return the first computed value (we return the third one).
- Deprecated length() methods have been removed.
- Added UTF-32 support to all classes.
3.0.8
- Significantly improved speed of SimpleSelect and all related
classes (Elias-Fano lists, etc.).
- Improved speed of MinimalPerfectHashFunction.
- Integer overflow bugs fixes (thanks to Tim Potter).
3.0.7
- Some monotone hash functions would crash during construction due to an
integer overflow. Thanks to Tim Potter for reporting this bug.
3.0.6
- Switched to SLF4J for logging.
3.0.5
- The Ivy dependency file has been Maven-normalized (now we have a
default "compile" scope).
3.0.3
- Replaced methods from it.unimi.dsi.bits.Fast with equivalent methods in
Integer/Long, as recent JVMs intrinsify such methods.
3.0.2
- Fixed minor inconsistencies in the values returned on empty functions
(some implementations would actually throw an exception). Thanks to
Valentin Tablan for reporting the problem.
3.0
- WARNING: This release has minor binary incompatibilities with previous
releases, mainly due to the move from the interface
it.unimi.dsi.util.LongBigList to the now standard
it.unimi.dsi.fasutil.longs.LongBigList. It is part of a parallel release
of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that were
all modified to fit the new interface, and that prepare the way for our
"big" versions, that is, supporting >2^31 entries in arrays (simulated),
elements in lists, terms, documents, nodes, etc. Please read our (short)
"Moving Java to Big Data" document (JavaBig.pdf) for details.
- We now require Java 6.
- it.unimi.dsi.util.LongBigList is dead. Long live to
it.unimi.dsi.fastutil.longs.LongBigList. We're sorry for the
nuisance--adapting the code should be very easy (and we warned you
anyway :).
- MWHCFunction has been re-engineered so that it will use very little
space beside that actually required by the function. Previously, a
number of large bit vectors were allocated at the same time, and they
have been replaced by a judicious use of OfflineIterable. All classes
that use MWHCFunction will benefit (at the expense of slightly increased
disk usage and access).
- All classes should support big collections. They use the new Size64
interface from fastutil. Implementations still support the old
(deprecated) length() method for backward compatibility.
- New FileLinesBigList class.
- We now have a MurmurHash3 full implementation.
2.0.1
- Sux4J is now distributed under the GNU Lesser General Public
License 3.
- Major rewriting of the hypergraph peeling code. Now we use less memory
and we are definitely faster.
- MWHCFunction and MinimalPerfectHashFunction accept a temporary directory
for the chunked hash store files.
- Improved ChunkedHashStore architecture that allow arbitrary values,
so functions can be built without keeping values in memory.
- Fixed bug in MinimalPerfetHashFunction that was causing exceptions
when using more than a billion keys (thanks to Wei Liu for reporting
and fixing this bug).
- Fixed bug in some rank/select classes that was causing integer overflow
errors when building structure over bit vectors with >2Gi bits.
- AbstractLongBitVector.equals() now uses getLong() on word boundaries.
- Fixed pernicious bug in Select9.
2.0
- General revamp, restructuring, improvements, new coherent names for
classes. Most of the code has been rewritten or improved.
- New (partial) structures for balanced parentheses.
- New build system based on ChunkedHashStore that works for billions
of keys.
- New structures for balanced parentheses (partially implemented).
- Faster Select9 operation: some broadword operations were implemented in
a redundant way.
- SparseSelect/SparseRank would give an incorrect (or at least little
useful) value for numBits() when using shared data.
- Fixed problem with SparseSelect: some methods inherited from
EliasFanoMonotoneBigList were causing exceptions because the size of the
SparseSelect is the number of bits, not the number of ones.
- New hollow trie implementation based on balanced parentheses.
- Fixed an old and severe bug in MinimalPerfectHashFunction, that
was causing the generated functions not to be perfect.
1.0.4
- New progressive hash-computation methods that provide Jenkins hash
in constant time on all prefixes (after a linear-time preprocessing).
1.0.3
- New TwoStepsMWHCFunction that records in s<r bits the most
frequent values.
- Much improved relative trie implementation (uses two-steps MWHC functions, too).
- New TwoStepsLCPMonotoneMinimalPerfectHashFunction: it is slightly slow
than an LCPMonotoneMinimalPerfectHashFunction, but consumes significantly
less space thanks to a two-steps MWHC function.
1.0.2
- Fixed bug in SimpleSelectZero: under certain conditions, an error in
counting zeroes was causing a out-of-bounds array access.
- Fixed gross bug in MWHCFunction. It wasn't working for bit width
beyond 32.
- New data structure (RelativeTrieMonotoneMinimalPerfectHashFunction).
1.0.1
- Fixed wrong name of PaCo-trie-based monotone minimal
perfect hash functions.
- Fixed small misbehaviours on very small key sets.
- New data structure (HollowTrieMonotoneMinimalPerfectHashFunction)
that uses just 3.44 + 1.23 log log l bits per element.
1.0
- Fixed Jenkins hash so that it works with empty strings.
- Some class renamed.
- Rethought command-line argument parsing.
- Now monotone minimal perfect hashing classes.
- ShiftAddXorSignedStringMap moved to the DSI utilities.
- New classes for compressed lists based on the Elias-Fano
representation.
0.3
- New rank/select structures.
- Restructuring following release of the DSI utilities: bit vectors
implementations have been moved there.
- New signing classes that implement StringMap and make it possible to
mimick the old MG4J SignedMinimalPerfectHash behaviour.
0.1
- First release.