This repository has been archived by the owner on Dec 7, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 5
/
TODO
104 lines (94 loc) · 4.78 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
small_iters problem: in-place and buffered codecs that overwrite their input (all but checksums) encode the given data only during the 1st small_iter. Possible solutions:
* drop si; instead of N si have N copies of data stored in RAM
* forbid si>1 for codecs where it doesn't work
* add memcpy where it's needed (very bad for checksums because they are too fast)
* (partial, but good solution) make input buffer the same as other ones; a series of blocks.
* if the 1st codec is moving, start with it and then keep iterating between workbufs
* after it's done, make sure that inbuf contents haven't changed
update LZWC (requires reiplementation of patches, it changed so much)
get rid of iostream.
add the AES implementation used by vmac/umac
add mmini-lzh
add http://code.google.com/p/variablelengthstringhashing/
* with a fast PRNG (xorshift?)
* with a fast crypto PRNG
* with a static precomputed random sequence
add:
* https://github.com/zfy0701/Parallel-LZ77
* https://github.com/matuba/LZSS
* https://github.com/Wuestenschiff/myLz77
* https://github.com/m3h/lz77
* https://github.com/Cereal84/LZ77/tree/master/source
* https://github.com/ky1000ky2000/lz77/tree/master/lz77
* https://github.com/stevesan/lz77
* http://encode.ru/threads/1562-Urban-Compressor
* http://encode.ru/threads/1619-TinyLZP-A-very-simple-LZP-compressor
* http://mattmahoney.net/dc/text.html#3062
* reintroduce LZ4bz
* (!!!!!!!!!!!!) http://bench.cr.yp.to/supercop.html (!!!!!!!!!!!!)
* http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html -- sshzlib.c
* http://www.mikekohn.net/file_formats/kunzip.php
* http://sourceforge.net/projects/gziplite/
* http://plan9.bell-labs.com/sources/plan9/sys/src/libflate/
* SynLZ: http://synopse.info/fossil/dir?ci=tip
* http://freecode.com/projects/compress
* http://code.google.com/p/arx/source/browse/
* https://code.google.com/p/fast-hash/
* https://code.google.com/p/ngramhashing/
* https://github.com/floodyberry/poly1305-donna
* https://code.google.com/p/crcutil/
* http://cessu.blogspot.com/2008/11/hashing-with-sse2-revisited-or-my-hash.html
* http://encode.ru/threads/1874-Alba (seems poor)
* entropy coders, http://encode.ru/threads/1890-Benchmarking-Entropy-Coders lists some
* https://github.com/Bulat-Ziganshin/DataSmoke
* BWT(s) variants
* http://www.cs.amherst.edu/~sfkaplan/research/compressed-caching/WK4x4.tgz
* http://www.cs.amherst.edu/~sfkaplan/research/compressed-caching/WKdm.tgz
* WSL: http://ssergeo.narod.ru/
* http://mattmahoney.net/dc/bpe2v3.cpp
* RLC: https://061c9e27-a-62cb3a1a-s-sites.googlegroups.com/site/bindeshyouneedme/RLC.zip?attachauth=ANoY7cqQOLgsVAXDyUMRJzdA-QTf-fzxScDfY2z1XFVdH1_Lz0KBIt3RdWj6XmEkeJqH7y0GdoppvDaJVK0mhufMhVc7vjRcXr1GGQv11iATd_zaA61JkdxlSaYs2yhaTvDFGy06V9t4qcpWKN9REqqkIwiiqWPcosks6Cis7rR4fe-t946C6JK5UzVdlfc2Q59Bo8DdB-nfl9TqWf_RUoQ-hQ6aYbgLqQ%3D%3D&attredirects=0
* https://sites.google.com/site/compgt/lz77
* http://mattmahoney.net/dc/lazy100.zip
* http://encode.ru/threads/1874
* https://github.com/valdmann/crook
* http://jveness.info/software/cts-v1.zip
* https://github.com/coderforlife/ms-compress
* http://encode.ru/threads/2059-phafre
* http://encode.ru/threads/1909-Tree-alpha-v0-1-download/
* https://www.schneier.com/blog/archives/2014/10/spritz_a_new_rc.html
* http://eprint.iacr.org/2013/404.pdf
* siphash implementations
* http://encode.ru/threads/2102-Adler32-on-Large-Blocks
* http://balz.sourceforge.net/
* https://github.com/lemire/FastPFOR
* https://github.com/powturbo/TurboPFor
* http://sourceforge.net/projects/wimlib/
Allow more than 1 option passed to a codec, i.e. tor:10:ht4
Don't use a fixed number of iters; watch variance and keep repeating until confidence is good enough.
Allow testing on misaligned buffers
Allow testing of decompressors only (for example for systems that don't have enough resources to compress)
Streaming coding: for codecs that enable to encode / decode 1 piece at the time, enable pipelining such codecs, so data always stays in Lx cache
Rewrite command line parsing. https://github.com/rofl0r/gettext-tiny ?
Is VHASH handing OK?
Sample from Bulat:
void VHASH_128 ( const void * key, int len, unsigned long seed, void * out )
{
ALIGN(4) unsigned char mykey[] = "abcdefghijklmnop";
ALIGN(16) static vmac_ctx_t ctx;
ALIGN(16) static unsigned char buf[(8<<20)+16];
static int inited = 0;
uint64_t res, tagl;
if (!inited) {vmac_set_key(mykey, &ctx); inited = 1;}
if (len>=sizeof(buf)-16) {printf("\n%d bytes to vhash!\n",len); abort();}
memcpy(buf,key,len);
memset(buf+len,0,16);
res = vhash(buf, len, &tagl, &ctx);
((uint64_t*)out)[0] = res;
#if (VMAC_TAG_LEN == 128)
((uint64_t*)out)[1] = tagl;
#endif
}
VHASH can be faster, according to Bulat, all that's needed is a define:
#define VMAC_NHBYTES 4096
pg_lz: endianness detection, now it's OK with LE only
endianness detection framework