-
Notifications
You must be signed in to change notification settings - Fork 7.3k
Conversation
Thank you for contributing this pull request! Here are a few pointers to make sure your submission will be considered for inclusion. The following commiters were not found in the CLA:
You can fix all these things without opening another issue. Please see CONTRIBUTING.md for more information |
I like, where it goes. Thanks for working for it. But I think it should offer a streaming interface, there is a lot of the cases, where it won't be possible or possible only with a significant overhead to fit all input data in memory. I think you could try looking at Thanks again! |
Actually you don't have to fit all input data in memory, since both Adler-32 and CRC-32 can be computed incrementally. So you could do something like this: var crc = 0; // not sure about this part, see below
someStream.on('data', function(data) {
crc = zlib.crc32(data, crc);
});
someStream.on('end', function() {
doSomethingWith(crc);
}); One thing I'm not sure about is how to deal with the initial checksum value. The zlib docs don't mention that it's 0 for CRC-32 (and 1 for Adler-32) and instead tell you to pass Z_NULL as the buffer to get the initial value. So maybe we should allow calling |
I got you, could we still try to produce a Stream API in addition to one that you are proposing? |
Well, if we have a stream API then there isn't any use in supporting incremental checksum for the non-stream API, is there? So maybe we could have the following two pairs:
Does that look good to you? |
It wouldn't hurt to have optional argument for incremental result. |
Hmm, what should the stream output? The checksum value as a 4-byte buffer? If so, in which endianness? |
@seishun I think it doesn't make sense to make this |
@vkurchatkin crypto hashes are Transform streams and their output isn't exactly a stream either. Currently this outputs the checksum as a Buffer in little-endian, but perhaps this should be made configurable. I still think that supporting incremental result for the non-stream API is a bad idea for the following reasons:
|
@seishun true... but doesn't make sense either)
Maybe it's better to have result as a number (so, the stream can't be readable) and let the user write it to whatever buffer they want. |
I'm ok with making it a Transform stream, which will emit the result only when |
@indutny any comments on my last commit? |
@seishun I wonder if it could be an readable-object-stream emitting a plain number as the |
It seems you can't have a stream that's in object mode for reading and non-object mode for writing, at least according to the current docs. If you mean a writable stream with a digest() method, like @vkurchatkin suggested, then I could do that for the time being, until @tjfontaine and @TooTallNate wake up. Do you still insist on supporting incremental checksum for the |
@seishun I'd rather wait for @tjfontaine or @TooTallNate for streams thing. Initial value could be implicitly used in C++-land when no argument is passed, does it sounds right for you? |
I think there should be a way to get the initial value in JS as well, for example, if you want to var crc = get_initial_value_somehow;
buffers.forEach(function(buf) {
crc = zlib.crc32(buf, crc);
}); It will be especially important if we decide to implement the Stream API in JS-land using the non-stream functions. Perhaps I should do some benchmarks to see if such an implementation would be significantly slower than the current one. |
Ok, I guess you could export it in a |
It seems that module is internal to node, we need something easily accessible to the user. |
|
Oh, indeed. Better use all-caps name, though. |
That seems pretty verbose though, I get the feeling a lot of people would just 0 instead. Is there something wrong with |
I think |
That's how we do it in the rest of core, I think. I can't remember any method that will return initial value when called without arguments. |
Is there any other method where an initial value is needed? |
Not exactly initial value, but yeah: Anyway, I object a need to call method instead of a static explicit constant. |
@vkurchatkin yeah, I am aware of it. |
@indutny these are not quite the same. Additionally, both Also, getting back to my point about verbosity. If we force people to use a long and hard-to-type constant, they are more likely to just use 0 directly. Someone reading that code would either need to spend time to figure out where that magic value is coming from, or make incorrect assumptions (for example, that it's 0 for Adler32 too). We shouldn't make it harder to write good code. Or we could avoid this problem altogether by not supporting incremental checksum for non-stream methods, since I don't see any use-case for that where it wouldn't be simpler to just use a stream. @vkurchatkin yeah I guess it would help in this case. |
@seishun ok, let's use occam's razor on this. Streams for incremental, methods for one-chunk checksums. |
Good to have reached a consensus. Once we figure out how to deal with the streams, I can move the methods to JS-land to simply wrap the streams, reducing amount of code. |
Since there hasn't been any input on this, I proceeded with the API that we seemed to have agreed on for now. The streams are now object mode for reading and emit a number, and the functions are implemented via the streams to remove them from C++ land. |
this._binding = new binding.CRC32(); | ||
Transform.call(this); | ||
this._writableState.objectMode = false; | ||
this._readableState.objectMode = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can now use the new partial objectMode initialization
Fixed the objectMode thing. |
Ok, sorry to bounce you around on this, but this really needs to live in the crypto module and behave the way our existing hash implementations work. Your interface pretty much already matches that, we just need to move it to lib/crypto.js instead of lib/zlib.js the binding can stay in the zlib.cc |
Are you sure about it? Moving checksums to crypto would be extremely unintuitive and misleading since they are not provided by OpenSSL (like everything else in crypto is) and should not be used for any cryptographic purposes. |
I would vote that moving it to crypto would not be that ugly. For anyone who does not know zlib, it's not intuitive that checksum functions would be in zlib module. Crypto would be my first guess. Additionally, other checksum and digest API's should get added over time to node core and these would not get added to zlib. It would be unfortunate to have checksum and digest algorithms split across different modules. Just my 10c. |
@JavaScriptDude CRC32 is a hash function but it is not a cryptographic hash by any means: http://en.wikipedia.org/wiki/Cyclic_redundancy_check#CRCs_and_data_integrity |
it's not a matter of it being cryptographic, but the fact that is a hash means it should be colocated with our other hashes and behave in the same way and use the same facilities -- the fact that it is implemented by zlib is an artifact of implementation |
Well it would still be a separate class from Hash. And it can't behave the same way as Hash, since crypto hashes return a sequence of bytes, while the zlib checksums return an integer. @indutny what's your take on this? |
I'd like things to belong where they come from. If we would ever decide to remove zlib (which won't really happen), both things will go away together :P |
I was just raising the point because it may mislead people. |
Rebased and replaced _binding with _handle. |
Our backwards compatibility guarantees basically mean that since we've provided a CRC32 is a hash that works in the same way as the other hashes with |
@indutny @chrisdickinson @trevnorris ... unfortunately this one seems to have stalled out. Any further thoughts on this one? |
Sure would be great to see some cpu efficient checksum tools in the node.js base for those non-crypto usecases like file change checking algorithms. |
@JavaScriptDude ... I agree. At this point tho, I'm just trying to get a sense of where it should land. It's probably not something we'd add to v0.12 at this point so I'm thinking this should be closed here and reopened against the active development stream at http://github.com/nodejs/node |
Going to close this here. The PR would never be able to land in it's current state and an updated PR opened against http://github.com/nodejs/node master would be needed. That's not to comment on the validity of this. |
For the record, I have added a new issue to open a discussion to Incorporate suite of hash utilities into node #2444 |
Continuing from #7213.
I couldn't think of a way to make a streaming interface without over-complicating the API, and it seems like something that could be easily implemented by a user-land library. But if someone can do some benchmarking and show that marshaling the running checksum between JS and C++ causes a significant performance loss, it will be a good reason to reconsider.
The proposed API is pretty straightforward. If it LGTY, I'll add docs and tests.