Add faster Digest#hexfinal
#9292
didactic-drunk commented May 13, 2020
Why are these methods called
Also, what's the benchmark source code?
The name
Here. It's a bit of a mess but the unused portions may be of interest when comparing
Co-authored-by: Sijawusz Pur Rahnama <sija@sija.pl>
It would be nice if these could make it into the coming release as those getting a
Could you please add a note about being callable only once and the `.dup` workaround to all `*final` methods' docs?
Done.
Soooo..... This didn't get merged and what I predicted would happen is (based on gitter chat).
I suppose This should probably be resolved sooner rather than later.
Bump.
Stalled?
Should I rename
We discussed it with the team, and decided to push for it. We discussed the name, being
The My opinion is either:
Maybe I should read my own comments (it's been so long)
src/digest/digest.cr
Outdated
sary = uninitialized StaticArray(UInt8, 64) | ||
tmp = sary.to_slice[0, digest_size] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering: is there some effective limitation on `digest_size` that would ensure it's at most 64 bytes? Implementations could technically use whatever size they like. Practical considerations might make digests bigger than 64 bytes unlikely, but I presume it would be possible?
If we can't exclude the case of `digest_size > 64` with good certainty, there should be an alternative here.
I don't think so.
We'll have to allocate on the heap when `digest_size > sary.size`.
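A minimal sketch of that heap fallback, assuming a hypothetical helper named `with_scratch` (not part of the stdlib): it hands out a slice of the 64-byte stack buffer when the requested size fits and a heap-allocated `Bytes` otherwise, so the finalizing code never has to care which one it got.

```crystal
require "digest/sha256"

# Hypothetical helper (not in the stdlib): yields a scratch slice of exactly
# `size` bytes. Small sizes reuse a 64-byte stack buffer; larger sizes fall
# back to a heap allocation, so there is no upper size limit.
def with_scratch(size : Int32, &)
  sary = uninitialized StaticArray(UInt8, 64)
  if size <= sary.size
    yield sary.to_slice[0, size]
  else
    yield Bytes.new(size)
  end
end

# Usage: SHA-256 produces 32 bytes, so this takes the stack path.
digest = Digest::SHA256.new
digest.update("abc")
hex = ""
with_scratch(digest.digest_size) do |buf|
  digest.final(buf) # writes exactly digest_size bytes into buf
  hex = buf.hexstring
end
puts hex
```

The heap branch only costs an allocation for digests larger than 64 bytes, which the thread expects to be rare.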
Or maybe a crazier alternative? In order to avoid an additional heap allocation we could re-purpose the the string buffer which already needs to be allocated for building the string (in Slice#hexstring
).
string_size = digest_size * 2
string = String.new(string_size) do |buffer|
  # Put the binary data in the back half of the string buffer so values don't get
  # overwritten when the data is transformed to hex characters
  tmp = Slice.new(buffer + digest_size, digest_size)
  final tmp
  # Byte i is read before positions 2i and 2i + 1 are written; since 2i + 1 <= digest_size + i,
  # the expansion never clobbers a byte that has not been read yet
  tmp.hexstring(buffer)
  {string_size, string_size}
end
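To check that the overlap in this trick is safe, here is a self-contained sketch of the same back-half idea on plain `Bytes`. The name `hex_via_back_half` is hypothetical, `copy_from` stands in for `final tmp`, and the hex loop is written out by hand rather than calling the internal `Slice#hexstring(buffer)` overload used above.

```crystal
# Hypothetical helper illustrating the buffer-reuse idea: the raw bytes are
# placed in the back half of the String's own buffer and then expanded into
# hex characters in place, front to back, with no second allocation.
def hex_via_back_half(data : Bytes) : String
  hex_chars = "0123456789abcdef"
  n = data.size
  string_size = n * 2
  String.new(string_size) do |buffer|
    # Stand-in for `final tmp`: the raw bytes occupy buffer[n, n].
    tmp = Slice.new(buffer + n, n)
    tmp.copy_from(data)
    # Byte i is read from buffer[n + i] before buffer[2i] and buffer[2i + 1]
    # are written; 2i + 1 <= n + i, so no unread byte is ever overwritten.
    n.times do |i|
      v = tmp[i]
      buffer[2 * i] = hex_chars.byte_at(v >> 4)
      buffer[2 * i + 1] = hex_chars.byte_at(v & 0x0f)
    end
    {string_size, string_size}
  end
end

data = Bytes[0x00, 0x7f, 0xab, 0xff, 0x10]
puts hex_via_back_half(data)
```

The last byte is the only one whose output (position 2n - 1) lands on its own source slot, and it has already been read by then.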
Yes, it's possible but unlikely. I'd say leave it to those implementations to override `#hexfinal`, or fix the issue in a future PR.
That makes a lot of sense 👍
We should add a note in the docs for `digest_size`.
Output size limitations removed except for hexfinal(io)
src/digest/digest.cr
Outdated
def final(dst : Bytes) : Bytes | ||
check_finished | ||
@finished = true | ||
final_impl dst | ||
dst | ||
end | ||
|
||
# Returns a hexadecimal-encoded digest in a new `String`. | ||
# | ||
# This method can only be called once and raises `FinalizedError` on subsequent calls. |
The "can only be called once" is not limited to this method, but includes all finalizing methods. You can only call one of them, not each method once.
We could document that with references to `#finalize` in all methods' docs, or references to all methods from all methods.
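A short sketch of the behavior being documented, assuming Crystal's `Digest::SHA256` with the `final`/`hexfinal` methods from this PR: one finalizing call in total is allowed, and `.dup` gives an intermediate digest without finalizing the original.

```crystal
require "digest/sha256"

digest = Digest::SHA256.new
digest.update("hello ")

# `dup` copies the hashing state, so the copy can be finalized for an
# intermediate result while the original keeps accepting data.
intermediate = digest.dup.hexfinal

digest.update("world")
full = digest.hexfinal

# After one finalizing call, any other finalizing method raises too:
# the limit is one call in total, not one call per method.
second_call_raised = false
begin
  digest.final
rescue Digest::FinalizedError
  second_call_raised = true
end

puts intermediate
puts full
puts second_call_raised
```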
See latest "documentation" commit
Co-authored-by: Johannes Müller <straightshoota@gmail.com>
src/digest/digest.cr
Outdated
def hexfinal(io : IO) : Nil | ||
sary = uninitialized StaticArray(UInt8, 128) | ||
tmp = sary.to_slice[0, digest_size * 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should address the size restriction in this method. It's not possible to use the same optimization as in the `String` overload.
I think it's probably best to raise a meaningful error if `digest_size * 2 > 128` and mention this in the documentation (that implementing types can override this method for a larger buffer size).
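A minimal sketch of that suggestion as a free-standing function, since the actual method body is not shown here. `hexfinal_to_io` is a hypothetical name, and raising `ArgumentError` is an assumption (the thread only asks for "a meaningful error"). Everything stays in the single 128-byte stack buffer: the raw digest goes into the back half of the hex region and is expanded in place.

```crystal
require "digest/sha256"

# Hypothetical stand-in for Digest#hexfinal(io). Oversized digests get a
# descriptive error instead of an out-of-bounds slice on the stack buffer.
def hexfinal_to_io(digest : Digest, io : IO) : Nil
  sary = uninitialized StaticArray(UInt8, 128)
  dsize = digest.digest_size
  hex_size = dsize * 2
  if hex_size > sary.size
    # Assumed error type and wording; implementations with larger digests
    # would override this method with a bigger buffer.
    raise ArgumentError.new("digest_size #{dsize} exceeds the 128-byte hex buffer; override #hexfinal(io) for larger digests")
  end

  buf = sary.to_slice
  # Raw digest into the back half of the hex region, then expand front to
  # back: byte i is read before positions 2i and 2i + 1 are written.
  bin = buf[dsize, dsize]
  digest.final(bin)
  hex_chars = "0123456789abcdef"
  dsize.times do |i|
    v = bin[i]
    buf[2 * i] = hex_chars.byte_at(v >> 4)
    buf[2 * i + 1] = hex_chars.byte_at(v & 0x0f)
  end
  io.write(buf[0, hex_size])
end

digest = Digest::SHA256.new
digest.update("abc")
io = IO::Memory.new
hexfinal_to_io(digest, io)
puts io.to_s
```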
@didactic-drunk Would you mind addressing this? Just adding a proper exception + documentation should be good. I think then it should be ready to merge.
Let me know if you want me to take over.
What kind of familiarity are you addressing? I can't find a In fact, Ruby has
Auto reset may cause subtle bugs with misuse. Consider:
# Incorrect usage may be buried in a Log statement and hard to see
Log.info { "... #{digest.hexdigest!} ..." }
# Incorrect result / data corruption
digest.hexdigest!
If you swap the order of Compared with:
Log.info { "... #{digest.hexfinal} ..." }
# Raises - error is immediately visible
digest.hexfinal
Or this, which appears correct but the order is wrong. This may occur randomly if
With Here's another bug that's easy to create but hard to replicate with auto reset:
@@digests = ThreadLocalValue<Digest>... # Or Pool or other class that reuses digests
def fiber_handler
  digest = @@digests.get...
  digest.update header
  digest.update data.decrypt
  digest.hexdigest!
end
So where's the bug? Ok, so what's the harm in auto reset for code that can't
Fundamentally, auto reset assumes the object will be reused. Except it can't be reused reliably with auto reset. Personally I think the Ruby's If you want to solve stale updates by putting
# Reset at start of block
digest.open do |d|
# Or
digest.reset do |d|
  digest.update ...
  digest.final
end
# Or
digest.hexdigest do |d|
  d.update ...
end
This won't work for classes that hold a reference to
There may be other options but please not auto reset.
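One reuse-free pattern close to the last option above is Crystal's class-level block form, `Digest::SHA256.hexdigest { |d| ... }`, which creates a fresh digest per call and finalizes it with `hexfinal`. The sketch below assumes, for illustration only, that a pooled digest goes stale when an exception (such as a failing `decrypt`) interrupts it between two `update` calls; `message_hexdigest` and `fail_midway` are hypothetical names.

```crystal
require "digest/sha256"

# A fresh digest per message via the class-level block form: nothing is
# pooled, so no state can carry over from one message to the next.
def message_hexdigest(header : String, body : String, fail_midway = false) : String
  Digest::SHA256.hexdigest do |d|
    d.update header
    # Hypothetical failure between two updates (e.g. a decrypt step raising).
    raise "simulated failure" if fail_midway
    d.update body
  end
end

begin
  message_hexdigest("header", "body", fail_midway: true)
rescue ex
  puts "first call failed: #{ex.message}"
end

# The abandoned digest from the failed call is simply discarded, so this
# result does not include the header that was fed to it.
puts message_hexdigest("header", "body")
```

The trade-off is one digest allocation per message instead of reusing a pooled object, in exchange for no possibility of stale state.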
Thanks a lot for the long, very convincing explanation, and for the patience <3