Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.crypto.hash.sha2: cleanup add add more docs #19744

Merged
merged 8 commits into from
Apr 28, 2024

Conversation

clickingbuttons
Copy link
Contributor

@clickingbuttons clickingbuttons commented Apr 22, 2024

Some cleanup I made during changes in #19308 before #19697. This is no longer necessary for that branch, but it does allow arbitrary SHA512 digests and improves readability.

Replace `Sha512224`, `Sha512256`, and `Sha512T224` with
`fn Sha512Truncated(digest_bits: comptime_int)`.

This required refactoring `Sha2x64(comptime params)` to
`Sha2x64(comptime iv: [8]u64, digest_bits: comptime_int)`
for user-specified `digest_bits`.

I left ziglang#19697 alone but added a compile-time check that digest_bits is
divisible by 8.

Remove docs which restate type name. Add module docs and reference where
IVs come from.
@nektro
Copy link
Contributor

nektro commented Apr 22, 2024

looks good so long as the test pass, but the Sha512_224 and Sha512_256 decls should be pub

@clickingbuttons
Copy link
Contributor Author

I've made them public and updated the OP.

@jedisct1
Copy link
Contributor

Sha512T256 -> deleted (not in spec, not sure original intent)

Do not do that. SHA512 truncated to 256 bits is widely used, especially with anything using NaCl, libsodium and derivatives.
One of the motivation was probably that SHA512/256 (with the custom IV) is way less commonly implemented than regular SHA512. This also allows for code reuse.

To be honest, I don't like these changes at all. They are misleading.

SHA-512/256 is not SHA-512 truncated to 256 bits. It shouldn't be renamed to Sha512Truncated(). These are different functions, and both have real-world usage.

We probably shouldn't make the truncation a parameter either, and stick to what exists. IVs for NIST variants are only defined for a small set, so we can't invent them for arbitrary output sizes.

@jedisct1
Copy link
Contributor

jedisct1 commented Apr 22, 2024

Cleanups are great (check for possible performance changes, though), but if introducingSha512Truncated() is necessary, that function shouldn't be called like that nor be public.

Maybe we should also try to keep SHA-256 and SHA-512 similar. Sha512Truncated() was introduced, but not Sha256Truncated() which feels a bit weird.

Sha256T192 could be added. That one was defined by NIST, and is an actual truncation, not using a specific IV.

Names such as Sha256224 are indeed weird. That was to follow the convention that types are PascalCase and never include underscores anywhere else. But yeah, Sha256_224 is far more readable, so if we can make an exception here, it's worth it.

@clickingbuttons
Copy link
Contributor Author

clickingbuttons commented Apr 23, 2024

SHA512 truncated to 256 bits is widely used...

I did not know that. That makes sense because the SHA-512/t variants were standardized 11 years after SHA-512... yikes!

We probably shouldn't make the truncation a parameter either, and stick to what exists. IVs for NIST variants are only defined for a small set, so we can't invent them for arbitrary output sizes.

Sadly it's a parameter in section 5.3.6 of FIPS 180 for arbitrary output sizes where 1 < digest_len < 512 and digest_len != 384. I've implemented it in fn sha512iv.

Whether or not it's a commonly used parameter for sizes other than 224 and 256 is another question. While I can personally think of uses for other sizes, I'll defer to your experience.

Names such as Sha256224 are indeed weird.

I agree. Not only is it hard to read the 256 separate from the 224, but the truncation strategy is completely ambiguous. If you agree to arbitrary digest_bits, I propose this:

pub const Sha224 = Sha2x32(iv224, 224);
pub const Sha256 = Sha2x32(iv256, 256);
pub const Sha384 = Sha2x64(iv384, 384);
pub const Sha512 = Sha2x64(iv512, 512);

/// This may be more performant than SHA224 and SHA256.
pub const IvStrategy = enum {
    /// Change the IV per NIST FIPS 180 before truncation.
    /// These are for the official SHA-512/t types published in 2011.
    change,
    /// Use the original IV and truncate the end.
    /// Commonly used in implementations before 2011.
    keep,
};

/// SHA-512 truncated to length `digest_bits`.
/// This may be more performant than SHA224 and SHA256.
pub fn Sha512Truncated(digest_bits: comptime_int, comptime iv_strategy: IvStrategy) type {
    const iv = switch (iv_strategy) {
        .change => sha512iv(digest_bits),
        .keep => iv256,
    };
    return Sha2x64(iv, digest_bits);
}

This will force truncated SHA2 users to carefully select their truncation strategy instead of flipping a coin and being frustrated when it doesn't match some other implementation.

Sha256T192 could be added. That one was defined by NIST, and is an actual truncation, not using a specific IV.

You really know the history! Since that one is unambiguous it's easy to define:

/// Truncated to 192 bytes per FIPS 800-208.
pub const Sha256_192 = Sha2x32(iv256, 192);

@jedisct1
Copy link
Contributor

jedisct1 commented Apr 23, 2024

I've never seen any actual protocol or application where other sizes are used. We should avoid allowing insecure parameters, or really uncommon parameters that will cause interoperability problems.

IvStrategy and Sha512Truncated are internal implementation details. Making them public would likely end like bloat that no one is going to use but that we will have to maintain forever.

If someone really needs something weird such as a 480 bit output with SHA512, it's already possible without additional types and functions, by explicitly truncating the SHA512 result. Being explicit in such weird cases is not a bad thing.

So, I'd suggest keeping the interface simple and only exposing types for constructions that are actually used: 224, 256, 384, 512, 256_224, 256T192, 512_224, 512_256, 512T256.

That's already a ton of choices. Adding details about IV strategy would be even more confusing.

@clickingbuttons
Copy link
Contributor Author

clickingbuttons commented Apr 23, 2024

I've never seen any actual protocol or application where other sizes are used.

I can come up with some creative bit-packing use cases, but I defer to you.

If someone really needs something weird such as SHA512/480, it's already possible without additional types and functions, by explicitly truncating the SHA512 result.

Yes, by reimplementing truncation to the proper [60]u8 type. However, the FIPS version with a changed IV is NOT currently possible.

For either case, an author would probably find it easier to modify sha2.zig themselves to change the IV and digest length...

That's already a ton of choices. Adding details about IV strategy would be even more confusing.

The confusion exists either way between the 512_256 and 512T256 types, and the user MUST deal with that thanks to NIST.

If there was standardized notation I wouldn't argue further, but I've seen SHA-512/t (NIST), SHA512-t (openssl), and 512Tt (Zig).

It's just a question of how to expose the complexity. I'd prefer a generic type and to trust the user with security but I understand why you'd prefer instantiated common variants and to not trust the user.

I've amended this PR to do the latter. Please rereview. Benchmarks coming soon.

@clickingbuttons
Copy link
Contributor Author

clickingbuttons commented Apr 23, 2024

Performance difference is within margin of error. Looks like rolling those 2 loops changed nothing but readability 🎉 .

~/src/poop(main)$ zig version
0.12.0-dev.3673+1fb238131
~/src/poop(main)$ zig build-exe ./sha.zig -O ReleaseFast -femit-bin=new --zig-lib-dir ../zig/lib
~/src/poop(main)$ zig build-exe ./sha.zig -O ReleaseFast -femit-bin=old
~/src/poop(main)$ ./zig-out/bin/poop ./old ./new
Benchmark 1 (323 runs): ./old
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          15.4ms ±  317us    14.9ms … 17.2ms          9 ( 3%)        0%
  peak_rss              0   ±    0         0   …    0            0 ( 0%)        0%
  cpu_cycles         79.6M  ±  828K     79.1M  … 88.2M          39 (12%)        0%
  instructions        340M  ± 2.63       340M  …  340M          34 (11%)        0%
  cache_references   5.02K  ± 25.8K     1.29K  …  460K          34 (11%)        0%
  cache_misses        428   ±  106       181   … 1.72K          39 (12%)        0%
  branch_misses       310   ± 27.7       222   …  528            7 ( 2%)        0%
Benchmark 2 (324 runs): ./new
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          15.4ms ±  282us    14.9ms … 16.6ms         13 ( 4%)          +  0.2% ±  0.3%
  peak_rss              0   ±    0         0   …    0            0 ( 0%)          -  nan% ± -nan%
  cpu_cycles         79.6M  ±  669K     79.2M  … 84.2M          47 (15%)          -  0.0% ±  0.1%
  instructions        340M  ± 2.66       340M  …  340M          34 (10%)          +  0.0% ±  0.0%
  cache_references   4.20K  ± 9.46K     1.28K  …  143K          40 (12%)          - 16.4% ± 59.7%
  cache_misses        439   ±  101       258   … 1.07K          48 (15%)          +  2.8% ±  3.7%
  branch_misses       309   ± 26.4       214   …  476           16 ( 5%)          -  0.3% ±  1.3%

Edit: Here's sha.zig

@jedisct1
Copy link
Contributor

These inconsistent notations are annoying, but specifications would always refer to these as SHA-x/t or SHA-x-t, and are unlikely to explain anything about IVs. We shouldn't expose parameters when use cases don't exist.

Go's standard library only exposes SHA224, SHA256, SHA384, SHA512, SHA512_224, SHA512_256 because that's enough to implement all standard protocols. JavaScript and Python even get away with SHA512_224 and SHA512_256.

There's a gap between NIST documents, especially ancient ones, and what's used in the industry. If we wanted to properly support these documents, we should even support lengths with a bit granularity. Since then, NIST adopted SHAKE for hashing with variable-length output, so even in a pure NIST world, custom SHA2 truncation is going to stay a funny thing from an old specification.

@jedisct1
Copy link
Contributor

Performance difference is within margin of error. Looks like rolling those 2 loops changed nothing but readability 🎉 .

Great!

lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
@clickingbuttons
Copy link
Contributor Author

clickingbuttons commented Apr 23, 2024

I loved your doc comment so much I made it into a namespace.

/// Truncating the output of SHA2-based hash functions improves security by mitigating...
pub const truncated = struct {
    pub const Sha256T192 = Sha2x32(iv256, 256);
    ...
};

Happy to remove if you dislike it, I just think it should be featured in the docs.

lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
@jedisct1
Copy link
Contributor

I loved your doc comment so much I made it into a namespace.

SHA-384 et al. are also truncated, so they should go there. But it may make them harder to find.

Maybe we could just mention that in the sha2 namespace comment.

@jedisct1
Copy link
Contributor

Or mention in the documentation of SHA-256 and SHA-512 that unlike their truncated counterparts, they are vulnerable to length extension attacks.

@clickingbuttons clickingbuttons changed the title std.crypto.hash.sha2: generalize sha512 truncation std.crypto.hash.sha2: cleanup add add more docs Apr 23, 2024
@clickingbuttons
Copy link
Contributor Author

I moved the truncation comment to the module docs, dropped the overeager truncated namespace, and did away with fn Sha512Truncated (i.e. no more generics).

lib/std/crypto/sha2.zig Outdated Show resolved Hide resolved
@jedisct1 jedisct1 merged commit 8a36a1f into ziglang:master Apr 28, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants