Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(crypto): add std/crypto wrapping and extending runtime WebCrypto #1025

Merged
merged 26 commits into from
Jul 29, 2021
Merged

feat(crypto): add std/crypto wrapping and extending runtime WebCrypto #1025

merged 26 commits into from
Jul 29, 2021

Conversation

jeremyBanks
Copy link
Contributor

@jeremyBanks jeremyBanks commented Jul 12, 2021

Background

The current std/hash implementation (inspired by Node's crypto.Hash interface) is largely redundant with the standard WebCrypto crypto.subtle.digest function that the runtime will soon support. Deno tries to aim for interoperability, so we probably want to eliminate our redundant interface eventually.

However, WebCrypto only supports a very minimal set of hash digest algorithms, and our users may need others. We don't want to add these to our runtime's WebCrypto implementation, as this is discouraged by the standard and could harm interoperability.

See discussion at #1025 (comment).

Description

Attempting to balance these concerns, this PR adds std/crypto, which exports a single object extending the standard Crypto/SubtleCrypto interface. For most operations, it will simply re-export the methods from the runtime's WebCrypto. However, for digest operations using an algorithm that WebCrypto doesn't support, it will instead use our bundled Rust/WASM implementation. This allows users to adopt it with minimal code changes and deviation from the standard interface, while maintaining support for other runtimes (such as browsers).

// using runtime implementation directly:
const digestA = await crypto.subtle.digest("SHA-384", new ArrayBuffer(0));

// throws an error as BLAKE3 is not supported in the WebCrypto standard:
const digestB = await crypto.subtle.digest("BLAKE3", new ArrayBuffer(0));
import crypto from "https://deno.land/std/crypto/mod.ts";

// using the runtime implementation transparently through our wrapper:
const digestA = await crypto.subtle.digest("SHA-384", new ArrayBuffer(0));

// using our bundled WASM implementation of BLAKE3:
const digestB = await crypto.subtle.digest("BLAKE3", new ArrayBuffer(0));

This is intended to replace the existing std/hash eventually, but we leave it untouched for now (despite the redundancy) as that removal should wait for future release.

The std/node/crypto compatibility module has features (incremental .update()s, .copy()able hash state) that aren't part of the standard WebCrypto interface. So instead of depending on std/hash or std/crypto, it will now depend directly on the WASM implementation, which we move to a common location, std/_wasm_crypto.

Features

Capabilities that std/crypto adds on top of the runtime's crypto:

  • Support for additional hash digest algorithms:
    • BLAKE2B-256
    • BLAKE2B-384
    • BLAKE2B
    • BLAKE2S
    • BLAKE3
    • KECCAK224
    • KECCAK256
    • KECCAK384
    • KECCAK512
    • MD5
    • RIPEMD-160
    • SHA-224
    • SHA3-224
    • SHA3-256
    • SHA3-384
    • SHA3-512
    • SHAKE128
    • SHAKE256
  • Adds { length?: number } parameter for extendable-output algorithms ("XOFs": BLAKE3, SHAKE128, SHAKE256). For example, { name: "BLAKE3", length: 20 }. This is consistent with how WebCrypto parameterizes other algorithms.
  • Adds crypto.subtle.digestSync, a synchronous version of .digest with the same arguments.
  • Adds support for AsyncIterable and Iterable inputs for .digest() (and Iterable for .digestSync()), to allow incremental digesting of large inputs. (This is a superset of the proposed addition of ReadableStream support to the spec: Bug 27755 - Using the Subtle Crypto Interface with Streams w3c/webcrypto#73.)

Performance

Benchmarking script included. This is currently faster than the runtime's WebCrypto implementation (see related discussion in Discord), so I think it's fast enough for now. There is probably room to improve both going forward.

Related Issues


Original Description: BREAKING(hash): Expand hashing interface in line with other languages

The WASM-based Hasher implementation in hash/ currently exposes an initial minimum-viable interface. It requires a new Hasher instance to be created for each message, and once .digest() or .toString() is called for the first time, the instance is invalidated and any further method calls will throw. This interface is usable (modulo implementation issues, see #999, #1010, #1012), but it lacks some of the conveniences that exist in some other languages' libraries. (I've been looking at Python, Rust, Go, and WebCrypto.) Needing to create new instances for every input can also be a significant source of overhead in some cases (such as performing a lot of hashes in a hot loop, where the GC may not have a chance to clean them up very often), and the interface doesn't provide any way to avoid that if you need to.

I propose this update to the hashing interface to bring it more in line with what other languages offer.

Aside re: WebCrypto

I'm not sure whether or not this is redundant with the ongoing work on WebCrypto. I think it might still be useful, as this implementation provides several more algorithms than WebCrypto's conservative specification. This implementation is synchronous, while WebCrypto can only be called async. This implementation should be browser-compatible, and can be used on non-secure origins where WebCrypto is unavailable. However, this implementation will presumably be a bit slower than WebCrypto, particularly for algorithms that are able to exploit parallelism.

This is up to the team to decide, but I would just say that if we are going to keep std/hash, then it deserves an overhaul before 1.0.

Interface Changes

  • BREAKING CHANGE: .digest() returns a Uint8Array instead of an ArrayBuffer, as suggested at Review the std/hash Hasher interface #652 (comment).
  • .digest() and .toString() can be called multiple times without side-effects.
  • .reset() and .digestAndReset() methods are added to allow instance reuse.
  • A .clone() method is added to allow more efficient calculation of messages from a common prefix.
  • Support is added for the shake128, shake256, blake2b, blake2b-256, blake2b-384, and blake2s algorithms, all of which are also supported in current versions of Python's hashlib or Go's crypto package, and have implementations available from the RustCrypto project we already depend on.
  • .digest() now takes an optional options object with a length property.
    • length is intended for use with algorithms that support variable-length digest output, such as blake3 and shake128. Specifying a length for an algorithm with a fixed-length output will throw an error unless the specified size matches that fixed size.
  • .toString() now takes an options object with the property encoding to replace the previous positional formatting argument. This option object can also contain digest options (length).
    • The previous function signature is still also supported for compatibility.
  • A"unicode" encoding option is added to .toString(); this encodes byte values as unicode code points, in the same manner as the bota()/atob() functions.
  • A module-level digest(algorithm, message, options?) function was added, which internally reuses a hasher instance for calls with the same algorithm, providing users with an efficient option for the most common simple use cases.
  • BREAKING CHANGE: Renamed createHash to createHasher since the type is called Hasher, not Hash.

Implementation Changes

  • Errors thrown from Rust are now proper Error object with stack traces (they were previously just strings without stack traces).
  • The generated bindings and generated WASM binary are split into separate files, as the bindings are now getting larger and more significant.
  • Removes the only remaining internal reference to the JavaScript hash implementations, in case we want to remove them as suggested at Remove JavaScript versions of std/hash #658.

Documentation Changes

  • Added more detailed docstrings to the public interface.
  • README: Replaced use of MD5 in examples with SHA-384 to help guide inexperienced users towards a safer option.
  • README: Added links to canonical references for each supported hash algorithm.

Out of Scope

Although these could all make reasonable inclusions in the module, this PR does not include:

  • a way to read the output of an extendable hash digest (XOF) as an infinite stream.
  • keyed-hashing/HMAC support.
  • incremental validation capabilities supported by hashes like BLAKE3 or KangarooTwelve.

@jeremyBanks jeremyBanks changed the title BREAKING CHANGE(hash): Expand hashing interface in line with other languages BREAKING(hash): Expand hashing interface in line with other languages Jul 13, 2021
@jeremyBanks jeremyBanks marked this pull request as ready for review July 14, 2021 01:18
Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make the call on whether this should go in or not but I don't see any glaring technical issues.

Caveat emptor, I only did a real quick review. I'll do a second pass if there's consensus it's eligible for inclusion.

hash/_wasm/src/lib.rs Outdated Show resolved Hide resolved
@wperron
Copy link
Contributor

wperron commented Jul 15, 2021

I don't have any issues to the changes to digest and toString nor the
addition of reset -- Maybe resetAndDigest can be ommited? clone seems a
bit weird to me but I don't really have strong reason to not include it.

The addition of the option bags on digest and toString seem a bit superflous
on the surface -- Do they fix an actual issue, or does it make the API easier to
use in a significant way? What would be downside of leaving those out?

As for the renaming of createHash to createHasher I'm not sure; I agree that
it's annoying to have the function name and interface name differ slightly, but
does that really require a breaking change?

@jeremyBanks
Copy link
Contributor Author

jeremyBanks commented Jul 15, 2021

I'm ready to make the suggested changes, but just to lay out my thinking.

It's true that these are backwards-incompatible changes. But given the previous state of the API - badly leaking memory, erroneous on certain data types - I doubt it has much adoption, and if we're making any breaking changes, I'm inclined to go ahead and clean up while we're at it. (Mainly re: the createHasher rename).

Regarding the option bags, I agree they seemed like overkill for the current options. However, the Deno style guide suggests no more than two positional arguments, and the digest function requires at least three arguments. It was also a forward compatibility consideration, as if we decide to later add support for signed hashing modes or whatever, those options could join them, rather than having a mix of positional and keyword arguments.

Regarding clone: I agree that the use case is a bit obscure, but it's been supported in most of the other hashing libraries I've looked at, and I don't think it adds much bloat.

Regarding digestAndReset: this one is a stretch. I added it because for the underlying Rust implementations, this is faster than the separate calls, because this allows it to destructively finalize the existing state as its going to be reset, instead of saving a copy of the existing state as digest on its own would do. This is a micro-optimization, maybe unnecessary, but the Rust API suggested it to me.

Regarding formatting: I'll see if I can copy in the standard Deno rustfmt config, since my editor keeps auto formatting incorrectly.

@wperron
Copy link
Contributor

wperron commented Jul 15, 2021

We're still in version 0.X of the standard lib so we do have some leeway to make breaking changes here. Unfortunately we don't yet have data on downloads for the standard lib, so exactly how much usage it gets in the wild is a bit of a shot in the dark at the moment.

Pending any more comments from @bnoordhuis I'd be ok with landing as is.

@kitsonk
Copy link
Contributor

kitsonk commented Jul 15, 2021

I am of the opinion that we should take a breaking change in std/hash, but:

  • It should line exactly to the API of WebCrypto.
  • It should only include hashes that WebCrypto doesn't support. We want people to use the web standards and we don't want to duplicate APIs or functionalities of the web standards.
  • We should get rid of the JavaScript only interfaces. The main reason they were remaining was that the web assembly versions didn't provide HMAC key generation, which is pretty critical for a lot of workloads.

Anything over and above that isn't std library, it is a community package, IMO.

@jeremyBanks
Copy link
Contributor Author

jeremyBanks commented Jul 15, 2021

I defer to the team on the subject of scope (I do not love WebCrypto, but I do love Deno's interoperability goals), but for what it's worth: it looks like Node's built-in crypto module provides synchronous hashing interfaces, which can't be implemented on top of WebCrypto's async APIs, so the Deno standard library may need to include its own implementation for the Node compatibility layer if nothing else. (Although I suppose that doesn't necessarily mean the implementation needs to be exported directly.)

@jeremyBanks
Copy link
Contributor Author

jeremyBanks commented Jul 16, 2021

I'm working on updating this to align with the WebCrypto interface as suggested by @kitsonk. Currently I'm imagining the following:

import stdCrypto from "https://deno.land/std@$STD_VERSION/crypto/mod.ts";

// Use the built-in `crypto` interface for WebCrypto-supported algorithms:
console.log(await crypto.subtle.digest("SHA-384", "hello world"));

// Use std/crypto for other supported algorithms:
console.log(await stdCrypto.subtle.digest("BLAKE3", "hello world"));

If they are decided to be in-scope, sync implementations of supported algorithms could go under stdCrypto.subtle.digestSync.

console.log(stdCrypto.subtle.digestSync("BLAKE3", "hello world"));

@littledivy
Copy link
Member

Having two different implementations of the same interface doesn't feel right to me. The WebCrypto spec allows introducing additional algorithms so why can't we add to it?

From https://w3c.github.io/webcrypto/#algorithms-section-overview -

...Further, this specification defines a process to allow additional specifications to introduce additional cryptographic algorithms.

Node.js did a similar thing with it's NODE-ED25519

@jeremyBanks
Copy link
Contributor Author

jeremyBanks commented Jul 16, 2021

@littledivy The spec allows for that, but it's "strongly discouraged":

3.1 Extensibility

Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.

If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.

Adding these algorithms to WebCrypto seems like the less interoperable option, because code that depends on it will fail on non-Deno environments, while the WASM implementations allow us to author code that's browser-compatible.

@kitsonk
Copy link
Contributor

kitsonk commented Jul 16, 2021

I agree that it makes it difficult to sniff and therefore difficult to make isomorphic by adding them to WebCrypto. Having it part of std opens up the possibility of people using it on other runtimes.

Just thinking about usability, I would personally be fine if std offloaded to Web Crypto for hashes that are part of the Web standard. I just want to avoid duplicate implementations. It is bad for users and it is bad trying to chase down bugs.

@jeremyBanks jeremyBanks marked this pull request as draft July 16, 2021 22:43
@jeremyBanks jeremyBanks changed the title BREAKING(hash): Expand hashing interface in line with other languages BREAKING(hash→crypto): WIP Jul 18, 2021
@jeremyBanks jeremyBanks changed the title BREAKING(hash→crypto): WIP BREAKING: remove std/hash, add std/crypto, add std/_wasm_crypto, update std/node/crypto Jul 20, 2021
@jeremyBanks
Copy link
Contributor Author

jeremyBanks commented Jul 20, 2021

I've updated the implementation and description based on the discussion above. It still needs a bit more work and updated tests, but it's mostly there.

@jeremyBanks jeremyBanks marked this pull request as ready for review July 21, 2021 00:04
@kitsonk
Copy link
Contributor

kitsonk commented Jul 21, 2021

@jeremyBanks thanks a lot for the work... one thing I just discussed with @kt3k is that to make this land-able, we probably need to add the new /crypto modules and land the removal of existing modules separately, at a later date. I think we need to give people a Deno CLI release cycle to adjust. I know of a few projects that depend on these modules and will have either update to use only the web crypto APIs in Deno or use these modules.

My opinion is we can add the new modules before Deno 1.13, but we shouldn't remove the existing ones until the std release that comes with Deno 1.14.

We might want to consider adding some sort of "deprecated" message to the existing APIs.

@jeremyBanks jeremyBanks requested a review from kitsonk July 23, 2021 03:37
_wasm_crypto/_build.ts Outdated Show resolved Hide resolved
@jeremyBanks jeremyBanks requested a review from kt3k July 24, 2021 00:20
Copy link
Member

@littledivy littledivy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits 😄

_wasm_crypto/Cargo.toml Show resolved Hide resolved
crypto/mod.ts Show resolved Hide resolved
@jeremyBanks jeremyBanks marked this pull request as draft July 27, 2021 00:29
@jeremyBanks jeremyBanks marked this pull request as ready for review July 27, 2021 01:21
_wasm_crypto/README.md Outdated Show resolved Hide resolved
node/crypto.ts Show resolved Hide resolved
Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@ry ry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thank you @jeremyBanks, I think many people will find this useful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

createHash memory leak Review the std/hash Hasher interface
7 participants