Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept (async) iterables in crypto.subtle.digest #390

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

twiss
Copy link
Member

@twiss twiss commented Dec 20, 2024

Addresses part of #73 by accepting an iterable or async iterable (such as a ReadableStream) of BufferSources in crypto.subtle.digest, in order to enable incremental/streaming hashing.

(In the future, we could do something similar for crypto.subtle.sign and crypto.subtle.verify fairly easily, but I started with crypto.subtle.digest since it seems most important and I wanted to check whether folks are happy with this direction first.)

Note that the spec text doesn't require incremental/streaming hashing because FIPS-180-4 defines the SHA functions as single-shot functions which take a complete message, even though it is very common for implementations to provide a streaming API.

Note also that the Web IDL isn't super pretty because it doesn't support syntax like

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  (BufferSource or iterable<BufferSource> or async iterable<BufferSource>) data
);

so instead we do

typedef object IterableOfBufferSources;
typedef object AsyncIterableOfBufferSources;

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  (BufferSource or IterableOfBufferSources or AsyncIterableOfBufferSources) data
);

and check the types involved in the calling function.

(Since this is a somewhat specific use case, where it's important that we retain a reference to the iterable after the function returns a Promise, rather than letting Web IDL copy the data as e.g. sequence<BufferSource> would, I personally think that's fine and we don't specifically need syntactic sugar for this in Web IDL, which might be difficult to achieve anyway.)

Finally, I've left the language around running in parallel for now, even though it's not completely clear whether we want/need it. However, when accepting async iterables specifically, we need to return a Promise and queue a task to resolve it anyway, so for this specific functionality it probably doesn't matter that much.

AFAICT, due to the loose definition of running in parallel, implementations are already free to do the hashing on the same thread if they wish (e.g. if the overhead of using multiple threads isn't worth it).

Finally, I didn't include ECMAScript in the list of xref'ed specs, because it caused a bunch of ambiguities for TypeError and SyntaxError, increasing the number of required manual citations more than I had to do here.

(Apologies for the long list of review requests, but since this is a fairly significant new feature I wanted to make sure everybody's on board with this.)


Preview | Diff

@jasnell
Copy link
Collaborator

jasnell commented Jan 4, 2025

Thinking about the case for Symbol.iterator a bit.. since a string implements Symbol.iterator this change would mean that we'd need to add an additional type check in the implementation to catch string arguments before performing the Symbol.iterator check on the input. Instead, perhaps we can also update digest to accept a string input that is always interpreted as utf8. So, for instance. await crypto.subtle.digest('sha256', 'hello') would currently throw an error but could be supported along with the change here.

@twiss
Copy link
Member Author

twiss commented Jan 4, 2025

Since String's Symbol.iterator produces strings, they would be rejected by the current text which says

If value is not a BufferSource, reject promise ...

without any specific check for strings. Except that an empty string would produce the same result as an empty array or empty BufferSource, I suppose, which is perhaps a bit odd, but not strictly wrong.

Either way, we could add support for strings later in a fully backwards compatible way even with this change, so I'd propose that we keep it separate.

(Also, I think there's something to be said for being explicit, as in await crypto.subtle.digest('SHA-256', new TextEncoder().encode('hello')) or await crypto.subtle.digest('SHA-256', stream.pipeThrough(new TextEncoderStream())), so I kind of think we should discuss this separately before changing it as well.)

@annevk
Copy link
Member

annevk commented Jan 6, 2025

Was implementer interest ever established?

I think it's better if the Web IDL just used object directly if that's unavoidable. Using it redundantly and indirectly does not actually make things easier to read (and maybe we should make it an error to use something redundantly). You also need to use the relevant ECMAScript operations to handle the passed in object just like Web IDL would do. "conforms to" doesn't mean anything. It's probably worth filing an issue against Web IDL to get advice and possibly get this added to Web IDL directly as doing these kind of conversations yourself is not recommended.

I would also ask @MattiasBuelens or others participating in https://github.com/whatwg/streams for review as there's quite a few subtleties.

@twiss
Copy link
Member Author

twiss commented Jan 6, 2025

Was implementer interest ever established?

There is implementer interest from the server-side runtimes, and this specific API was proposed by @jakearchibald when he was at Google, does that count? ;)

More seriously, it was discussed in the WebAppSec WG a couple times but no very concrete signals yet. I figured it might be easier to discuss when there's a concrete proposal on the table. But obviously if you're opposed to or not interested in streaming altogether that's also fair.

I think it's better if the Web IDL just used object directly if that's unavoidable. Using it redundantly and indirectly does not actually make things easier to read (and maybe we should make it an error to use something redundantly).

The reason I did it this way is because if you have tooling that tells you the types/IDL involved, it'll give you a hint about what kind of objects you can pass, as opposed to just object which is fairly meaningless. Maybe, to make it not redundant but still somewhat descriptive, we could do

typedef object MaybeAsyncIterableOfBufferSources;

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  (BufferSource or MaybeAsyncIterableOfBufferSources) data
);

or even

typedef object BufferSources; // (BufferSource or iterable<BufferSource> or async iterable<BufferSource>)

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  BufferSources data
);

? (Unless it ends up being added to Web IDL, of course.)

You also need to use the relevant ECMAScript operations to handle the passed in object just like Web IDL would do.

Fair enough.

"conforms to" doesn't mean anything.

The ECMAScript spec uses similar language and offers a definition for it here. I'll link to that.

It's probably worth filing an issue against Web IDL to get advice and possibly get this added to Web IDL directly as doing these kind of conversations yourself is not recommended.

Fair enough. I wasn't sure if it's a pattern we need anywhere else but ofc I'm happy to discuss it.

@annevk
Copy link
Member

annevk commented Jan 6, 2025

ECMAScript uses it as a statement of fact, not as a way to determine what type of object a thing is.

@twiss
Copy link
Member Author

twiss commented Jan 6, 2025

ECMAScript uses it as a statement of fact, not as a way to determine what type of object a thing is.

OK, fair enough. Anyway I don't think this check is strictly necessary since GetIterator also returns a TypeError if the passed object is not an iterable object, so I'll just remove it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants