Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: customizeable content type parsing in @helia/verified-fetch #422

Closed
SgtPooki opened this issue Feb 6, 2024 · 5 comments · Fixed by #423
Closed

feat: customizeable content type parsing in @helia/verified-fetch #422

SgtPooki opened this issue Feb 6, 2024 · 5 comments · Fixed by #423
Assignees

Comments

@SgtPooki
Copy link
Member

SgtPooki commented Feb 6, 2024

Discussed with @achingbrain due to #416

Goals

  • keep bundle size small
  • Provide some content-type recognition for VERY COMMON use-cases (see below)
  • allow overriding of content-type parsing for more complicated consumer scenarios.

Initial design idea

Remove dependency on mime-types, don't depend on file-type

some interface such as createVerifiedFetch(helia, { contentTypeParser: (bytes) => myFn}) and we provide a default contentTypeParser that determines content type for the below list only.

We would pass the contentTypeParser function the first block of bytes we receive; and because most of our blocks are 1MB or below, we can safely assume the majority of content types users need to recognize can be determined by looking at those first 1MB of bytes.

If content-type is not a recognized type from the below list, we do not set it (allows browser sniffing).

Supported content types

  • image/jpg
  • image/png
  • video
  • tar
  • [TODO: which types should we support by default]?

References

cc @achingbrain @aschmahmann @lidel @2color

@SgtPooki
Copy link
Member Author

SgtPooki commented Feb 6, 2024

FYI that file-type is fairly small compared to the entirety of @helia/verified-fetch (currently totals 560.2kb) at only 26.7kb:

image

@SgtPooki SgtPooki self-assigned this Feb 6, 2024
@lidel
Copy link
Member

lidel commented Feb 6, 2024

Providing a way to pass custom content type sniffer sounds sensible, but will be a very niche feature request if your default is something comprehenbsive like file-type with magic bytes sniffing.

I think the question we could ask is when is content-type relevant:

  • If we use verified-fetch in JS the same way as fetch, the content-type header won't matter.
    End user will use .json(), .text(), .blob() etc themselves.
  • If we use verified-fetch in service worker for web gateway implementation, then we pass response to browser renderer directly, and returned content-type matters. In this case hard-coding a few content types won't be enough anyway, and the user wants something more future-proof, like file-type.

That is to say, I think it is sensible to either:

  • go with file-type everywhere (avoid maintaining "minimal list of types we support", ~5% bundle size increase does not sound like a lot when compared to UX/DX of content-type being taken care of)
  • OR skip setting content-type by default entierely, and only use it gateway contexts, in which you use contentTypeParser to pass file-type that does the comprehensive magic bytes sniffing.

@achingbrain
Copy link
Member

My feeling is that if we don't need to do content type sniffing then let's not do it.

If we need to do it, we should do the minimum required (e.g. just support detecting a small subset of content types) and provide an extension mechanism for more comprehensive detection.

Given that we're billing this as fetch-like, most people will just do .json(), .blob(), etc and get on with things which suggests that we don't need to detect content types - we just try to process the data as the requested type and fail loudly if we can't.

There are valid use-cases for content detection though (e.g. service worker gateway) so allowing users to configure a mime type sniffer if they need it seems like a good compromise.

@2color
Copy link
Member

2color commented Feb 7, 2024

Mostly agree with @lidel and @achingbrain, though I don't have a strong inclination either way.

If we don't include magic-byte sniffing by default, it should be as easy as possible to configure so it works smoothly in service workers.

@SgtPooki
Copy link
Member Author

SgtPooki commented Feb 7, 2024

Sounds good. Ill get a PR out today that will not do content-type unless passed a function for it

@SgtPooki SgtPooki linked a pull request Feb 7, 2024 that will close this issue
3 tasks
@SgtPooki SgtPooki moved this to 🏃‍♀️ In Progress in IPFS Shipyard Team Feb 7, 2024
achingbrain added a commit that referenced this issue Feb 8, 2024
* adds `contentTypeParser` function to createVerifiedFetch options & implements it.
* renamed `getStreamAndContentType` to `getStreamFromAsyncIterable` that now returns a stream with the firstChunk seen, so we can pass it to the `contentTypeParser` function.
* updates tests in packages/verified-fetch & packages/interop
* updates packageDocumentation with example

Related #416
Fixes #422
---------

Co-authored-by: achingbrain <alex@achingbrain.net>
@github-project-automation github-project-automation bot moved this from 🏃‍♀️ In Progress to 🎉 Done in IPFS Shipyard Team Feb 8, 2024
achingbrain added a commit to ipfs/helia-verified-fetch that referenced this issue Feb 28, 2024
* adds `contentTypeParser` function to createVerifiedFetch options & implements it.
* renamed `getStreamAndContentType` to `getStreamFromAsyncIterable` that now returns a stream with the firstChunk seen, so we can pass it to the `contentTypeParser` function.
* updates tests in packages/verified-fetch & packages/interop
* updates packageDocumentation with example

Related ipfs/helia#416
Fixes ipfs/helia#422
---------

Co-authored-by: achingbrain <alex@achingbrain.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants