feat: non-bufferring multipart body encoder #3151
Reminder to myself to include tests discussed in #3138 here |
@hugomrdias there is one issue that I'm not sure how to resolve. It appears that |
All tests except the example one (which also fails on master) are passing now. I think this is ready for review. |
The test was failing due to a temporary infrastructure problem. All good now. |
If merged, this is a great performance win for browser devs and users, and also for IPFS Desktop users (ipfs/ipfs-webui#1529).
Side concerns:
- I am worried about added complexity and potential regression in the future. Are we able to add tests/benchmarks that safeguard browser-related improvements?
- As noted during our review call, problematic metadata is only supported by js-ipfs, we may want to look into tweaking HTTP API separately from this PR, to fix it before go-ipfs implements it.
That worries me as well. I am also not happy with the increased complexity. The only other way I can imagine going about this (that would not involve API changes) is to have a
I think there is opportunity to simplify this approach a bit by using our custom types instead of
I was trying to come up with some approach here, e.g.
However, I do not think we have a way to tell whether the browser did any buffering or not. The only thing I could come up with is to generate a fragment of data from an echo server and stop writing until the corresponding put occurs on the other endpoint. However, that is really complex and we would need to jump through some hoops. There is also no guarantee that the browser doesn't read, say, two chunks at a time. I think a better strategy is to test that when we put in blobs (and the like), what we get on the other end is blobs (not objects with async iterable content). That is a lot easier to test and is free from breaking when the browser changes (e.g. how much it fetches before it starts the upload). |
Added more tests to ensure that result of
There is a caveat: this will not catch all regressions, e.g. if for some reason |
Tests are failing now due to #3169 |
I had a conversation with @achingbrain earlier today and we have decided:
I think it might also make sense to factor out introduced |
Externalized File and Blob implementations. |
I've merged /pull/3184 in favour of this. I hope that it's taken on some of the good ideas from this PR. It bums me out a little, because you've clearly spent a lot of time and effort on this, but ultimately I think requiring people to use non-standard Blob/FormData/etc implementations to use our HTTP API is a step too far, and taking on the long-term maintenance burden of those custom implementations is not something we should be doing given the available dev capacity. |
Status
Overview
Normalization
Before
normaliseInput used to normalize arbitrary input taken by ipfs.add into AsyncIterable<FileObject>, where FileObject is:

There was an (implicit) invariant that if a FileObject doesn't have content, it represents a directory.

However, representing content as AsyncIterable<ArrayBufferView|ArrayBuffer> is what led to buffering in the browser, as fetch still does not support a stream body.

After
This patch changes normaliseInput to produce a different output: AsyncIterable<ExtendedFile|FileStream|Directory>, where:

- Directory is just like FileObject and does not have content.
- ExtendedFile represents a FileObject with known size:
  - File
  - File is used in node
  - Blob is used in node
  - mtime, mode and path properties (assumed by ipfs-unixfs-importer).
  - content getter which returns AsyncIterable<Uint8Array> of its parts, which creates compatibility with the FileObject interface.
- FileStream is just like a FileObject that does have content:
  - Its content is AsyncIterable<*>.
  - It is kept distinct from ExtendedFile because multipartRequest can't add it to the FormData without buffering its body, while it can do that with ExtendedFile.
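As a rough illustration of how the three normalized shapes could be told apart, here is a minimal sketch. The helper name classifyEntry is hypothetical and is not part of the patch; it assumes only what is described above: ExtendedFile is Blob-based, FileStream has async iterable content, and Directory has no content. It also assumes a runtime with a global Blob (browsers, or a recent Node.js).

```javascript
// Hypothetical helper, not taken from the patch: distinguishes the three
// normalized shapes using only the properties described in the text.
function classifyEntry (entry) {
  // ExtendedFile is Blob-based, so its size is known up front.
  if (typeof Blob !== 'undefined' && entry instanceof Blob) {
    return 'ExtendedFile'
  }
  // Directory is the only shape that has no content at all.
  if (entry.content == null) {
    return 'Directory'
  }
  // FileStream carries its content as an async iterable of unknown length.
  if (typeof entry.content[Symbol.asyncIterator] === 'function') {
    return 'FileStream'
  }
  throw new TypeError('Unrecognized entry shape')
}
```

Checking for Blob first matters in this sketch, because an ExtendedFile also exposes a content getter and would otherwise be mistaken for a stream.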
Multipart Encoder

A new FormDataEncoder class was added that can encode AsyncIterable<Part> into AsyncIterable<BlobPart> representing the body of the multipart request, where Part is:

The to-stream module has been replaced by to-body, which turns AsyncIterable<BlobPart> into a readable stream on node and into a Blob in the browser.

With the above pieces in place, multipartRequest now:

- Normalizes input into AsyncIterable<ExtendedFile|FileStream|Directory>.
- Maps that to AsyncIterable<Part> (and ensures that ExtendedFile is passed as content instead of passing its content, to avoid buffering).
- Encodes that into AsyncIterable<BlobPart> via FormDataEncoder.
- Turns the result into a request body via toBody (which in node produces a readable stream and in the browser produces a blob).

Result
ipfs.add can continue using normaliseInput, as changes to it should be API (backwards) compatible.

ipfs-http-client on node should continue using streams. The only thing that changed there is that some inputs are turned into Blobs instead of AsyncIterators, but during form data encoding it all gets flattened anyway.

ipfs-http-client in the browser will not buffer as long as the input passed in isn't a stream, and will fall back to buffering otherwise. E.g.

- ipfs.add([ 'hello', await (await fetch(url)).blob(), { path: '/foo/bar', content: droppedFile } ]) will not incur buffering.
- ipfs.add([ 'hello', { path: '/foo', content: droppedFile.stream() }, await (await fetch(url)).blob() ]) will only buffer the contents of the droppedFile and use other pieces as is.

I am not super happy with the complexity of all this, nor with the fact that a user can accidentally fall off the happy path and incur buffering, but I do not believe there is a better option without changing the API.
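To make the non-buffering claim more concrete, here is a minimal sketch of the general idea, not the patch's actual FormDataEncoder or to-body code. The names encodeParts and toBlobBody, the Part shape ({ name, filename, content }) and the simplified headers are all assumptions made for illustration. The point it shows is that a Blob can be yielded as a BlobPart and handed to the Blob constructor as-is, so the body is assembled by reference rather than by reading the file's bytes in script.

```javascript
// Hypothetical Part shape for this sketch: { name, filename, content },
// where content is a string or a Blob. The patch's real Part type differs.
async function * encodeParts (parts, boundary) {
  for await (const part of parts) {
    yield `--${boundary}\r\n`
    yield `Content-Disposition: form-data; name="${part.name}"; ` +
      `filename="${encodeURIComponent(part.filename)}"\r\n\r\n`
    // A Blob is yielded untouched: nothing here reads its bytes.
    yield part.content
    yield '\r\n'
  }
  // Closing delimiter of the multipart body.
  yield `--${boundary}--\r\n`
}

// Browser-style body: collect the BlobParts and wrap them in a single Blob.
async function toBlobBody (blobParts) {
  const chunks = []
  for await (const chunk of blobParts) chunks.push(chunk)
  return new Blob(chunks)
}
```

A call such as toBlobBody(encodeParts([{ name: 'file', filename: 'a.txt', content: droppedFile }], boundary)) would yield a Blob suitable as a fetch body. A stream has no equivalent shortcut: its chunks would have to be read and collected before a Blob could be built, which is the buffering fallback described above.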
attempt to fix #3029