Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Companion's URL controller should avoid getting the URL metadata if it already knows the file size #3034

Closed
james-tisato-kortical opened this issue Jul 21, 2021 · 5 comments · Fixed by #3048
Assignees
Labels
Companion The auth server (for Instagram, GDrive, etc) and upload proxy (for S3) Feature

Comments

@james-tisato-kortical
Copy link

The /get/ endpoint of the URL controller within Companion (https://github.com/transloadit/uppy/blob/master/packages/%40uppy/companion/src/server/controllers/url.js#L53) initially makes a HEAD request for the URL (via getURLMeta) to find out the file size. It then triggers the upload using that file size as an input. This initial HEAD request appears to be unnecessary because the request sent to the /get endpoint already contains the size. The suggestion is that getURLMeta only be called if the size is not already present in the request to /get (in case there is some edge case where it's not available).

Why is this important?
We are trying to send a S3 presigned URL to Companion (via our own custom Uppy plugin). We have generated the presigned URL for the S3 object but it's only valid for GET requests. If you try to make a HEAD request with that URL, it will fail. From what I can tell in S3 documentation, there's no way around this - presigned S3 URLs for different HTTP verbs must use different presigned URLs. Therefore, this is a problem - we can only send a single URL to Companion and it will always try to perform both HEAD and GET requests with it.

Alternatively, since we are passing the file size in the request body for /get (just as the Uppy URL plugin does already), the HEAD request could be avoided and the URL could be used just for the GET operation.

@james-tisato-kortical
Copy link
Author

I've tested a change locally and it seems to fix the issue:

diff --git a/packages/@uppy/companion/src/server/controllers/url.js b/packages/@uppy/companion/src/server/controllers/url.js
index 186610dd2..87da9e075 100644
--- a/packages/@uppy/companion/src/server/controllers/url.js
+++ b/packages/@uppy/companion/src/server/controllers/url.js
@@ -50,7 +50,7 @@ const get = (req, res) => {
     return res.status(400).json({ error: 'Invalid request body' })
   }
 
-  reqUtil.getURLMeta(req.body.url, !debug)
+  (!req.body.size ? reqUtil.getURLMeta(req.body.url, !debug) : Promise.resolve({size: req.body.size }))
     .then(({ size }) => {
       // @ts-ignore
       logger.debug('Instantiating uploader.', null, req.id)

@mifi
Copy link
Contributor

mifi commented Jul 26, 2021

I understand your problem and I created a PR #3048 with your change.
One thing I don't understand though, how can req.body.size ever be set without the client first running getURLMeta (which will also fail for signed S3 URLs)? If this workaround is only for a custom plugin, I'm not sure if this is the right way to do it, because it will not solve the root cause, and signed S3 urls using the Link plugin does still not work with this workaround. Maybe instead of using HEAD, we could do one of:

  1. use GET with a Range: 0-0 (https://stackoverflow.com/questions/15717230/pre-signing-amazon-s3-urls-for-both-head-and-get-verbs) - I'm not sure if all servers handle this though.
  2. a GET request that gets immediately terminated after response headers are received, without receiving any of the response body. I think all servers should be able to handle this.
  3. Rewrite the Uploader so that it doesn't need to know the size in advance, but can instead figure out the size

If we do one of these instead, we could make a more generic way of fetching all kinds of links, even signed ones.

Another issue with using req.body.size is that it is a client provided value, so it could cause security issues down the road. better to use a trusted server side variable

@james-tisato-kortical
Copy link
Author

Thanks for the update. I agree with your point that this fix doesn't work with the Link plugin. It only works for us because we are writing a custom plugin that can use separate HEAD and GET URLs if required. I was just looking for the simplest and most isolated workaround so we could carry on development.

Your other suggestions to improve this further make sense to me.

@mifi
Copy link
Contributor

mifi commented Jul 26, 2021

@arturi any preferences?

@Murderlon Murderlon added Companion The auth server (for Instagram, GDrive, etc) and upload proxy (for S3) and removed Triage labels Aug 10, 2021
@mifi
Copy link
Contributor

mifi commented Aug 30, 2021

I tried to look into (3) Rewrite the Uploader so that it doesn't need to know the size in advance, but it turns out that the uppy client already asks for metadata before downloading, which also is the same HEAD request, so changing the uploader will not solve anything:


So the only solution that I can see that will work is (1) and (2). 2 should be the most compliant with servers so I'll do that.

mifi added a commit that referenced this issue Aug 30, 2021
and abort request immediately upon response headers received
#3034 (comment)
mifi added a commit that referenced this issue Sep 30, 2021
… file names (#3048)

* rewrite to async/await

* Only fetch size (HEAD) if needed #3034

* Update packages/@uppy/companion/src/server/controllers/url.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* Change HEAD to GET in getURLMeta

and abort request immediately upon response headers received
#3034 (comment)

* fix lint

* fix lint

* cut off length of file names

or else we get
"MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3

* try to fix flaky test

* remove iife and cleanup code a bit

* fix lint by reordering code

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
mifi added a commit that referenced this issue Nov 1, 2021
…ad/download without saving to disk (#3159)

* rewrite to async/await

* Only fetch size (HEAD) if needed #3034

* Update packages/@uppy/companion/src/server/controllers/url.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* Change HEAD to GET in getURLMeta

and abort request immediately upon response headers received
#3034 (comment)

* fix lint

* fix lint

* cut off length of file names

or else we get
"MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3

* try to fix flaky test

* remove iife and cleanup code a bit

* fix lint by reordering code

* rename Uploader to MultipartUploader

* Rewrite Uploader to use fs-capacitor #3098

This allows for upload to start almost immediately without having to first download the file.
And it allows for uploading bigger files, because transloadit assembly will not timeout,
as it will get upload progress events all the time.
No longer need for illusive progress.
Also fix eslint warnings and simplify logic

Still TODO: TUS pause/resume has a bug:
tus/tus-js-client#275

* add comment in dev Dashboard and pull out variable

* fix a bug where remote xhr upload would ignore progress events in the UI

* fix bug where s3 multipart cancel wasn't working

* fix also cancel for xhr

* Rewrite providers to use streams

This removes the need for disk space as data will be buffered in memory and backpressure will be respected
#3098 (comment)
All providers "download" methods will now return a { stream } which can be consumed by uploader.

Also:
- Remove capacitor (no longer needed)
- Change Provider/SearchProvider API to async (Breaking change for custom companion providers)
- Fix the case with unknown length streams (zoom / google drive). Need to be downloaded first
- rewrite controllers deauth-callback, thumbnail, list, logout to async
- getURLMeta: make sure size is never NaN (NaN gets converted to null in JSON.stringify when sent to client but not when used in backend)
- fix purest mock (it wasn't returning statusCode on('response'))
- add missing http mock for "request" for THUMBNAIL_URL and http://url.myendpoint.com/file (these request errors were never caught by tests previously)
- "upload functions with tus protocol" test: move filename checking to new test where size is null. Fix broken expects
- fix some lint

* Implement streamingUpload flag

COMPANION_STREAMING_UPLOAD
Default to false due to backward compatibility
If set to true, will start to upload files at the same time as dowlnoading them, by piping the streams

- Also implement progress for downloading too
- and fix progress duplication logic
- fix test that assumed file was fully downloaded after first progress event

* rearrange validation logic

* add COMPANION_STREAMING_UPLOAD to env.test.sh too

* implement maxFileSize option in companion

for both unknown length and known length downloads

* fix bug

* fix memory leak when non 200 status

streams were being kept

* fix lint

* Add backward-compatibility for companion providers

Implement a new static field "version" on providers, which when not set to 2,
will cause a compatibility layer to be added for supporting old callback style provider api

also fix some eslint and rename some vars

* document new provider API

* remove static as it doesn't work on node 10

* try to fix build issue

* degrade to node 14 in github actions

due to hitting this error: nodejs/node#40030
https://github.com/transloadit/uppy/pull/3159/checks?check_run_id=3544858518

* pull out duplicated logic into reusable function

* fix lint

* make methods private

* re-add unsplash download_location request

got lost in merge

* add try/catch

as suggested #3159 (comment)

* Only set default chunkSize if needed

for being more compliant with previous behavior when streamingUpload = false

* Improve flaky test

Trying to fix this error:

FAIL packages/@uppy/utils/src/delay.test.js
  ● delay › should reject when signal is aborted

    expect(received).toBeLessThan(expected)

    Expected: < 70
    Received:   107

      32 |     const time = Date.now() - start
      33 |     expect(time).toBeGreaterThanOrEqual(30)
    > 34 |     expect(time).toBeLessThan(70)
         |                  ^
      35 |   })
      36 | })
      37 |

      at Object.<anonymous> (packages/@uppy/utils/src/delay.test.js:34:18)

https://github.com/transloadit/uppy/runs/3984613454?check_suite_focus=true

* Apply suggestions from code review

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* fix review feedback & lint

* Apply suggestions from code review

Co-authored-by: Merlijn Vos <merlijn@soverin.net>

* remove unneeded ts-ignore

* Update packages/@uppy/companion/src/server/controllers/url.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* Update packages/@uppy/companion/src/server/Uploader.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* reduce nesting

* fix lint

* optimize promisify

#3159 (comment)

* Update packages/@uppy/companion/test/__tests__/uploader.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
Co-authored-by: Merlijn Vos <merlijn@soverin.net>
HeavenFox pushed a commit to docsend/uppy that referenced this issue Jun 27, 2023
… file names (transloadit#3048)

* rewrite to async/await

* Only fetch size (HEAD) if needed transloadit#3034

* Update packages/@uppy/companion/src/server/controllers/url.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* Change HEAD to GET in getURLMeta

and abort request immediately upon response headers received
transloadit#3034 (comment)

* fix lint

* fix lint

* cut off length of file names

or else we get
"MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3

* try to fix flaky test

* remove iife and cleanup code a bit

* fix lint by reordering code

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
HeavenFox pushed a commit to docsend/uppy that referenced this issue Jun 27, 2023
…ad/download without saving to disk (transloadit#3159)

* rewrite to async/await

* Only fetch size (HEAD) if needed transloadit#3034

* Update packages/@uppy/companion/src/server/controllers/url.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* Change HEAD to GET in getURLMeta

and abort request immediately upon response headers received
transloadit#3034 (comment)

* fix lint

* fix lint

* cut off length of file names

or else we get
"MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3

* try to fix flaky test

* remove iife and cleanup code a bit

* fix lint by reordering code

* rename Uploader to MultipartUploader

* Rewrite Uploader to use fs-capacitor transloadit#3098

This allows for upload to start almost immediately without having to first download the file.
And it allows for uploading bigger files, because transloadit assembly will not timeout,
as it will get upload progress events all the time.
No longer need for illusive progress.
Also fix eslint warnings and simplify logic

Still TODO: TUS pause/resume has a bug:
tus/tus-js-client#275

* add comment in dev Dashboard and pull out variable

* fix a bug where remote xhr upload would ignore progress events in the UI

* fix bug where s3 multipart cancel wasn't working

* fix also cancel for xhr

* Rewrite providers to use streams

This removes the need for disk space as data will be buffered in memory and backpressure will be respected
transloadit#3098 (comment)
All providers "download" methods will now return a { stream } which can be consumed by uploader.

Also:
- Remove capacitor (no longer needed)
- Change Provider/SearchProvider API to async (Breaking change for custom companion providers)
- Fix the case with unknown length streams (zoom / google drive). Need to be downloaded first
- rewrite controllers deauth-callback, thumbnail, list, logout to async
- getURLMeta: make sure size is never NaN (NaN gets converted to null in JSON.stringify when sent to client but not when used in backend)
- fix purest mock (it wasn't returning statusCode on('response'))
- add missing http mock for "request" for THUMBNAIL_URL and http://url.myendpoint.com/file (these request errors were never caught by tests previously)
- "upload functions with tus protocol" test: move filename checking to new test where size is null. Fix broken expects
- fix some lint

* Implement streamingUpload flag

COMPANION_STREAMING_UPLOAD
Default to false due to backward compatibility
If set to true, will start to upload files at the same time as dowlnoading them, by piping the streams

- Also implement progress for downloading too
- and fix progress duplication logic
- fix test that assumed file was fully downloaded after first progress event

* rearrange validation logic

* add COMPANION_STREAMING_UPLOAD to env.test.sh too

* implement maxFileSize option in companion

for both unknown length and known length downloads

* fix bug

* fix memory leak when non 200 status

streams were being kept

* fix lint

* Add backward-compatibility for companion providers

Implement a new static field "version" on providers, which when not set to 2,
will cause a compatibility layer to be added for supporting old callback style provider api

also fix some eslint and rename some vars

* document new provider API

* remove static as it doesn't work on node 10

* try to fix build issue

* degrade to node 14 in github actions

due to hitting this error: nodejs/node#40030
https://github.com/transloadit/uppy/pull/3159/checks?check_run_id=3544858518

* pull out duplicated logic into reusable function

* fix lint

* make methods private

* re-add unsplash download_location request

got lost in merge

* add try/catch

as suggested transloadit#3159 (comment)

* Only set default chunkSize if needed

for being more compliant with previous behavior when streamingUpload = false

* Improve flaky test

Trying to fix this error:

FAIL packages/@uppy/utils/src/delay.test.js
  ● delay › should reject when signal is aborted

    expect(received).toBeLessThan(expected)

    Expected: < 70
    Received:   107

      32 |     const time = Date.now() - start
      33 |     expect(time).toBeGreaterThanOrEqual(30)
    > 34 |     expect(time).toBeLessThan(70)
         |                  ^
      35 |   })
      36 | })
      37 |

      at Object.<anonymous> (packages/@uppy/utils/src/delay.test.js:34:18)

https://github.com/transloadit/uppy/runs/3984613454?check_suite_focus=true

* Apply suggestions from code review

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* fix review feedback & lint

* Apply suggestions from code review

Co-authored-by: Merlijn Vos <merlijn@soverin.net>

* remove unneeded ts-ignore

* Update packages/@uppy/companion/src/server/controllers/url.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* Update packages/@uppy/companion/src/server/Uploader.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* reduce nesting

* fix lint

* optimize promisify

transloadit#3159 (comment)

* Update packages/@uppy/companion/test/__tests__/uploader.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
Co-authored-by: Merlijn Vos <merlijn@soverin.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Companion The auth server (for Instagram, GDrive, etc) and upload proxy (for S3) Feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants