This repository has been archived by the owner on Jul 13, 2023. It is now read-only.

Paperclip copies too many files to the file system #1642

Closed
amilligan opened this issue Sep 7, 2014 · 6 comments

Comments

@amilligan

I don't want to depend on the file system. Providers like Heroku like to pull the file system out from under running servers at inconvenient times, and those errors are just a load of fun to track down. So, I try to use IO streams whenever possible. In particular, if I save some content I've generated (say, in a background task), I feed that content into Paperclip using a StringIO object, or something similar.

Up until recently this worked great. Now, with spoofing detection and content type validation (which seem like good ideas, btw, just maybe not so great in practice yet), Paperclip feels the need to copy my stream to the file system. Multiple times.

First, the StringioAdapter creates a temporary file so it can run the content type check using a file command. Two problems with this: first, if it's content I created I want to tell Paperclip to trust me and skip this step; second, the file command seems to be wrong about 60% of the time, creating validation errors for perfectly valid files. For instance, it returns 'text/plain' for a CSV file, and 'application/zip' for an XLSX file; not great for anyone generating spreadsheet reports.
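To make the cost described above concrete, here is a minimal sketch of what spilling an in-memory stream to a temporary file looks like using only Ruby's standard library. It is not Paperclip's actual adapter code; `spill_to_tempfile` is a hypothetical helper name, and the sketch only illustrates why a path-based check (such as one that shells out to `file`) forces a copy to disk.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper, NOT Paperclip's implementation: copy an in-memory
# stream to disk so that a tool which needs a file path could inspect it.
def spill_to_tempfile(io, basename = "upload")
  tempfile = Tempfile.new(basename)
  tempfile.binmode
  io.rewind
  IO.copy_stream(io, tempfile) # every byte of the stream is written to disk
  tempfile.flush
  tempfile.rewind
  io.rewind                    # leave the source stream readable again
  tempfile
end

csv = StringIO.new("id,name\n1,Ada\n")
copy = spill_to_tempfile(csv)
puts copy.path                 # a path under the system temp directory
puts copy.read == csv.string   # the temp file holds the same bytes
copy.close!                    # close and delete the temp file
```

Each additional adapter that repeats this step adds another full copy of the content, which is the duplication this issue is about.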

Later, in order to do the spoof detection validation, Paperclip wraps my attachment in another adapter (AttachmentAdapter), which copies the content from the first temporary file to another temporary file. Let's hope that first temporary file is still there. And, again, if I generated the content myself then I'm fairly certain there's no spoofing going on. Again, Paperclip has no option to turn off this validation.

Finally, when saving to the file system, Paperclip assumes that temporary files already exist and it can simply do a file copy from one place to another; sadly, those temporary files don't exist because I patched that nonsense out. Now, no sensible person stores content on the file system in a deployed application, and the S3 storage adapter treats the content (appropriately) as a stream. So, this is just an annoyance for development. But an annoyance all the same.
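As an illustration of the stream-based alternative for file system storage, the sketch below writes any readable IO directly to its destination with `IO.copy_stream`, without assuming an intermediate temporary file exists. This is not how Paperclip's storage code is written; `write_stream` and the paths used are hypothetical.

```ruby
require "stringio"
require "fileutils"
require "tmpdir"

# Hypothetical helper: write a readable IO straight to its final path,
# without relying on a temporary file having been created beforehand.
def write_stream(io, destination)
  FileUtils.mkdir_p(File.dirname(destination))
  io.rewind if io.respond_to?(:rewind)
  IO.copy_stream(io, destination) # returns the number of bytes copied
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "reports", "summary.csv")
  bytes = write_stream(StringIO.new("id,total\n1,42\n"), target)
  puts bytes
  puts File.read(target)
end
```

Because the helper only needs an object that responds to `read`, the same call would work for a `StringIO`, a `File`, or a `Tempfile`.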

I've solved these problems in a little library here: https://github.com/buildgroundwork/paperclip-trusted-io

I'm happy to turn these changes into pull requests, if they have any chance of being a) considered in some sort of timely fashion, and b) accepted.

@maclover7
Contributor

Hi @amilligan! Is this still an issue for you in Paperclip? I know this issue is from approximately a year ago. If it is still an issue, would you be willing to send in a PR to solve the problem? Thanks!

@tute
Contributor

tute commented May 16, 2015

Thanks for your report. Closing in favor of duplicate #1326.

@tute tute closed this as completed May 16, 2015
@cthorner

cthorner commented Oct 9, 2015

#1326 does not solve this issue, does it? As far as I can understand, it still relies on a file system, and we are still not able to circumvent the spoof detection for trusted content. The issue raised here is about using Paperclip in in-memory-only scenarios with StringIO.

@amilligan
Author

No, that issue isn't actually related. I chose not to fight that battle.

@tute tute reopened this Aug 19, 2016
tute pushed a commit that referenced this issue Aug 19, 2016
#2120)

Paperclip duplicates the original files as part of its validation process. (#1642, #1326).

When uploading large files (several hundred megabytes to gigabyte range), this becomes a problem: The web server will be busy creating 3 - 4 duplicates on disk, while the app (and potentially the user) are waiting for the upload operation to complete.

This commit introduces hard links instead of `FileUtils.cp` where possible to keep the logic as-is but save time and disk space.
@tute tute changed the title from "Paperclip copies every damn thing to the file system" to "Paperclip copies too many files to the file system" Aug 19, 2016
tute pushed a commit that referenced this issue Aug 28, 2016
#2290)

Rebased #2120 to master.
Paperclip duplicates the original files quite a lot as part of its validation process. (#1642, #1326).
When uploading large files (several hundred megabytes to gigabyte range), this becomes a problem: The web server will be busy creating 3 - 4 duplicates on disk, while the app (and potentially the user) are waiting for the upload operation to complete.
This pull request introduces hard links instead of `FileUtils.cp` where possible to keep the logic as-is but save time and disk space.
@tute
Contributor

tute commented Aug 28, 2016

Merged in #2290 that uses links instead of files when possible.
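For readers unfamiliar with the technique, the following is a rough sketch of "link when possible, copy otherwise" using Ruby's `FileUtils`. It is not the code merged in the pull request; `link_or_copy` is a hypothetical helper, and the fallback conditions are an assumption about when hard links typically fail (for example, across devices or on file systems without link support).

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, NOT the merged implementation: try a hard link first
# and fall back to a real copy when linking is not possible.
def link_or_copy(source, destination)
  FileUtils.ln(source, destination, force: true) # no data is duplicated
  :linked
rescue SystemCallError, NotImplementedError
  # Assumed failure cases: cross-device links, unsupported file systems
  FileUtils.cp(source, destination)
  :copied
end

Dir.mktmpdir do |dir|
  original = File.join(dir, "original.bin")
  duplicate = File.join(dir, "duplicate.bin")
  File.binwrite(original, "payload")
  puts link_or_copy(original, duplicate) # :linked or :copied, depending on the file system
  puts File.read(duplicate)
end
```

One trade-off worth noting: a hard link shares its data with the original, so modifying either file in place changes both, whereas a copy is independent.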

@mike-burns
Contributor

Solved via #2290.

HoneyryderChuck pushed a commit to onfido/paperclip that referenced this issue Jul 15, 2021
thoughtbot#2290)