This repository has been archived by the owner on Jul 13, 2023. It is now read-only.
Paperclip copies too many files to the file system #1642
Comments
Hi @amilligan! Is this still an issue for you in Paperclip? I know this issue is from approximately a year ago. If it is still an issue, would you be willing to send in a PR to solve the problem? Thanks!
Thanks for your report. Closing in favor of duplicate #1326.
#1326 does not solve this issue, does it? As far as I can understand, it still relies on a filesystem and we are still not able to circumvent the spoof detection for trusted content. The issue raised here is related to using Paperclip in memory-only scenarios with StringIO.
No, that issue isn't actually related. I chose not to fight that battle.
tute pushed a commit that referenced this issue on Aug 19, 2016
#2120) Paperclip duplicates the original files as part of its validation process. (#1642, #1326). When uploading large files (several hundred megabytes to gigabyte range), this becomes a problem: The web server will be busy creating 3 - 4 duplicates on disk, while the app (and potentially the user) are waiting for the upload operation to complete. This commit introduces hard links instead of `FileUtil.cp` where possible to keep the logic as-is but save time and disk space.
tute changed the title from "Paperclip copies every damn thing to the file system" to "Paperclip copies too many files to the file system" on Aug 19, 2016
tute pushed a commit that referenced this issue on Aug 28, 2016
#2290) Rebased #2120 to master. Paperclip duplicates the original files quite a lot as part of its validation process. (#1642, #1326). When uploading large files (several hundred megabytes to gigabyte range), this becomes a problem: The web server will be busy creating 3 - 4 duplicates on disk, while the app (and potentially the user) are waiting for the upload operation to complete. This pull request introduces hard links instead of ```FileUtil.cp``` where possible to keep the logic as-is but save time and disk space.
Merged #2290, which uses hard links instead of file copies when possible.
Solved via #2290.
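For readers who want a feel for the "hard links instead of copies where possible" idea from the commit message, here is a minimal sketch. It is not the code merged in #2290; the `link_or_copy` helper name and the file names are made up for illustration. It assumes that a hard link is acceptable whenever the OS allows one, and falls back to a real copy otherwise (for example when source and destination sit on different filesystems).

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, not Paperclip's implementation: try a hard link first,
# fall back to a byte-for-byte copy if the link cannot be created.
def link_or_copy(src, dest)
  FileUtils.ln(src, dest, force: true)
rescue SystemCallError, NotImplementedError
  # Errno::EXDEV (cross-device), Errno::EPERM, etc. all land here.
  FileUtils.cp(src, dest)
end

dir  = Dir.mktmpdir("link-demo")
src  = File.join(dir, "upload.bin")
dest = File.join(dir, "upload-duplicate.bin")
File.binwrite(src, "x" * 1024)

link_or_copy(src, dest)
puts File.identical?(src, dest) ? "hard-linked" : "copied"
```

A hard link costs no extra disk space and no copy time, which is the point for uploads in the hundreds of megabytes. Note that it still needs a file system, so it does not address the in-memory StringIO case raised in this issue.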
HoneyryderChuck pushed a commit to onfido/paperclip that referenced this issue on Jul 15, 2021
thoughtbot#2290) Rebased thoughtbot#2120 to master. Paperclip duplicates the original files quite a lot as part of its validation process. (thoughtbot#1642, thoughtbot#1326). When uploading large files (several hundred megabytes to gigabyte range), this becomes a problem: The web server will be busy creating 3 - 4 duplicates on disk, while the app (and potentially the user) are waiting for the upload operation to complete. This pull request introduces hard links instead of ```FileUtil.cp``` where possible to keep the logic as-is but save time and disk space.
I don't want to depend on the file system. Providers like Heroku like to pull the file system out from under running servers at inconvenient times, and those errors are just a load of fun to track down. So, I try to use IO streams whenever possible. In particular, if I save some content I've generated (say, in a background task), I feed that content into Paperclip using a StringIO object, or something similar.
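A minimal sketch of what "feeding generated content in as a StringIO" can look like. `NamedStringIO` is a hypothetical helper, not part of Paperclip, and the `report.export = io` assignment in the comment assumes a model and attachment name that are purely illustrative. The assumption here is that the attachment library picks up `original_filename` and `content_type` from the IO object when it responds to them.

```ruby
require "stringio"

# Hypothetical helper: a StringIO that carries the upload-style metadata
# that would otherwise come from a file on disk.
class NamedStringIO < StringIO
  attr_accessor :original_filename, :content_type

  def initialize(content, original_filename:, content_type:)
    super(content)
    @original_filename = original_filename
    @content_type = content_type
  end
end

csv = "id,total\n1,9.99\n"
io = NamedStringIO.new(csv, original_filename: "report.csv", content_type: "text/csv")

# In an app this would be handed to the model (names are assumptions):
#   report.export = io
#   report.save!
```

Nothing in this snippet touches the disk, which is the property the rest of this report is trying to preserve.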
Up until recently this worked great. Now, with spoofing detection and content type validation (which seem like good ideas, btw, just maybe not so great in practice yet), Paperclip feels the need to copy my stream to the file system. Multiple times.
First, the StringioAdapter creates a temporary file so it can run the content type check using the `file` command. Two problems with this: first, if it's content I created, I want to tell Paperclip to trust me and skip this step; second, the `file` command seems to be wrong about 60% of the time, creating validation errors for perfectly valid files. For instance, it returns 'text/plain' for a CSV file, and 'application/zip' for an XLSX file; not great for anyone generating spreadsheet reports.
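To make the cost of that step concrete, here is a rough sketch of the pattern being described: copy the stream to a temporary file, then shell out to `file` for a MIME type. This is not Paperclip's actual adapter code; both helper names are hypothetical, and the sketch assumes a `file` utility that understands `-b --mime-type`.

```ruby
require "tempfile"
require "stringio"
require "open3"

# Hypothetical helper: the extra on-disk copy of an in-memory stream.
def copy_to_tempfile(io, basename = "stream")
  tempfile = Tempfile.new(basename)
  tempfile.binmode
  IO.copy_stream(io, tempfile)
  tempfile.flush
  tempfile.rewind
  tempfile
end

# Hypothetical helper: ask the system `file` utility for a MIME type.
# Returns nil when `file` is not installed or exits with an error.
def mime_type_from_file_command(path)
  output, status = Open3.capture2("file", "-b", "--mime-type", path)
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  nil
end

content  = "id,total\n1,9.99\n"
tempfile = copy_to_tempfile(StringIO.new(content))
detected = mime_type_from_file_command(tempfile.path)
puts "file reports: #{detected.inspect}"
```

Because `file` only sees bytes and not the file name, a CSV like this one has little to distinguish it from plain text, which is how a declared `text/csv` can end up failing validation.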
Later, in order to do the spoof detection validation, Paperclip wraps my attachment in another adapter (AttachmentAdapter), which copies the content from the first temporary file to another temporary file. Let's hope that first temporary file is still there. And, again, if I generated the content myself then I'm fairly certain there's no spoofing going on. Again, Paperclip has no option to turn off this validation.
Finally, when saving to the file system, Paperclip assumes that temporary files already exist and it can simply do a file copy from one place to another; sadly, those temporary files don't exist because I patched that nonsense out. Now, no sensibly minded person stores content on the file system in a deployed application, and the S3 storage adapter treats the content (appropriately) as a stream. So, this is just an annoyance for development. But, an annoyance all the same.
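For comparison, writing a stream straight to its destination does not need an intermediate temporary file at all. The following is only a sketch of that idea using Ruby's standard library; `write_stream_to_path` and the paths are hypothetical and this is not how Paperclip's filesystem storage is implemented.

```ruby
require "stringio"
require "fileutils"
require "tmpdir"

# Hypothetical helper: write any readable IO to dest_path without first
# materializing a temporary file. Returns the number of bytes written.
def write_stream_to_path(io, dest_path)
  FileUtils.mkdir_p(File.dirname(dest_path))
  io.rewind if io.respond_to?(:rewind)
  IO.copy_stream(io, dest_path)
end

content = "id,total\n1,9.99\n"
dir     = Dir.mktmpdir("attachments")
dest    = File.join(dir, "reports", "report.csv")

bytes = write_stream_to_path(StringIO.new(content), dest)
puts "wrote #{bytes} bytes to #{dest}"
```

The same helper works for a `File`, a `Tempfile`, or a `StringIO`, so the storage step does not have to care where the content came from.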
I've solved these problems in a little library here: https://github.com/buildgroundwork/paperclip-trusted-io
I'm happy to turn these changes into pull requests, if they have any chance of being a) considered in some sort of timely fashion, and b) accepted.