-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thumbnail/Preview image generation is too slow #1732
Comments
No. We won't use imagemagick as that one isn't really secure to use with untrusted user input.
Why a setting and not making this default? cc @rullzer Thoughts? |
We already do this. The first time it is requested we generate a 2048x2048 (default) image. And from this one all smaller thumbnails are derived. We use 2048x2048 since else thing look blurry on your laptop screen. I'm still working on making our preview system more efficient. And something to create previews in cron is still on the list. |
@LukasReschke ImageMagick and security: Shouldnt the choice of image-backend at least be a choice for the user? I run Nextcloud on my server and feed it my images and documents - how is any of that untrusted input? Not everyone will run nextcloud as an enterprise renting it out to otters I think. btw I am curious about security of imagemagick, have any pointers? Is there any other faster alternative which is also secure? @rullzer I see. But it is still too slow, even with 2048x2048 gd imagecopyresample spends 0.8s for every image. Can gd be speeded up? Can it be told to use more threads or processes? |
And that's good for you. But we're aiming for a solution that scales from small to big and doing so with sane defaults. Adding "enable this switch here and it makes stuff faster but also much more insecure" is not an option. You can of course write an optional app that does this, but it won't be part of the core server for now :)
https://github.com/mkoppanen/imagick#security has some more infos. https://imagetragick.com was a kinda big exploit. We had imagemagick until like 2013 for some previews enabled by default but this allowed an attacker to extract arbitrary files from the system. The only sane way to run imagemagick is in a sandboxed, secured, environment. Something we don't offer out of the box yet and also would be an additional component. (e.g. a preview server system) Not saying that we shouldn't speed up thumbnails. We should. But enabling imagemagick on the webserver is a very bad idea from a security PoV. |
@LukasReschke All valid concerns which I agree with, security should come foremost, then speed. I dont think the current solution scales even to small use cases, as it is too slow. The server is on 32gb of ram, storage on 2 raid0 ssds, 12core cpu, and yet it stutters when web browsing nextcloud on it, and Android phone (quad core, tested with a s7 as well) on the same WiFi - loads the picture preview very visibily at 1 per second. Doesnt matter if the folder has 10 pictures or 1000, they are previewed serially (especially on Android), slowly. (Yes opcache, apcu, lighttpd set to maximum performance and caching were possible, php-fpm dynamic with many workers) The expectation is for it to to be as fast as the inbuilt android-gallery, as is the case when using google drive. We can make that happen. I like the preview-server idea, how would the communication happen between nextcloud/php and preview-server be? Unix sockets and passing file-descriptor? Because the speed advantage is gone the second we try to stream the whole file to any preview-server, when it can do better by just reading parts of it intelligently (skipping rows and columns of pixels). By writing/passing file-descriptors to unix-socket, the other process can live isolated in its own container without access to our filesystem. Otherwise more simply we could proc_open("/bin/some-server", $descriptors, $yadayada) and let child have the file-descriptor, and prevent that process from accessing nextcloud_data by normal unix-permissions, that preview-process could be setuid a preview_user. Simpler than unix-sockets? And if people want more security - let the some-server talk to a unix socket to another container which is running imagemagick/gd under seccomp mode. Would such a preview-server be useful for libreoffice as well? |
Why not give the administrator the choice which image processing library they want to use you can recommend the one you want. Get on my first installation There is a very good abstraction here https://github.com/avalanche123/Imagine which is a very wide used library. |
It's probably worth pointing out for anyone reading this later that an app for preview generation does already exist here. Not sure how much this meets your usecase @iskradelta? |
It seems to me that thumbnails are created for each user individually (nextcloud/data/username/thumbnails), even for shared data (a big photo collection in my case). That looks like a waste of space and cpu time to me. I can combat the first with filesystem deduplication, not sure about the other part. |
Making the preview system more modular is still on the TODO list. That way the problems by @iskradelta @alexander-schranz should be fixed since then you can enable the app you want to generate previews @Klaas- since NC11 we do share previews. However note that the gallery app is not yet modified. This will come soon (tm) I hope. |
thanks @rullzer linking your pr for others to find it nextcloud/gallery#187 |
@Bugsban thanks will try the app later, I "fixed" it by writing a short bash script with curl to force the auto-generation of previews. @Klaas- @rullzer Yes, it did take much more space. Then I looked into using file-descriptor passing for a more secure preview-system in general, and it would let me decide which process is actually processing a a file, so I could use graphicsmagick instead of php-gd. My tests showed gmagick could do the thumbnails so fast it would be not worth wasting space on pregenerating previews, while php-gd could not - gmagick has an optimization to skip reading the whole file and construct a smaller one, and some other options. For security reasons, fd-passing, since then I could run gmagick in another lxc-container with its own seccomp filters adjusted for gmagick, and the process gmagick in that container would not even have file-system access to anything else on the server, in fact, it would only have access to file-descriptors passed to it over a unix socket which would be mounted in both nextcloud-cointaner and gmagick-container. So if anyone exploits/gets-the EIP in gmagick process say with a bad .jpg all harm it could do is write a bad thumbnail. |
Yet two days I'm searching for a solution to this "slow previews" and "slow thumbnails" Found this tool you mentioned, but it actually don't generate the thumbnails for the gallery app and, the photo previews are still looking bad and very slow to generate, it takes more time to generate the previews than downloading the full image ! If at least we had an option to disable compressing the previews. This issue is open more than 3 years, saw topics on forums from 2013 mentioning these problems, and now in 2017 still these slow previews and image generation, certainly because of GD. I can't afford a 80 dollar dedicated server in month for my photos. Isn't any way to change the whole thing ? Even the UI is very slow for me at certain times.. |
@AngeloFrangione lol even a $100 dollar dedicated server wont solve the issue. Mine is on a 6-core i7, 32GB memory, 4 SSDs in RAID0, 1Gbit ethernet, and still... as you say, just viewing the picture is faster than waiting for gd to generate preview. My measurements showed the problem is in gd, and a solution is in using graphicsmagick because it can be told to not read the whole image but only the parts it needs to generate a preview - and that works. |
I would like to show my support for GraphicsMagick. Although only in the event that the it's implemented securely. GM is used to process billions of files at the world's largest photo sites (e.g. Flickr and Etsy). |
This issue is also a pain for me and I am highly interested about the current state of it. I guess the hint of @LexiconCode with GraphicsMagick looks quite promising! |
Same here. I definitely like the gallery view - when all images are available to preview. So, there are two different ways to solve it but doing both would also be nice:
If there's anything I can do here to help, ping me. 👍 |
Same here. Please when can we get it on the snap for Raspi 2? |
Replying to @srkunze Even of the Preview Generator App handled the gallery I don't believe it would help. Correct me if I'm wrong (not familiar with Nextcloud code). I don't believe (as of right now *) you can multi-thread a PHP script. So if you had a PHP script do generation, only one CPU thread could work on it correct? Would writing something outside of PHP be better? Maybe Python has some image-processing libraries that would be more efficient and, with multi-threading support, scale better. * I guess pthreads are a thing in PHP 7.2 |
@timtierney Maybe the GIL prevents the multi-threading idea in Python. Why do you think modifying the Generator app could not handle it appropriately? I works well for the Android app using thumbnails. |
@srkunze I think it could do it. But would it scale on machines with more cores/threads? |
@timtierney multi-threading should be possible if a cron job is used for thumbnail generation, right? this would only require, that the occ command for thumbnail generation supports generation of a subset of thumbnails. then the occ command could be invoked multiple times by the cron job. |
Could the thumbnail generator spawn a certain number of workers processes to generate these thumbnails. Thumbnail generation in Dropbox is very much faster then Nextcloud at the moment, it would be nice to get close in terms of speed. Nextcloud is already better in terms of quality. |
Any news on this? The slow generation of thumbnails is the only thing holding me back from using NextCloud. ): |
Installing previewgenerator and adding a cron job helps the user experience most of the time. I like the solution proposed by @alexander-schranz, use Imagine as a layer of abstraction, and eventually let the user choose with warning about security (and extra dependencies). It seems no preview system is secure:
Nextcloud team already made that clear. |
Could it be shifted to client side? |
I have rewritten image.php to use the Imagine interface instead of the raw GD functions. For this to work we need to add an additional dependency to imagine into nextcloud. @iskradelta do you have the gmagick flag to optimise and do preview on the fly? Would be interesting to find out. I am not sure the Imagine interface by default is optimised for speed. |
Hey @Darkvater, how do you set up your modification ? What is the config setting ? |
@Mikescops I have to patch image.php as in the commit but also add the imagine dependency which I am not fully sure about how it works on a master checkout (keeps doing infinite loops with the VCS in 3rdparty/composer.json). So it's definitely not ready yet for a pull. |
Is there somewhere we can follow this? Has anyone outlined what it should look like if someone happens to have time to invest to handle it? Also regarding alternatives, ImageFlow is another. It's still a bit immature but it's designed specifically for user-supplied images and the security concerns that entails. |
Also @Darkvater:
Do you mean that you should read an existing preview to generate a smaller preview? I don't know the code very well but perhaps you could modify Or do you mean that Imagine has the ability to open an existing large preview and subsample it or something if you tell it the file size you want output? I think the solution here might be to fiddle with the image preview code rather than the low level image manipulation code anyhow. It should even be possible to do that as an app (like this, for RAW files) so I'll have a look and see if I can make it work. |
@bobobo1618 |
Ah, I didn't see that call in the Generator. Taking that into account, my proposal is to add two optional methods to the
My rationale for the first method is that we can't just use the existing method because it doesn't support cropping, which is something we'll want to support. If we don't, then we'll have to deal with the performance hit of GD decoding, cropping and re-encoding the larger preview. My rationale for the second method is that I think there's reason a user/developer might want to generate the preview from the original file. For example a photographer might care a lot about quality and not want to incur the degradation of two JPEG encodes or a RAW preview app developer might want to generate a full size preview by rendering the content of the RAW file but might want to use embedded previews for the smaller versions.
Once that's done this should all be modular and an app can do everything the user wants. |
my nextcloud vm has been generating previews for 15 days non stop and by now has only generated previews for 15k out of 35k images. If this product was designed to scale i must be doing something wrong because i cant imagine someone with 500 thousand images. vm specs: xeon 1260l 4gig ram |
Any news/official rationale on this? Honestly, this is a serious dealbreaker. |
Also transmission of the wrong thumbnail size #13709 (comment) I feel like the whole system is doing a lot of 'assuming' about how people are going to care.
|
Any news on this? |
Ok, came back almost 3 years later and thumbnail generation is almost in the same state. For me this feature is a must. I think it would be nice to have a simple GUI to generate and keep track of generated images/videos and increase speed generating those thumbnails. |
My assumption is that even the community around nextcloud seems big, the developing community is rather small. And the paid developer focus is enterprise. For them, the community is a playground well suited to test things out and gain additional test ressources. But do they play a role for the road map? I doubt it. Do you think business users collect huge amount of photos in gallery? Right, thats why three years no move here. |
There's enough interest where people might want to donate toward sponsoring this issue directly. I'd be willing only if it went towards this issue. |
Can someone who have a clear undersanding of Nextcloud and the issue can summarize where to look and what must be done (for someone who never contributed to Nextcloud)? If contributions are opened, I can definitely take a look at this. |
This is just another of nextclouds dirty little secrets, they will never resolve it, they have some bizzare branch of the code where they want to make the fork for processing each 'collection' of thumbnails happen in a single PHP request, but the whole point of PHP+Apache/Nginx is so that each process/request is a new fork anyway To further push my point, a single request to the nextcloud stack, not including actually doing anything is about 200ms..... so my first paragraph above is about working around that crappy codebase... but the whole nextcloud stack is just a 'work around' and they only respond to their larger commercial interests, do not bother, find alternatives. |
@dgtlmoon good thing about internet is that you're free to go for another project or build your own. Nextcloud company owes you nothing but they do for their customers. |
@Mikescops actually I've donated months of my own life looking into why this is an issue, profiling the entire codestack and getting to the bottom of the issue, I've helped with a few merge requests and commented on many others, spent a lot of time profiling and code reviewing, then spending more time contacting the main contributors here and trying to find a way forwards but years later we're still in the same situation. So please, you know where you can stick your smart little comment ^^ And look here... here's yourself a few years ago just complaining and adding nothing... |
@Mikescops since you are a part of nextcloud, instead of attacking people that contribute their own time to exploring issues, why dont you comment about the silly 200ms round trip time just to bootstrap the stack? (Found through donating my own time) or the silly idea/concept to make PHP do the forking for multiple thumbnail requests? (Found through donating my own time) |
@dgtlmoon I'm not part of Nextcloud team, I'm just a user that since my first comment you saw decided to help and fixed issues that matters to me. It's not my daily job at all. |
Nextcloud gmbh is not an engineering company, but a B2B sales / marketing company. Just look at where the effort is being spent. The community should move back to owncloud, which has spent considerable effort fixing their tech stack to a (micro) service-based architecture with API and SPA js gui on top. |
Closing as per #24166 |
Steps to reproduce
Expected behaviour
The previews should be generated in less than 0.3seconds, preferrably faster.
It should be possible to pre-generate the previews/thumbnails for recently uploaded photos or by running a cron job. To avoid waiting for any previews.
No external storage or weird things is used.
Actual behaviour
Slow as a nintento 8bit running openssl dhparam generation.
Because php-gd seems to load the image into memory, then it does imagecopyresampled which takes time on big JPEGs, also it only uses 1 process for the computations.
ImageMagick on the command line also takes everything from 1 to 5s depending on parameters used to scale/resize down the image, but with -define jpeg:size parameter, telling ImageMagick to not load the whole image into memory, but instead request smaller parts of it while resizing it, takes 0.2seconds on average.
Should nextcloud switch to ImageMagick and redo/later-throw-away legacy/image.php ?
Perhaps making a NC_Image class which would use imagemagick either by shell_exec or if php has the imagick extension loaded use that. I think so, yes?
There could be a setting in the admin interface "Pre-generate thumbnails for recently uploaded/scanned content?" This setting could just downsize really big content say > 1mb into something small like 384x384 - and if other clients want even smaller image they can resize from the small part again ondemand, but that then would be much faster.
Would a patch for this be accepted?
Server configuration
Operating system: GNU/Linux
Web server: lighttpd
Database: sqlite3
PHP version: PHP 5.6.21, opcache apcu enabled, php-fpm.
Nextcloud version: 10.something.
Updated from an older Nextcloud/ownCloud or fresh install:
Where did you install Nextcloud from: the nextcloud website.
The text was updated successfully, but these errors were encountered: