Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thumbnail/Preview image generation is too slow #1732

Closed
iskradelta opened this issue Oct 13, 2016 · 58 comments
Closed

Thumbnail/Preview image generation is too slow #1732

iskradelta opened this issue Oct 13, 2016 · 58 comments

Comments

@iskradelta
Copy link

Steps to reproduce

  1. Use Nextcloud android app, or the web interface to view a folder list containing many JPEG pictures taken by a camera
  2. Notice the slowness in loading previews, about a second for each picture, web interface is faster since it sends more requests at same time
  3. Notice in the logs that it takes about 1.1second to scale down an image of about 5MB to 384x384.

Expected behaviour

The previews should be generated in less than 0.3seconds, preferrably faster.
It should be possible to pre-generate the previews/thumbnails for recently uploaded photos or by running a cron job. To avoid waiting for any previews.
No external storage or weird things is used.

Actual behaviour

Slow as a nintento 8bit running openssl dhparam generation.

Because php-gd seems to load the image into memory, then it does imagecopyresampled which takes time on big JPEGs, also it only uses 1 process for the computations.

ImageMagick on the command line also takes everything from 1 to 5s depending on parameters used to scale/resize down the image, but with -define jpeg:size parameter, telling ImageMagick to not load the whole image into memory, but instead request smaller parts of it while resizing it, takes 0.2seconds on average.

Should nextcloud switch to ImageMagick and redo/later-throw-away legacy/image.php ?
Perhaps making a NC_Image class which would use imagemagick either by shell_exec or if php has the imagick extension loaded use that. I think so, yes?

There could be a setting in the admin interface "Pre-generate thumbnails for recently uploaded/scanned content?" This setting could just downsize really big content say > 1mb into something small like 384x384 - and if other clients want even smaller image they can resize from the small part again ondemand, but that then would be much faster.

Would a patch for this be accepted?

Server configuration

Operating system: GNU/Linux

Web server: lighttpd

Database: sqlite3

PHP version: PHP 5.6.21, opcache apcu enabled, php-fpm.

Nextcloud version: 10.something.

Updated from an older Nextcloud/ownCloud or fresh install:

Where did you install Nextcloud from: the nextcloud website.

@LukasReschke
Copy link
Member

Should nextcloud switch to ImageMagick and redo/later-throw-away legacy/image.php ?
Perhaps making a NC_Image class which would use imagemagick either by shell_exec or if php has the imagick extension loaded use that. I think so, yes?

No. We won't use imagemagick as that one isn't really secure to use with untrusted user input.

There could be a setting in the admin interface "Pre-generate thumbnails for recently uploaded/scanned content?" This setting could just downsize really big content say > 1mb into something small like 384x384 - and if other clients want even smaller image they can resize from the small part again ondemand, but that then would be much faster.

Why a setting and not making this default?

cc @rullzer Thoughts?

@rullzer
Copy link
Member

rullzer commented Oct 13, 2016

We already do this. The first time it is requested we generate a 2048x2048 (default) image. And from this one all smaller thumbnails are derived. We use 2048x2048 since else thing look blurry on your laptop screen.

I'm still working on making our preview system more efficient. And something to create previews in cron is still on the list.

@iskradelta
Copy link
Author

@LukasReschke ImageMagick and security: Shouldnt the choice of image-backend at least be a choice for the user? I run Nextcloud on my server and feed it my images and documents - how is any of that untrusted input? Not everyone will run nextcloud as an enterprise renting it out to otters I think. btw I am curious about security of imagemagick, have any pointers? Is there any other faster alternative which is also secure?

@rullzer I see. But it is still too slow, even with 2048x2048 gd imagecopyresample spends 0.8s for every image. Can gd be speeded up? Can it be told to use more threads or processes?

@LukasReschke
Copy link
Member

LukasReschke commented Oct 13, 2016

@LukasReschke ImageMagick and security: Shouldnt the choice of image-backend at least be a choice for the user? I run Nextcloud on my server and feed it my images and documents - how is any of that untrusted input?

And that's good for you. But we're aiming for a solution that scales from small to big and doing so with sane defaults. Adding "enable this switch here and it makes stuff faster but also much more insecure" is not an option. You can of course write an optional app that does this, but it won't be part of the core server for now :)

btw I am curious about security of imagemagick, have any pointers?

https://github.com/mkoppanen/imagick#security has some more infos. https://imagetragick.com was a kinda big exploit. We had imagemagick until like 2013 for some previews enabled by default but this allowed an attacker to extract arbitrary files from the system.

The only sane way to run imagemagick is in a sandboxed, secured, environment. Something we don't offer out of the box yet and also would be an additional component. (e.g. a preview server system)

Not saying that we shouldn't speed up thumbnails. We should. But enabling imagemagick on the webserver is a very bad idea from a security PoV.

@iskradelta
Copy link
Author

@LukasReschke All valid concerns which I agree with, security should come foremost, then speed.

I dont think the current solution scales even to small use cases, as it is too slow. The server is on 32gb of ram, storage on 2 raid0 ssds, 12core cpu, and yet it stutters when web browsing nextcloud on it, and Android phone (quad core, tested with a s7 as well) on the same WiFi - loads the picture preview very visibily at 1 per second. Doesnt matter if the folder has 10 pictures or 1000, they are previewed serially (especially on Android), slowly. (Yes opcache, apcu, lighttpd set to maximum performance and caching were possible, php-fpm dynamic with many workers)

The expectation is for it to to be as fast as the inbuilt android-gallery, as is the case when using google drive. We can make that happen.

I like the preview-server idea, how would the communication happen between nextcloud/php and preview-server be? Unix sockets and passing file-descriptor? Because the speed advantage is gone the second we try to stream the whole file to any preview-server, when it can do better by just reading parts of it intelligently (skipping rows and columns of pixels). By writing/passing file-descriptors to unix-socket, the other process can live isolated in its own container without access to our filesystem.

Otherwise more simply we could proc_open("/bin/some-server", $descriptors, $yadayada) and let child have the file-descriptor, and prevent that process from accessing nextcloud_data by normal unix-permissions, that preview-process could be setuid a preview_user. Simpler than unix-sockets? And if people want more security - let the some-server talk to a unix socket to another container which is running imagemagick/gd under seccomp mode.

Would such a preview-server be useful for libreoffice as well?

@alexander-schranz
Copy link

Why not give the administrator the choice which image processing library they want to use you can recommend the one you want. Get on my first installation Module not found gd and I have imagick and gd2 enabled.

There is a very good abstraction here https://github.com/avalanche123/Imagine which is a very wide used library.

@Bugsbane
Copy link
Member

It's probably worth pointing out for anyone reading this later that an app for preview generation does already exist here. Not sure how much this meets your usecase @iskradelta?

@Klaas-
Copy link

Klaas- commented Jan 4, 2017

It seems to me that thumbnails are created for each user individually (nextcloud/data/username/thumbnails), even for shared data (a big photo collection in my case). That looks like a waste of space and cpu time to me. I can combat the first with filesystem deduplication, not sure about the other part.

@rullzer
Copy link
Member

rullzer commented Jan 4, 2017

Making the preview system more modular is still on the TODO list. That way the problems by @iskradelta @alexander-schranz should be fixed since then you can enable the app you want to generate previews

@Klaas- since NC11 we do share previews. However note that the gallery app is not yet modified. This will come soon (tm) I hope.

@Klaas-
Copy link

Klaas- commented Jan 4, 2017

thanks @rullzer linking your pr for others to find it nextcloud/gallery#187

@iskradelta
Copy link
Author

@Bugsban thanks will try the app later, I "fixed" it by writing a short bash script with curl to force the auto-generation of previews.

@Klaas- @rullzer Yes, it did take much more space.

Then I looked into using file-descriptor passing for a more secure preview-system in general, and it would let me decide which process is actually processing a a file, so I could use graphicsmagick instead of php-gd. My tests showed gmagick could do the thumbnails so fast it would be not worth wasting space on pregenerating previews, while php-gd could not - gmagick has an optimization to skip reading the whole file and construct a smaller one, and some other options. For security reasons, fd-passing, since then I could run gmagick in another lxc-container with its own seccomp filters adjusted for gmagick, and the process gmagick in that container would not even have file-system access to anything else on the server, in fact, it would only have access to file-descriptors passed to it over a unix socket which would be mounted in both nextcloud-cointaner and gmagick-container. So if anyone exploits/gets-the EIP in gmagick process say with a bad .jpg all harm it could do is write a bad thumbnail.

@AngeloFrangione
Copy link

AngeloFrangione commented Jan 17, 2017

Yet two days I'm searching for a solution to this "slow previews" and "slow thumbnails" Found this tool you mentioned, but it actually don't generate the thumbnails for the gallery app and, the photo previews are still looking bad and very slow to generate, it takes more time to generate the previews than downloading the full image ! If at least we had an option to disable compressing the previews.

This issue is open more than 3 years, saw topics on forums from 2013 mentioning these problems, and now in 2017 still these slow previews and image generation, certainly because of GD. I can't afford a 80 dollar dedicated server in month for my photos. Isn't any way to change the whole thing ? Even the UI is very slow for me at certain times..

@iskradelta
Copy link
Author

@AngeloFrangione lol even a $100 dollar dedicated server wont solve the issue. Mine is on a 6-core i7, 32GB memory, 4 SSDs in RAID0, 1Gbit ethernet, and still... as you say, just viewing the picture is faster than waiting for gd to generate preview. My measurements showed the problem is in gd, and a solution is in using graphicsmagick because it can be told to not read the whole image but only the parts it needs to generate a preview - and that works.

@LexiconCode
Copy link

LexiconCode commented Mar 15, 2017

I would like to show my support for GraphicsMagick. Although only in the event that the it's implemented securely.

GM is used to process billions of files at the world's largest photo sites (e.g. Flickr and Etsy).

@godfuture
Copy link

This issue is also a pain for me and I am highly interested about the current state of it. I guess the hint of @LexiconCode with GraphicsMagick looks quite promising!

@srkunze
Copy link
Member

srkunze commented Aug 17, 2017

Same here. I definitely like the gallery view - when all images are available to preview.
I also used the Preview Generator App, but it only generates the smallest size for the list view.

So, there are two different ways to solve it but doing both would also be nice:

  1. Preview Generator App - pre-generate gallery images as well as thumbnails
  2. Speed up gallery images considerably

If there's anything I can do here to help, ping me. 👍

@abefroman6868
Copy link

Same here.
The gallery view was the main reason to buy the Nextcloud device and I reported the issue I guess in Dec. last year. The NC forum shows, that it is a concern to many users. Now, I have to open the webbrowser and wait hours(!) until the previews are created, not talking about sharing with others immediately after upload. It managed to create 60 previews in one day on NC11 snap.
So I guess the preview Generator app would be a good solution, if it performs well.

Please when can we get it on the snap for Raspi 2?

@timtierney
Copy link

Replying to @srkunze

Even of the Preview Generator App handled the gallery I don't believe it would help. Correct me if I'm wrong (not familiar with Nextcloud code). I don't believe (as of right now *) you can multi-thread a PHP script. So if you had a PHP script do generation, only one CPU thread could work on it correct?

Would writing something outside of PHP be better? Maybe Python has some image-processing libraries that would be more efficient and, with multi-threading support, scale better.

* I guess pthreads are a thing in PHP 7.2

@srkunze
Copy link
Member

srkunze commented Sep 18, 2017

@timtierney Maybe the GIL prevents the multi-threading idea in Python. Why do you think modifying the Generator app could not handle it appropriately? I works well for the Android app using thumbnails.

@timtierney
Copy link

timtierney commented Sep 18, 2017

@srkunze I think it could do it. But would it scale on machines with more cores/threads?

@wiwie
Copy link

wiwie commented Dec 5, 2017

@timtierney multi-threading should be possible if a cron job is used for thumbnail generation, right? this would only require, that the occ command for thumbnail generation supports generation of a subset of thumbnails. then the occ command could be invoked multiple times by the cron job.

@rubrik
Copy link

rubrik commented Jan 12, 2018

Could the thumbnail generator spawn a certain number of workers processes to generate these thumbnails. Thumbnail generation in Dropbox is very much faster then Nextcloud at the moment, it would be nice to get close in terms of speed. Nextcloud is already better in terms of quality.

@Hermsi1337
Copy link

Any news on this? The slow generation of thumbnails is the only thing holding me back from using NextCloud. ):

@adriengibrat
Copy link

Installing previewgenerator and adding a cron job helps the user experience most of the time.
It's just a workaround as the image generation is still very slow, just pregenerated and cached.

I like the solution proposed by @alexander-schranz, use Imagine as a layer of abstraction, and eventually let the user choose with warning about security (and extra dependencies).

It seems no preview system is secure:

Nextcloud team already made that clear.

@LexiconCode
Copy link

Could it be shifted to client side?

@Darkvater
Copy link

Darkvater commented Mar 11, 2018

I have rewritten image.php to use the Imagine interface instead of the raw GD functions. For this to work we need to add an additional dependency to imagine into nextcloud.
While there should be no functional changes in my patch I have added a config setting to allow to select the Imagine interface, be it GD, Imagick, Gmagick or Vips in Darkvater@d06b2bf

@iskradelta do you have the gmagick flag to optimise and do preview on the fly? Would be interesting to find out. I am not sure the Imagine interface by default is optimised for speed.
Update: found it, you have to give it a size before opening the image so that it does a "progressive" read. Let's see if I can reproduce this with imagine, by default it's not fast at all.

@Mikescops
Copy link
Member

Hey @Darkvater, how do you set up your modification ? What is the config setting ?

@Darkvater
Copy link

@Mikescops I have to patch image.php as in the commit but also add the imagine dependency which I am not fully sure about how it works on a master checkout (keeps doing infinite loops with the VCS in 3rdparty/composer.json). So it's definitely not ready yet for a pull.

@bobobo1618
Copy link

Making the preview system more modular is still on the TODO list

Is there somewhere we can follow this? Has anyone outlined what it should look like if someone happens to have time to invest to handle it?

Also regarding alternatives, ImageFlow is another. It's still a bit immature but it's designed specifically for user-supplied images and the security concerns that entails.

@bobobo1618
Copy link

Also @Darkvater:

The reason for this is that the file needs to be opened in preview mode (eg. specifying a smaller size) to have full advantage of the speedup.

Do you mean that you should read an existing preview to generate a smaller preview? I don't know the code very well but perhaps you could modify getThumbnail in the image preview code to do that by calling itself to fetch a large preview when the small preview is requested? It should either generate it or read it from the cache.

Or do you mean that Imagine has the ability to open an existing large preview and subsample it or something if you tell it the file size you want output?

I think the solution here might be to fiddle with the image preview code rather than the low level image manipulation code anyhow. It should even be possible to do that as an app (like this, for RAW files) so I'll have a look and see if I can make it work.

@BatPio
Copy link

BatPio commented Sep 3, 2018

@bobobo1618
I am afraid that IProvider is used only to generate the largest size of the preview. Smaller preview sizes are generated from larger ones by /lib/private/legacy/image.php

@bobobo1618
Copy link

Ah, I didn't see that call in the Generator.

Taking that into account, my proposal is to add two optional methods to the Provider interface:

  • generatePreview, which implements the logic mirroring the current implementation.
  • generatePreviewFromOriginal, which implements the above logic but reading from the original.

My rationale for the first method is that we can't just use the existing method because it doesn't support cropping, which is something we'll want to support. If we don't, then we'll have to deal with the performance hit of GD decoding, cropping and re-encoding the larger preview.

My rationale for the second method is that I think there's reason a user/developer might want to generate the preview from the original file. For example a photographer might care a lot about quality and not want to incur the degradation of two JPEG encodes or a RAW preview app developer might want to generate a full size preview by rendering the content of the RAW file but might want to use embedded previews for the smaller versions.

getPreview in Generator can then be altered to check for generatePreviewFromOriginal, use that if it's there else check for generatePreview else, if cropping isn't needed, use getThumbnail and finally, fall back to the existing logic.

Once that's done this should all be modular and an app can do everything the user wants.

@master0044657
Copy link

master0044657 commented Sep 27, 2018

my nextcloud vm has been generating previews for 15 days non stop and by now has only generated previews for 15k out of 35k images. If this product was designed to scale i must be doing something wrong because i cant imagine someone with 500 thousand images.

vm specs: xeon 1260l 4gig ram

@clawoflight
Copy link

Any news/official rationale on this? Honestly, this is a serious dealbreaker.

@dgtlmoon
Copy link

dgtlmoon commented Nov 16, 2019

Also transmission of the wrong thumbnail size #13709 (comment)

I feel like the whole system is doing a lot of 'assuming' about how people are going to care.

  • Generating thumbs that arent used (why does it need to generate it now, and not on request later? people create a lot more content than they actually browser, this pattern is the same in drupal and wordpress)

  • Generating huge thumbs that arent used

  • Not utilising EXISTING thumbnail data such as whats stored in the EXIF data stream

@skjnldsv skjnldsv added 0. Needs triage Pending check for reproducibility or if it fits our roadmap 1. to develop Accepted and waiting to be taken care of and removed 0. Needs triage Pending check for reproducibility or if it fits our roadmap labels Aug 20, 2020
@RomainFallet
Copy link

Any news on this?

@master0044657
Copy link

my nextcloud vm has been generating previews for 15 days non stop and by now has only generated previews for 15k out of 35k images. If this product was designed to scale i must be doing something wrong because i cant imagine someone with 500 thousand images.

vm specs: xeon 1260l 4gig ram

Ok, came back almost 3 years later and thumbnail generation is almost in the same state. For me this feature is a must. I think it would be nice to have a simple GUI to generate and keep track of generated images/videos and increase speed generating those thumbnails.

@godfuture
Copy link

Ok, came back almost 3 years later and thumbnail generation is almost in the same state. For me this feature is a must.

My assumption is that even the community around nextcloud seems big, the developing community is rather small. And the paid developer focus is enterprise. For them, the community is a playground well suited to test things out and gain additional test ressources. But do they play a role for the road map? I doubt it. Do you think business users collect huge amount of photos in gallery? Right, thats why three years no move here.

@LexiconCode
Copy link

There's enough interest where people might want to donate toward sponsoring this issue directly. I'd be willing only if it went towards this issue.

@RomainFallet
Copy link

Can someone who have a clear undersanding of Nextcloud and the issue can summarize where to look and what must be done (for someone who never contributed to Nextcloud)?

If contributions are opened, I can definitely take a look at this.

@dgtlmoon
Copy link

This is just another of nextclouds dirty little secrets, they will never resolve it, they have some bizzare branch of the code where they want to make the fork for processing each 'collection' of thumbnails happen in a single PHP request, but the whole point of PHP+Apache/Nginx is so that each process/request is a new fork anyway

To further push my point, a single request to the nextcloud stack, not including actually doing anything is about 200ms.....

so my first paragraph above is about working around that crappy codebase... but the whole nextcloud stack is just a 'work around' and they only respond to their larger commercial interests, do not bother, find alternatives.

@Mikescops
Copy link
Member

@dgtlmoon good thing about internet is that you're free to go for another project or build your own. Nextcloud company owes you nothing but they do for their customers.
This thread is just about frustrated and complaining people that doesn't seem to be interested in contributing in the product at all.

@dgtlmoon
Copy link

dgtlmoon commented Feb 22, 2021

@Mikescops actually I've donated months of my own life looking into why this is an issue, profiling the entire codestack and getting to the bottom of the issue, I've helped with a few merge requests and commented on many others, spent a lot of time profiling and code reviewing, then spending more time contacting the main contributors here and trying to find a way forwards but years later we're still in the same situation.

So please, you know where you can stick your smart little comment ^^

image

And look here... here's yourself a few years ago just complaining and adding nothing...

image

@dgtlmoon
Copy link

@Mikescops since you are a part of nextcloud, instead of attacking people that contribute their own time to exploring issues, why dont you comment about the silly 200ms round trip time just to bootstrap the stack? (Found through donating my own time)

or the silly idea/concept to make PHP do the forking for multiple thumbnail requests? (Found through donating my own time)

@Mikescops
Copy link
Member

@dgtlmoon I'm not part of Nextcloud team, I'm just a user that since my first comment you saw decided to help and fixed issues that matters to me. It's not my daily job at all.

@clawoflight
Copy link

Nextcloud gmbh is not an engineering company, but a B2B sales / marketing company. Just look at where the effort is being spent.

The community should move back to owncloud, which has spent considerable effort fixing their tech stack to a (micro) service-based architecture with API and SPA js gui on top.

@solracsf
Copy link
Member

Closing as per #24166

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests