Enhancement: Multithreading #162

Open
jkingcis opened this issue May 6, 2019 · 12 comments

Comments

@jkingcis

jkingcis commented May 6, 2019

Hi,

I've been using Preview Generator for a while. Congrats on your excellent work. I've noticed that the occ process itself only uses one vCPU at a time. Since generating previews is quite CPU-intensive, it might be worth getting multithreading working so the process could be much faster.

[Screenshot: nc_occ_monothread]

@ghost

ghost commented Jun 29, 2019

@jkingcis
Author

jkingcis commented Jul 1, 2019

Hi @Gatak

Thank you so much for sharing the link. I've found this to be a nice and interesting PoC. I'm going to reach out to the author for further details and ask him to open an issue here in the official Preview Generator repository, so the devs can check whether his contribution is worth merging into the current project, which I hope it is, as the results seem quite promising.

I'm currently creating the thumbs via a cron job with the occ preview:generate-all command, but as he points out, this creates a bunch of different-sized thumbs in a single thread, which is painfully slow if you have an active community uploading pictures all the time, as most of us have.

I'll let you know how the whole thing develops.

@ghost

ghost commented Jul 1, 2019

@jkingcis You're right. I think many users would really benefit from this work, especially on low-end devices such as routers, Raspberry Pis, etc.

I was thinking that perhaps it would be possible to create a bash script to generate the thumbs outside of Nextcloud and then use occ files:scan-app-data to add them to the database. I think a lot of the slowdown is not the generation of images, but rather the hundreds of SQL queries and file locks happening for each single file. This makes the previewgenerator (and NC) very slow.

What would be needed to create the thumbs? In the appdata_xxxx/preview/ folder there is a sub-folder named with the fileid of the picture the thumbnail is created for. The question is how to add the proper rows to the database for these files, for example the parent and name columns.

Would NC create the DB entries automatically when running occ files:scan-app-data?

This is an extract from oc_filecache:

fileid storage path path_hash parent name mimetype mimepart size mtime storage_mtime encrypted unencrypted_size etag permissions
167195 2 appdata_oc56k9291p0f/preview/166383 88322374d16ad83364b98f98c646e9dc 25077 166383 2 1 455365 1561851909 1561851909 0 0 5d17f5e90d01d 31
167215 2 appdata_oc56k9291p0f/preview/166383/1600-1200-max.jpg 89428ef80eae90a74b655be972d8f4f5 167195 1600-1200-max.jpg 8 7 290927 1561851409 1561851409 0 0 9c8f86f0703925159d911a3a997e1880 27
167220 2 appdata_oc56k9291p0f/preview/166383/1024-1024-crop.jpg f21923ef2bb3caf0f0933148cde73013 167195 1024-1024-crop.jpg 8 7 147664 1561851412 1561851412 0 0 0ba9eeddc90fe6b92952c302019225e9 27
168224 2 appdata_oc56k9291p0f/preview/166383/341-256.jpg 01ae8d87e65fefd5958e737a71471f8d 167195 341-256.jpg 8 7 16774 1561851909 1561851909 0 0 6cc03b4fa232f7b447fce48922719e41 27
166383 2 __groupfolders/2/Photos/20180107_114047.jpg 6d1caed55edac07521e4082bb2da282b 163225 20180107_114047.jpg 8 7 4454565 1556087710 1556087710 0 0 08426df32bea1f19404f42bcffbca179 27
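In this extract, the preview folder row (fileid 167195) is named after the source picture's fileid (166383), and each preview row's parent column points back at 167195. A minimal Python sketch of decoding the preview file names seen above follows. The `<width>-<height>[-max|-crop].<ext>` pattern is an assumption inferred from these three rows only, not from the Nextcloud source, and `parse_preview_name` / `preview_dir` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming pattern, inferred only from the oc_filecache rows above:
# "<width>-<height>[-max|-crop].<ext>"
_PREVIEW_NAME = re.compile(r"^(\d+)-(\d+)(?:-(max|crop))?\.(\w+)$")


class PreviewName(NamedTuple):
    width: int
    height: int
    mode: Optional[str]  # "max", "crop", or None when there is no suffix
    ext: str


def parse_preview_name(name: str) -> Optional[PreviewName]:
    """Return the parts of a preview file name, or None if it doesn't match the pattern."""
    m = _PREVIEW_NAME.match(name)
    if m is None:
        return None
    width, height, mode, ext = m.groups()
    return PreviewName(int(width), int(height), mode, ext)


def preview_dir(appdata: str, fileid: int) -> str:
    """Relative path of the preview folder for a source fileid, as in the path column."""
    return f"{appdata}/preview/{fileid}"


if __name__ == "__main__":
    for name in ["1600-1200-max.jpg", "1024-1024-crop.jpg", "341-256.jpg"]:
        print(name, parse_preview_name(name))
    print(preview_dir("appdata_oc56k9291p0f", 166383))
```

Inserting matching database rows would additionally require path_hash, mimetype, etag and the other columns, which this sketch does not attempt.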

@nachoparker
Member

nachoparker commented Jul 1, 2019

@jkingcis #166

@Gatak that would indeed be a bit hacky, but it could be a bit faster. If you look at my benchmarks, even using imagick with many threads per resize, the bottleneck is in the PHP/database part, so it only speeds things up a bit; with the "multiprocess approach" (PR #166) it is way faster.

I think your suggestion could be even faster if it were implemented in, say, C, but it would require more maintenance to keep it in sync with potential changes in the NC previews engine, more programming effort, and it would be harder to distribute (since it is compiled), all for not that much more speed. Probably not worth it IMO.

@ghost

ghost commented Jul 2, 2019

@nachoparker Absolutely not an easy task. I do not have much coding experience, but to avoid the contention with MySQL/MariaDB/PHP, I'm thinking the preview files could be generated with an ImageMagick bash script, and the database then updated with all the files afterwards.

@nachoparker
Member

nachoparker commented Jul 2, 2019

Said script would need to be aware of which previews already exist, what the configured JPEG quality is, and so on. Before you know it we are replicating the functionality of the NC previews engine, and then we need to keep it in sync.
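To illustrate just the first point, a minimal Python sketch of "which previews already exist" for one preview folder. The size list `WANTED_SIZES` is hypothetical (copied from the oc_filecache extract earlier in this thread, not read from any configuration), and file names are assumed to follow the `<width>-<height>[-<mode>].jpg` form seen there. Keeping such a list in sync with the previews engine is exactly the maintenance burden described above.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Set, Tuple

# Hypothetical sizes as (width, height, mode); mode "" means no suffix.
# Taken from the oc_filecache extract, NOT read from Nextcloud's configuration.
WANTED_SIZES: List[Tuple[int, int, str]] = [
    (1600, 1200, "max"),
    (1024, 1024, "crop"),
    (341, 256, ""),
]


def preview_name(width: int, height: int, mode: str, ext: str = "jpg") -> str:
    """Build a file name in the assumed "<w>-<h>[-<mode>].<ext>" form."""
    return f"{width}-{height}-{mode}.{ext}" if mode else f"{width}-{height}.{ext}"


def missing_previews(
    folder: Path, sizes: Iterable[Tuple[int, int, str]] = WANTED_SIZES
) -> List[str]:
    """Return the expected preview names not yet present in folder, sorted."""
    expected: Set[str] = {preview_name(w, h, m) for w, h, m in sizes}
    existing = {p.name for p in folder.iterdir() if p.is_file()} if folder.is_dir() else set()
    return sorted(expected - existing)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "166383"
        folder.mkdir()
        (folder / "1600-1200-max.jpg").touch()
        print(missing_previews(folder))
```

The configured JPEG quality and other engine settings are not handled here at all.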

It is a good idea, but I think we would be better off trying to improve or provide more options in the NC / previewgen code itself.

@jkingcis
Author

jkingcis commented Jul 3, 2019

@nachoparker I agree. An external ImageMagick script would need at least some database queries to find out which thumbs have already been created, which would slow things down anyway. I think the multithreading/concurrent instances proposed in #166 will bring a huge improvement in run time. I hope it reaches the official branch.

At my company, I wrote a Bash script, run via a cron job every 10 minutes, that first checks whether another preview generation process is still running (to avoid launching a second process that could create inconsistencies) and only then runs the preview generation itself. Because preview generation is currently painfully slow, the problem is amplified: the server is always generating previews, as users upload pictures faster than the server can render them (not many pictures IMO, maybe 20-50 per hour).
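The Bash script itself isn't shown. A minimal Python equivalent of the "skip if the previous run is still going" guard, using a non-blocking `fcntl.flock` on a lock file (Unix only), might look like this. The lock path and the `php occ preview:pre-generate` command line are assumptions; adjust them (and the user occ runs as) for your install.

```python
import fcntl
import subprocess
import sys
from typing import List, Optional

LOCK_PATH = "/tmp/previewgenerator.lock"  # hypothetical location
CMD = ["php", "occ", "preview:pre-generate"]  # assumed; run from the Nextcloud directory


def run_exclusive(cmd: List[str], lock_path: str = LOCK_PATH) -> Optional[int]:
    """Run cmd while holding an exclusive lock on lock_path.

    Returns the command's exit code, or None if another holder of the
    lock (e.g. the previous cron run) is still active.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB: fail immediately instead of waiting for the other run.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed at the end of the with-block.
        return subprocess.run(cmd).returncode


if __name__ == "__main__":
    code = run_exclusive(CMD)
    if code is None:
        print("previous run still in progress, skipping", file=sys.stderr)
        sys.exit(0)
    sys.exit(code)
```

Scheduled from cron every 10 minutes (e.g. `*/10 * * * *`), a run that starts while the previous one is still working simply exits.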

@nachoparker
Member

Before my changes it was not possible to run two instances anyway (the second one detects this and doesn't run).

You can have a look at how we do it in NextCloudPi: it runs for 1h every night so it doesn't affect normal usage.
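How NextCloudPi implements its nightly window isn't shown here. As a rough sketch of the same idea, a Python wrapper can give one run a fixed time budget with `subprocess.run(..., timeout=...)`, which kills the child once the budget is exceeded. The one-hour value and the command are assumptions, and whether interrupting preview generation mid-run is always harmless is also an assumption worth checking.

```python
import subprocess
import sys
from typing import List, Optional

ONE_HOUR = 60 * 60
CMD = ["php", "occ", "preview:pre-generate"]  # assumed command line


def run_with_budget(cmd: List[str], budget_seconds: float) -> Optional[int]:
    """Run cmd for at most budget_seconds.

    Returns the exit code, or None if the run was cut off. On timeout,
    subprocess.run kills the child and waits for it before re-raising.
    """
    try:
        return subprocess.run(cmd, timeout=budget_seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    result = run_with_budget(CMD, ONE_HOUR)
    print("stopped at time limit" if result is None else f"finished with exit code {result}")
    sys.exit(0 if result is None else result)
```

Started from cron at night (e.g. `0 2 * * *`), this caps the work at one hour and leaves the rest for the next night.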

@vesperkasper

Hi, has this made it into development?

@ftrentini

What happens if I run like 5 instances of preview:generate_all, each for a different user?
It will use other threads, right? Will the app be able to handle it? Overnight Test Time!!! 😄

@voklav

voklav commented Jan 10, 2023

What happens if I run like 5 instances of preview:generate_all, each for a different user? It will use other threads, right? Will the app be able to handle it? Overnight Test Time!!! 😄

Well? Is it working?

@kevenwyld

What happens if I run like 5 instances of preview:generate_all

So, I tried this just because I was looking for a faster way to test another issue I'm having with previews and it works!

I'm running OC\Preview\Imaginary to accelerate preview generation (well, just trying it out now actually; no long-term experience with it), but as it turns out, generate-all only sends one request to it at a time. I tried running one occ preview:generate-all <username> process per user, and it greatly increases the number of previews generated per minute.
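The per-user workaround can be scripted. A minimal Python sketch, assuming the users are passed on the command line and occ is invoked as `php occ` from the Nextcloud directory (in many installs it must run as the web server user); `run_per_user` and `max_workers` are hypothetical names. The thread pool only waits on child processes: the parallel work happens in the separate PHP processes, and `max_workers` caps how many run at once.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def generate_all_cmd(user: str) -> List[str]:
    """Assumed command line for one user's preview generation."""
    return ["php", "occ", "preview:generate-all", user]


def run_per_user(
    users: Iterable[str],
    max_workers: int = 4,
    build_cmd: Callable[[str], List[str]] = generate_all_cmd,
) -> Dict[str, int]:
    """Run one generate-all process per user, at most max_workers at a time.

    Returns a mapping from user to the exit code of that user's process.
    """
    user_list = list(users)

    def run_one(user: str) -> int:
        return subprocess.run(build_cmd(user)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(user_list, pool.map(run_one, user_list)))


if __name__ == "__main__":
    results = run_per_user(sys.argv[1:])
    for user, code in results.items():
        print(f"{user}: exit code {code}")
```

For example `python3 per_user.py alice bob carol`. Note that a single user with far more files than the others still dominates the total run time.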

This gave me an idea, which I haven't tested yet. One could write a wrapper for generate-all that identifies directories within the Nextcloud directory tree that contain images, queues them, and runs generate-all with the path argument on each, with some amount of concurrency.
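The (untested) wrapper idea could be sketched in Python as below. Assumptions: the script can read the Nextcloud data directory directly; user files live under `<user>/files/` and map to paths like `/alice/files/Photos`; `appdata_*` folders (where the previews themselves are stored) and `__`-prefixed folders such as `__groupfolders` are skipped; the extension list is hypothetical; and generate-all takes the path as `--path=...` (check `occ preview:generate-all --help`). If that option also recurses into subfolders, a parent and its child may both be queued and some files processed twice.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".heic"}  # hypothetical selection


def image_dirs(data_dir: str) -> List[str]:
    """Return Nextcloud-style paths of folders under <user>/files that directly contain images."""
    root = Path(data_dir)
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        parts = Path(dirpath).relative_to(root).parts
        if not parts:
            # Top level: skip appdata_* (previews live there) and __groupfolders etc.
            dirnames[:] = [d for d in dirnames if not d.startswith(("appdata_", "__"))]
        dirnames.sort()  # deterministic walk order
        in_user_files = len(parts) >= 2 and parts[1] == "files"
        if in_user_files and any(os.path.splitext(f)[1].lower() in IMAGE_EXTS for f in filenames):
            found.append("/" + "/".join(parts))
    return found


def generate_cmd(path: str) -> List[str]:
    # "--path" is an assumption based on the "path argument" mentioned above.
    return ["php", "occ", "preview:generate-all", f"--path={path}"]


def run_queue(
    paths: List[str],
    max_workers: int = 4,
    build_cmd: Callable[[str], List[str]] = generate_cmd,
) -> List[int]:
    """Run one generate-all per queued path, max_workers at a time; return exit codes in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: subprocess.run(build_cmd(p)).returncode, paths))


if __name__ == "__main__":
    queue = image_dirs(sys.argv[1])  # e.g. the Nextcloud data directory
    print(f"{len(queue)} folders queued")
    run_queue(queue)
```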

Unfortunately that would only help with initial generating of previews since the pre-generate command does not accept a path argument.

I don't really recommend any of that as a "solution" to multi-threading previews, as that should probably be supported in this software somehow instead, just sharing my findings.
