Allow bulk conversion to safe PDFs #77

Closed
micahflee opened this issue Apr 23, 2020 · 10 comments · Fixed by #247
Labels
enhancement New feature or request
Comments

@micahflee
Contributor

A few different people have requested this feature: the ability to convert documents to safe PDFs in bulk.

This shouldn't be too difficult to implement, but I think the biggest issue is how the user interface should work. One idea is to allow the user to browse for a folder, and then try to automatically convert all documents it can in that folder to safe versions (appending -safe.pdf to the end). It probably makes sense to convert them one at a time, and also display a report of all unsupported files in the folder that couldn't get converted.

What happens if there's a file called test.pdf and another called test-safe.pdf in the same folder though? What would the safe version of test.pdf get named? It could maybe detect this edge case, and tell you that you must not have any files that end in -safe.pdf in the folder to convert.
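A rough sketch of the folder scan described above, in Python. Everything here is an assumption made for illustration: `scan_folder`, `safe_name`, and the `SUPPORTED_SUFFIXES` set are hypothetical names, not Dangerzone's actual code, and the real list of convertible formats may differ. It implements the strict policy of refusing any folder that already contains a file ending in -safe.pdf.

```python
from pathlib import Path

# Hypothetical list of convertible extensions; the real set may differ.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".odt", ".png", ".jpg"}
SAFE_SUFFIX = "-safe.pdf"


def safe_name(path: Path) -> Path:
    """Return the output path, e.g. test.pdf -> test-safe.pdf."""
    return path.with_name(path.stem + SAFE_SUFFIX)


def scan_folder(folder: Path):
    """Split a folder into convertible files and unsupported files.

    Raises ValueError if any file already ends in -safe.pdf, so that an
    output name can never clash with an existing file.
    """
    files = sorted(p for p in folder.iterdir() if p.is_file())
    existing_safe = [p for p in files if p.name.endswith(SAFE_SUFFIX)]
    if existing_safe:
        names = ", ".join(p.name for p in existing_safe)
        raise ValueError(f"Folder already contains safe PDFs: {names}")
    supported = [p for p in files if p.suffix.lower() in SUPPORTED_SUFFIXES]
    unsupported = [p for p in files if p.suffix.lower() not in SUPPORTED_SUFFIXES]
    return supported, unsupported
```

The `unsupported` list is what the report of unconverted files would be built from. Note that two inputs with the same stem (say test.pdf and test.docx) would still map to the same output name, so a real implementation would need to handle that case too.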

@micahflee micahflee added the enhancement New feature or request label Apr 23, 2020
@micahflee micahflee added this to the 0.2 milestone Jun 3, 2020
@haplo
Contributor

haplo commented Jul 22, 2020

> It probably makes sense to convert them one at a time, and also display a report of all unsupported files in the folder that couldn't get converted.

Running multiple conversions in parallel would reduce running time and make use of multiple CPU cores. Just run a process pool bound to the number of CPUs (ideally a configurable setting), queue all PDFs to be converted using multiprocessing.Queue, and have the worker processes pull from the queue and do the conversion. I can provide an implementation for this.
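A minimal sketch of that queue-based worker pool, under stated assumptions: `convert_document` is a hypothetical stand-in for the real conversion step (it only computes an output name), and `worker` and `convert_in_parallel` are illustrative names rather than anything in Dangerzone. Only `multiprocessing.Queue`, `multiprocessing.Process`, and `os.cpu_count` are real standard-library calls.

```python
import multiprocessing as mp
import os


def convert_document(path: str) -> str:
    """Hypothetical stand-in for the real conversion step."""
    root, _ext = os.path.splitext(path)
    return root + "-safe.pdf"


def worker(tasks, results):
    """Pull paths from the task queue until a None sentinel arrives."""
    while True:
        path = tasks.get()
        if path is None:
            break
        try:
            results.put((path, convert_document(path), None))
        except Exception as exc:
            # Report the failure instead of letting the worker die.
            results.put((path, None, str(exc)))


def convert_in_parallel(paths, num_workers=None):
    """Convert all paths using a fixed number of worker processes."""
    paths = list(paths)
    num_workers = num_workers or os.cpu_count() or 1
    tasks = mp.Queue()
    results = mp.Queue()
    for path in paths:
        tasks.put(path)
    for _ in range(num_workers):
        tasks.put(None)  # one sentinel per worker
    procs = [
        mp.Process(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for proc in procs:
        proc.start()
    # Collect one result per input before joining the workers.
    outcomes = [results.get() for _ in paths]
    for proc in procs:
        proc.join()
    return outcomes

# Call convert_in_parallel from under `if __name__ == "__main__":` so it
# also works with the "spawn" start method (Windows, macOS).
```

Each outcome is a `(input_path, output_path, error)` tuple, and results arrive in completion order rather than input order. `num_workers` is the configurable setting mentioned above, defaulting to the CPU count.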

> What happens if there's a file called test.pdf and another called test-safe.pdf in the same folder though? What would the safe version of test.pdf get named? It could maybe detect this edge case, and tell you that you must not have any files that end in -safe.pdf in the folder to convert.

I think a good UX would be to identify all PDF files that already have a -safe version and skip them, telling the user that they appear to be already converted. The user can then manually rename them if they want Dangerzone to work on them.
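The skip rule could look roughly like this. `plan_conversions` is a hypothetical helper, not part of Dangerzone, and it makes one extra assumption that the comment above does not spell out: files that themselves end in -safe.pdf are treated as outputs of an earlier run and are never used as inputs.

```python
from pathlib import Path

SAFE_SUFFIX = "-safe.pdf"


def plan_conversions(folder: Path):
    """Return (to_convert, skipped) for the PDFs in a folder."""
    pdfs = sorted(p for p in folder.glob("*.pdf") if p.is_file())
    names = {p.name for p in pdfs}
    to_convert, skipped = [], []
    for pdf in pdfs:
        if pdf.name.endswith(SAFE_SUFFIX):
            # Assumption: this is the output of an earlier run, not an input.
            continue
        if pdf.stem + SAFE_SUFFIX in names:
            skipped.append(pdf)  # a -safe version already exists
        else:
            to_convert.append(pdf)
    return to_convert, skipped
```

The `skipped` list is what would be shown to the user as "already converted", leaving it to them to rename or delete the existing -safe file if they want a fresh conversion.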

@pettitjr

I would also like to see this feature!

@JesseKrembsNYT

What James said..

@tzmnyt

tzmnyt commented Oct 22, 2020

Bulk conversion of PDFs is very much needed.

@RLburrito

Bulk conversion is desperately needed here as well.

@DirtyNoob

I agree with the consensus here, this is much needed!

@anarcat

anarcat commented Jun 10, 2021

as part of #110, i implemented a webdav processor to do batch processing. the idea is that you dump your files in a webdav folder (e.g. on nextcloud), share that folder with the dangerzone bot, which pulls the files, processes them in docker, and pushes the sanitized files back.

see https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/ for details.

@ninavizz
Member

Hey hey! I've got some hours to spare to get a nice UX together for this, and to potentially improve the single-file experience. Can do & share next week.

@micahflee micahflee modified the milestones: 0.2, 0.3 Jun 15, 2021
@micahflee micahflee removed this from the 0.3 milestone Dec 8, 2021
@deeplow
Contributor

deeplow commented Aug 2, 2022

An extra thing to consider is that a user may want to add extra files while some are already processing.

@eloquence eloquence added this to the 0.4.0 milestone Sep 15, 2022
@eloquence
Member

Tentatively adding to 0.4.0 milestone; initially, this may only include backend changes to support bulk conversion + CLI support.

deeplow added a commit that referenced this issue Nov 3, 2022
deeplow added a commit that referenced this issue Nov 9, 2022
deeplow added a commit that referenced this issue Nov 10, 2022
deeplow added a commit that referenced this issue Nov 10, 2022
deeplow added a commit that referenced this issue Nov 14, 2022
deeplow added a commit that referenced this issue Nov 18, 2022