Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel execution fontforge in docker #1508

Merged
merged 4 commits into from
Feb 5, 2024
Merged

Parallel execution fontforge in docker #1508

merged 4 commits into from
Feb 5, 2024

Conversation

nobk
Copy link

@nobk nobk commented Feb 4, 2024

Description

Improve the docker execution method, optimize from single process to parallel execution, and increase the speed of font patching.
see issue 1507

Requirements / Checklist

What does this Pull Request (PR) do?

Speed up docker patcher

How should this be manually tested?

Any background context you can provide?

What are the relevant tickets (if any)?

Screenshots (if appropriate or helpful)

Copy link
Collaborator

@Finii Finii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice PR!

I did never put much effort in the docker patcher ;-) because I do not use it at all.
But than means that suggestions like yours are so valuable!

I think this is a very good improvement. The gotta-patch-em also can parallel-patch (but with a hand-rolled parallelism and a hard wired number of jobs).

I would allow people to specify the -j option, as not all want parallel patching.
Maybe also helpful would be some output if find turned up with nothing.

Tell me if you want to expand this or if you prefer me merging as-is.

@@ -23,6 +23,6 @@ done
printf "Running with options:\n%s\n" "$args"

# shellcheck disable=SC2086
for f in /in/*.otf /in/*.ttf /in/*.woff /in/*.eot /in/*.ttc; do [ -f "$f" ] && fontforge -script /nerd/font-patcher -out /out $args "$f"; done
find /in -type f \( -name "*.otf" -o -name "*.ttf" -o -name "*.woff" -o -name "*.eot" -o -name "*.ttc" \) | parallel fontforge -script /nerd/font-patcher -out /out $args {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like find much more than the loop. We could use -iname here, as some fonts out there have their name and extension in caps. (Which would be an additional commit of course ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I would use -iregex but that is because I'm a regex person and hate the -o style in find ;-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
find /in -type f \( -name "*.otf" -o -name "*.ttf" -o -name "*.woff" -o -name "*.eot" -o -name "*.ttc" \) | parallel fontforge -script /nerd/font-patcher -out /out $args {}
find /in -type f -iregex ".*\.\(otf|ttf|woff|eot|ttc\)" | parallel fontforge -script /nerd/font-patcher -out /out $args {}

untested

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dockerfile's find is a BusyBox multi-call, which is not support -iregex, I will change it with -iname.

@Finii
Copy link
Collaborator

Finii commented Feb 4, 2024

@allcontributors please add @nobk for code

Copy link
Contributor

@Finii

I've put up a pull request to add @nobk! 🎉

@Finii
Copy link
Collaborator

Finii commented Feb 4, 2024

As a side note, I did never check if the debug log will be even avaiable when docker is used.
Note to self: Check that and maybe add moving the logfile to /out.

@nobk
Copy link
Author

nobk commented Feb 4, 2024

I would allow people to specify the -j option, as not all want parallel patching. Maybe also helpful would be some output if find turned up with nothing.

Tell me if you want to expand this or if you prefer me merging as-is.

I will add an option for -j PN, with docker option -e "PN=1", to disable parallel execute.

docker run --rm -v /path/to/fonts:/in:Z -v /path/for/output:/out:Z -e "PN=1" nerdfonts/patcher [OPTIONS]

@nobk
Copy link
Author

nobk commented Feb 4, 2024

I think docker patcher is for users, not for developers, so I am not test logfile output.
When I set -e "PN=20" , all hyper threads of i7-12700K CPU is used, up to 4.9GB RAM used, speed up max.

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

Thank you! Appreciate your work 💚

@Finii Finii merged commit e633e5f into ryanoasis:master Feb 5, 2024
1 check passed
@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

image

Fixing

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

$ shellcheck docker-entrypoint.sh

In docker-entrypoint.sh line 28:
	  -exec fontforge -script /nerd/font-patcher -out /out $args {} \;
                                                               ^---^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
	  -exec fontforge -script /nerd/font-patcher -out /out "$args" {} \;


In docker-entrypoint.sh line 34:
	  | parallel $njob fontforge -script /nerd/font-patcher -out /out $args {}
                     ^---^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                                                          ^---^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
	  | parallel "$njob" fontforge -script /nerd/font-patcher -out /out "$args" {}

@nobk
Copy link
Author

nobk commented Feb 5, 2024

no, that warnning caused by BusyBox multi-call sh. If the environment variable PN is not defined, this warning will appear.

docker run -it --rm alpine:latest sh
/ # [ "$PN" -eq 1 ]
sh: out of range
/ # which sh
/bin/sh
/ # /bin/sh --version
/ # ls -l /bin/sh
lrwxrwxrwx    1 root     root            12 Jan 26 17:53 /bin/sh -> /bin/busybox
/ #

@nobk
Copy link
Author

nobk commented Feb 5, 2024

Adding one more line of code eliminates those two warnings.
[[ -z "$PN" ]] && PN=0
need me submit a PR for this?

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

no, thanks :-)

@nobk
Copy link
Author

nobk commented Feb 5, 2024

My fix is like this, and not submit that then, it just warning.

git diff
diff --git a/bin/scripts/docker-entrypoint.sh b/bin/scripts/docker-entrypoint.sh
index acd931a..bb852d3 100644
--- a/bin/scripts/docker-entrypoint.sh
+++ b/bin/scripts/docker-entrypoint.sh
@@ -23,6 +23,7 @@ done
 printf "Running with options:\n%s\n" "$args"

 # shellcheck disable=SC2086
+[[ -z "$PN" ]] && PN=0
 if [ "$PN" -eq 1 ]; then
        find /in -type f \

@nobk
Copy link
Author

nobk commented Feb 5, 2024

You can execute sudo docker system prune and answer y to reclaim the disk space occupied by the outdated docker patcher <none> images.

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

I added two more commits, hope you find them ok.

1fccd8a docker: Allow blancs in filenames
7ebbc4e docker: Include logfile in output

That also fixes the shellcheck warning and the runtime warning.

This is a very good addition! Thanks again for the idea and implementation/PR!

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

I think docker patcher is for users, not for developers, so I am not test logfile output.

Well, that output helps (me) when those users have problems with some particular font.
They can report with the debug messages and I can possible see what goes wrong.
This is important for fonts that are not free, i.e. I can not download and test myself.

For example

  • "After patching the gap is too wide - I can not share the font file"
  • "Please run with --debug 1"
  • In the debug log: Gap detected ...

@nobk
Copy link
Author

nobk commented Feb 5, 2024

As you changed the default value of PN to 1, you need modify readme.md docker usage section, default single process, -e "PN=0" let parallel decide the number of tasks.
But I fell default value of PN as 0 will be better because all docker user can benefit by speed up from this upgrade.
Unless you need debugging or there is insufficient memory, you need to set PN to 1.

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

I didnt intend to change your default, though 😬
Rewrote that several times, maybe got confused ;-)
Fixing.

Thanks for reporting. I believe -j0 is the better default.

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

96497b4 docker: Run parallel by default

@nobk
Copy link
Author

nobk commented Feb 5, 2024

Thanks for reporting. I believe -j0 is the better default.

I am not using -j0 in my PR, I just omitted the -j parameter in shell script by default or PN=0, parallel generate 8 tasks for me, I can see them in htop.
Your latest commit with parallel -j0 I just tested, parallel generate 32 tasks for me, because my custom fonts have totally 32 .ttf files, it use 7.8GB memory.
If users have more font files, I think -j0 will cause out of memory or too much tasks let system lag.

@nobk
Copy link
Author

nobk commented Feb 5, 2024

omitted the -j parameter means -j 100%
man parallel

       --jobs num
       -j num
       --max-procs num
       -P num
           Number of jobslots on each machine.

           Run up to num jobs in parallel. Default is 100%.

           num    Run up to num jobs in parallel.

           0      Run as many as possible (this can take a while to determine).

                  Due to a bug -j 0 will also evaluate replacement strings twice up  to  the  number  of
                  joblots:

                    # This will not count from 1 but from number-of-jobslots
                    seq 10000 | parallel -j0   echo '{= $_ = $foo++; =}' | head
                    # This will count from 1
                    seq 10000 | parallel -j100 echo '{= $_ = $foo++; =}' | head

           num%   Multiply  the  number  of  CPU threads by num percent. E.g. 100% means one job per CPU
                  thread on each machine.

           +num   Add num to the number of CPU threads.

           -num   Subtract num from the number of CPU threads.

           expr   Evaluate expr. E.g. '12/2' to get 6, '+25%' gives  the  same  as  '125%',  or  complex
                  expressions  like  '+3*log(55)%'  which means: multiply 3 by log(55), multiply that by
                  the number of CPU threads and divide by 100, add this to the number of CPU threads.

                  An expression that evalutates to less that 1 is replaced with 1.

           procfile
                  Read parameter from file.

                  Use the content of procfile as parameter for  -j.  E.g.  procfile  could  contain  the
                  string 100% or +2 or 10.

                  If procfile is changed when a job completes, procfile is read again and the new number
                  of  jobs is computed. If the number is lower than before, running jobs will be allowed
                  to finish but new jobs will not be started until the wanted number of  jobs  has  been
                  reached.   This  makes  it  possible to change the number of simultaneous running jobs
                  while GNU parallel is running.

           If the evaluated number is less than 1 then 1 will be used.

           If --semaphore is set, the default is 1 thus making a mutex.

           See also: --use-cores-instead-of-threads --use-sockets-instead-of-threads

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

omitted the -j parameter means -j 100% man parallel

But that would require an (additional) if and/or code duplication.

In principle we could

-parallel --verbose --null "--jobs=${PN}" fontforge -script ../../font-patcher -out ../../out_fonts $args {}
+[-n "$PN"] && jobs=-j${PN}
+parallel --verbose --null "${jobs}" fontforge -script ../../font-patcher -out ../../out_fonts $args {}

The problem is that we can not use double qoutes around jobs then, because that passes an empty option (instead of no option, which would be passed without quotes).

image

Your original PR had the if and code duplication (find twice and font-patcher twice), which is not good in the long run.

Not having the params double quoted raisesshellcheck warnings and should be avoided as much as possible (imho).

That's the reason I used an explicit 100% (i.e. -j0) to get it - branchless programming ;-)

@nobk
Copy link
Author

nobk commented Feb 5, 2024

KISS = keep it simple stupid
I do not think use if else and write find twice not good.
Make it work is important than make it looks well.
If you need debug, find -exec is better than find | parallel -j1 .

@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

Debugging is not about debugging the code but debugging the font.

Having code that works (supposedly) but is hard to read does not help anyone in the long run. So looking well IS important.

Sorry that you do not like the changes.

@nobk
Copy link
Author

nobk commented Feb 5, 2024

The -j0 make max task as many as total font files can not as default, need avoid.

@nobk
Copy link
Author

nobk commented Feb 5, 2024

If you want keep the double qoutes, you can write like this

jobs="-j 100%"
[[ -n "$PN" ]] && [ "$PN" -gt 0 ] && jobs="-j${PN}"

@nobk
Copy link
Author

nobk commented Feb 5, 2024

Sorry that you do not like the changes.

I like correct code refactoring, not the introduction of bugs.

Repository owner locked as too heated and limited conversation to collaborators Feb 5, 2024
@Finii
Copy link
Collaborator

Finii commented Feb 5, 2024

image

That still bugs with an obscure error if someone does -e "PN=test". That is not an improvement but making the error feedback worse.

I like ... not the introduction of bugs

But still you suggest the above code. That will end up as

parallel --verbose "-j 0" ...

and that will raise an error because the allowed syntax is only "-j0" or "-j=0" (for single arguments, and keeping the quotes exactly means that: 1 argument) (and the two argument version "-j" "0".)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants