Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use faster Python script to amalgamate #3005

Merged
merged 10 commits into from
Jan 22, 2022

Conversation

cwoffenden
Copy link
Contributor

As mentioned in #2924 the sh amalgamation script is extremely, unusably slow. This faster Python version completes almost instantly. TL;DR where it used to take over 6.5 minutes for the full library it now takes a 1/3 of a second (on this old, old Mac Pro). Testing on an M1 Mac it takes 0.095 seconds to run the same.

When developing the original single file decoder a Bash script was used, which completed in reasonable time, but didn't work with Zstd's CI on older OS/machine combos. This Python version was the working replacement but never supported all of the command line flags (I'll add them in the next days if there's interest, before calling this done).

Some numbers before:

carl@Betty single_file_libs % time ./create_single_file_decoder.sh
./create_single_file_decoder.sh  115.85s user 138.85s system 150% cpu 2:49.02 total
carl@Betty single_file_libs % time ./create_single_file_library.sh
./create_single_file_library.sh  279.35s user 322.86s system 154% cpu 6:29.05 total

And after:

carl@Betty single_file_libs % time ./create_single_file_decoder.sh
./create_single_file_decoder.sh  0.12s user 0.04s system 95% cpu 0.174 total
carl@Betty single_file_libs % time ./create_single_file_library.sh
./create_single_file_library.sh  0.20s user 0.06s system 97% cpu 0.270 total

The creation scripts first test that Python 3.8 or higher is installed, and if not the original shell script is run instead.

If there's interest in taking this I'll spend a few hours to make sure all the flags are supported and tested. As it stands it doesn't support excluding, which isn't used by Zstd anyway (it is in other projects).

@Cyan4973
Copy link
Contributor

Thanks @cwoffenden 👍 ,
I'm sure there will be quite some interest in this faster version 😉

@cwoffenden cwoffenden marked this pull request as draft January 18, 2022 10:36
Made the Python more Python-like. Added notes and general tidy. Tested exclusions and building with various options. Tested all scripts.
@cwoffenden
Copy link
Contributor Author

cwoffenden commented Jan 19, 2022

This is done and tidied. I limited running the Python version to 3.8 (and higher) though it may work on earlier versions (testing was only done on 3.8). The Python is as Python-like as I know how (I'm not a Python guy), has the type annotations and has been passed through mypy static type checking.

I changed the amalgamation scripts to use the -x flag to exclude zstd_legacy.h since both the decoder and full libraries build without ZSTD_LEGACY_SUPPORT, and this lets us remove 50kB from the resulting generated C (and if the macro is ever enabled an #error directive is inserted to show why it fails). I also tested when using the -k flag to not inline zstd.h to keep the library as separate header and implementation, and that also works well.

And it's fast, even on my 2013 Mac Pro:

carl@Betty single_file_libs % time ./combine.py -r ../../lib -x legacy/zstd_legacy.h -o zstddeclib.c zstddeclib-in.c
./combine.py -r ../../lib -x legacy/zstd_legacy.h -o zstddeclib.c   0.11s user 0.03s system 96% cpu 0.145 total

Consistently less than 0.2s vs nearly 3m previously (for the decoder lib), so over a thousand times faster.

@cwoffenden cwoffenden marked this pull request as ready for review January 19, 2022 10:47
@cwoffenden
Copy link
Contributor Author

Added bonus: the Python script also works on Windows. In theory the shell script did, with the Git shell support, for example, but would take hours. It takes longer than on macOS on the same machine, completing in 0.6s:

Measure-Command {python3 combine.py -r ../../lib -x legacy/zstd_legacy.h -o zstddeclib.c zstddeclib-in.c}
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 594

When testing amalgamating other projects it was found: invalid Unicode errors were tripping Python's text IO, and the header search order appears differs from the shell version.
@cwoffenden
Copy link
Contributor Author

This has had further testing and edge cases fixed. I can now call it done.

build/single_file_libs/combine.py Outdated Show resolved Hide resolved
build/single_file_libs/combine.py Show resolved Hide resolved
build/single_file_libs/combine.py Outdated Show resolved Hide resolved
cwoffenden and others added 3 commits January 19, 2022 17:33
Co-authored-by: Alexandre Bury <alexandre.bury@gmail.com>
@Cyan4973
Copy link
Contributor

minor question :

It seems the new recipe needs to explicitly exclude zstd_legacy.h,
while that wasn't required before.
Any explanation why ?

@cwoffenden
Copy link
Contributor Author

The input file always stripped out certain features, primarily to reduce the size when this was just the decompressor alone, e.g.:

#define ZSTD_LEGACY_SUPPORT 0
#define ZSTD_STRIP_ERROR_STRINGS
#define ZSTD_TRACE 0

In the original version of tool there wasn't the -x option, so all files were inlined, used or not.

I can add it back in, if you'd prefer? It removed 50k-ish from the generated source, which the compiler skips over because those parts are currently if'd out.

@Cyan4973
Copy link
Contributor

I think it's a pretty good reason.
I believe it just needs to be documented, as otherwise people will just not understand why it's there, and may miss this optimization in some future update.

@cwoffenden
Copy link
Contributor Author

cwoffenden commented Jan 20, 2022

I made the #error inserted into the generated code more detailed, and also added docs for the ZSTD_LEGACY_SUPPORT flag. I tidied the instructions and copyright year whilst I was in there.

@Cyan4973
Copy link
Contributor

Thanks @cwoffenden !

@Cyan4973 Cyan4973 merged commit b27356f into facebook:dev Jan 22, 2022
@cwoffenden cwoffenden deleted the faster-amalgamate branch January 22, 2022 06:21
@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants