Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[zstd][dict] Ensure that dictionary training functions are fully reen… #4078

Closed
wants to merge 5 commits into from

Conversation

Adenilson
Copy link

…trant

The two main functions used for dictionary training using the COVER algorithm require initialization of a COVER_ctx_t where a call to qsort() is performed.

The issue is that the standard C99 qsort() function doesn't offer a way to pass an extra parameter for the comparison function callback (e.g. a pointer to a context) and currently zstd relies on a global static variable to hold a pointer to a context needed to perform the sort operation.

If a zstd library user invokes either ZDICT_trainFromBuffer_cover or ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the global context may be overwritten before/during the call/execution to qsort() in the initialization of the COVER_ctx_t, thus yielding to crashes and other bad things (Tm) as reported on issue #4045.

Enters qsort_r(): it was designed to address precisely this situation, to quote from the documention [1]: "the comparison function does not need to use global variables to pass through arbitrary arguments, and is therefore reentrant and safe to use in threads."

It is available with small variations for multiple OSes (GNU, BSD[2], Windows[3]), and the ISO C11 [4] standard features on annex B-21 qsort_s() as part of the <stdlib.h>. Let's hope that compilers eventually catch up with it.

For now, we have to handle the small variations in function parameters for each platform.

The current fix solves the problem by allowing each executing thread pass its own COVER_ctx_t instance to qsort_r(), removing the use of a global pointer and allowing the code to be reentrant.

[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html
[2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r
[3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

…trant

The two main functions used for dictionary training using the COVER
algorithm require initialization of a COVER_ctx_t where a call
to qsort() is performed.

The issue is that the standard C99 qsort() function doesn't offer
a way to pass an extra parameter for the comparison function callback
(e.g. a pointer to a context) and currently zstd relies on a *global*
static variable to hold a pointer to a context needed to perform
the sort operation.

If a zstd library user invokes either ZDICT_trainFromBuffer_cover or
ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the
global context may be overwritten before/during the call/execution to qsort()
in the initialization of the COVER_ctx_t, thus yielding to crashes
and other bad things (Tm) as reported on issue facebook#4045.

Enters qsort_r(): it was designed to address precisely this situation,
to quote from the documention [1]: "the comparison function does not need to
use global variables to pass through arbitrary arguments, and is therefore
reentrant and safe to use in threads."

It is available with small variations for multiple OSes (GNU, BSD[2],
Windows[3]), and the ISO C11 [4] standard features on annex B-21 qsort_s() as
part of the <stdlib.h>. Let's hope that compilers eventually catch up
with it.

For now, we have to handle the small variations in function parameters
for each platform.

The current fix solves the problem by allowing each executing thread
pass its own COVER_ctx_t instance to qsort_r(), removing the use of
a global pointer and allowing the code to be reentrant.

[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html
[2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r
[3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
@dearblue
Copy link
Contributor

I was curious, so I did a little research, focusing on FreeBSD.

The version of FreeBSD that adds support for the POSIX standard qsort_r() is 14.0, released on 2023-11-20. It is described as using "C11 Generic selection" to maintain past compatibility.
freebsd/freebsd-src@af3c788
😿 I could not find where qsort_r() is defined by the POSIX standard.

Up to the FreeBSD 13 series qsort_r() is only available in the first specification introduced.
https://man.freebsd.org/cgi/man.cgi?query=qsort_r&manpath=FreeBSD+13.3-RELEASE
The End-of-life for the FreeBSD 13 series is 2026-04-30.
https://www.freebsd.org/security/#sup

Notably, FreeBSD was the first to introduce qsort_r(), followed by GNU with an incompatible implementation of the same name with different arguments.
Apple brought qsort_r() from FreeBSD.
https://stackoverflow.com/a/39561369

Also, I have not actually checked, but it is possible that NetBSD and OpenBSD do not have both qsort_r() and qsort_s().
At least not in the latest man page, and I could not find definitions in src/include/stdlib.h.
https://man.netbsd.org/qsort_r https://man.netbsd.org/qsort_s
https://man.openbsd.org/qsort_r https://man.openbsd.org/qsort_s

illumos seems to have a GNU-compatible qsort_r(), but does not seem to have qsort_s().
illumos/illumos-gate@44431c8

To make it portable without confusion and usable even in C99 (C89?), I think one option would be to implement it independently in zstd.

@adenilsoncavalcanti
Copy link

@dearblue thanks a lot for the comments, I really appreciate it.

I've validated the current patch with a VM running FreeBSD 14.1, since I believe new software (e.g. upcoming zstd) should be validated in new OSes first.

I incorrectly assumed that all *BSD versions shared the same libc (note: I'm not a *BSD user or developer), but what you are reporting is that NetBSD + OpenBSD can have a diverging libc when compared to FreeBSD.

While a a qsort() function can be easily copied & pasted to the current patch, validating it is a bit more tricky: I'm aware that some OS vendors (e.g. Apple) do ship heavily optimized system libraries in their products (e.g. iOS, OSX, etc).

A replacement qsort() function would have to demonstrate similar or superior performance than what zstd currently uses (i.e. the default system libc on each OS/compiler) to avoid performance regressions.

That validation effort would further delay landing this fix to a real SEGFAULT in zstd thanks to what appears to be a *BSD idiosincrasy.

On top of that, I couldn't detect qsort() is a hotspot in a regular 'perf report'.

If it was a real hotspot in zstd it would be worthwhile the effort in optimizing it further (e.g. using SIMD techniques exploiting AVX-512 as deployed on https://github.com/intel/x86-simd-sort).

The current patch works fine on Linux, OSX and Windows (probably Android, I will test it soon), it seems that *BSD is the one giving us trouble at the moment.

Question: since you mentioned that 'Apple brought qsort_r() from FreeBSD'. If I treat *BSD as APPLE would that work on the current stable versions of *BSD?

@Cyan4973
Copy link
Contributor

Cyan4973 commented Jun 21, 2024

Note that strict C90 compatibility is a must have for libzstd.

While it's allowed to use posix-specific symbols, it also means that some backup strategy must exist and be triggered automatically for other C90 platforms without these facilities. We don't even need exact feature parity, it just needs to compile and run properly.

Aside VStudio compiler, a valid workflow is to build zstd with
either gcc or clang on Windows.

This minor change ensure that the dictionary fix will build fine
on Windows with both gcc 13.3.0 and clang-11 on MSYS environment.
@Adenilson
Copy link
Author

@Cyan4973 to clarify, you suggest that if we can't get something working on all versions of *BSD, we should provide some kind of 'fallback' (e.g. a version of qsort_r()) to allow it to build cleanly. Is that correct?

int result = COVER_cmp(g_coverCtx, lp, rp);
#if defined(_WIN32) && defined(_MSC_VER)
static int WIN_CDECL COVER_strict_cmp(void* g_coverCtx, const void* lp, const void* rp) {
#elif defined(__APPLE__)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got test if treating *BSD as APPLE could do the trick...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't. Newer FreeBSD has the same syntax as GNU, but older has the same syntax as APPLE.

I will let go of FreeBSD for now and revisit the issue in a follow up patch.

static int WIN_CDECL COVER_strict_cmp(void* g_coverCtx, const void* lp, const void* rp) {
#elif defined(__APPLE__)
static int WIN_CDECL COVER_strict_cmp(void *g_coverCtx, const void *lp, const void *rp) {
#elif defined(_GNU_SOURCE) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently this won't work on Net/OpenBSDs and older FreeBSD.

#elif defined(_WIN32) && defined(_MSC_VER)
qsort_s(ctx->suffix, ctx->suffixSize, sizeof(U32),
(ctx->d <= 8 ? &COVER_strict_cmp8 : &COVER_strict_cmp),
ctx);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we will have to add some kind of fallback for C90.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The fallback for no is the same code as is being shipped by zstd.

In a follow up patch we can try to provide a reentrant qsort_r() for platforms that don't have it.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Jun 21, 2024

@Cyan4973 to clarify, you suggest that if we can't get something working on all versions of *BSD, we should provide some kind of 'fallback' (e.g. a version of qsort_r()) to allow it to build cleanly. Is that correct?

Yes,
even losing the re-entrant property would be acceptable,
as long as it can run fine on a barebone C90-compliant system.

@@ -21,8 +21,14 @@
/*-*************************************
* Dependencies
***************************************/
/* qsort_r is an extension. */
#if defined(__linux) || defined(__linux__) || defined(linux) || defined(__gnu_linux__) || defined(__CYGWIN__) || defined(__MSYS__)
#define _GNU_SOURCE
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only define it if not already defined.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dearblue
Copy link
Contributor

dearblue commented Jun 22, 2024

Question: since you mentioned that 'Apple brought qsort_r() from FreeBSD'. If I treat *BSD as APPLE would that work on the current stable versions of *BSD?

No. I actually checked, and qsort_r() and qsort_s() are indeed not present, even on the latest NetBSD and OpenBSD.

Adenilson Cavalcanti added 3 commits June 23, 2024 23:52
This should enable building for C90 compilers.
There is a bot complaining about it, let's see if it will
make it happy.
The script 'combine.sh' joins the whole zstd source code in a single file to
allow easier integration of the zstd library in projects.

The issue arrives that _GNU_SOURCE has to be defined before the inclusion
of any standard header to activate usage of qsort_r().

This change will allow 'make all' to pass as it fixes the build for using
the 'build_library_test.sh' script.
@Adenilson
Copy link
Author

@Cyan4973 apparently all bots in the CI are green with the latest version of the patchset.

I got two questions:
a) Do you wish to change anything on it?

b) Should I squash all the 5 commits in a single one?

Copy link
Contributor

@Cyan4973 Cyan4973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Adenilson !
This is great work, and I think it works as intended.

I've focused my comments on source code maintainability, given that the functionality itself is already in a good place.

*/
#if defined(_WIN32) && defined(_MSC_VER)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way COVER_strict_cmp() (and below COVER_strict_cmp8()) are implemented "feels" unusual : they share the same (short) body but feature different signatures depending on target system, controlled through preprocessor branching.

Given that the function signature is a mandated parameter of the qsort_r() function, and qsort_r() (of course!) varies depending on target system, I don't have a better solution to propose. Rather, I just forward some minor comments :
- WIN_CDECL is used even in non-Windows context. Granted, it's reduced to nothing in non-Windows environments, but seeing it listed for systems specifically tested as non Windows feels irrelevant
- _MSC_VER and __APPLE__ environments seem to share the same signature, so maybe they could be merged (?)

Ultimately, this is a comment focused on "readability". The functionality itself looks good.

*/
g_coverCtx = ctx;
#if defined(__OpenBSD__)
#if defined(__APPLE__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qsort_r() is invoked directly from within in COVER_ctx_init(), resulting in some #if ... #elif ... #else ... preprocessor logic in the function's body, because qsort_r() is different depending on target system.
In a bid to lower preprocessor logic within COVER_ctx_init(), I was wondering if this design could be simplified by an invocation to a single function signature, and this function would be implemented differently depending on target system.
Such a design requires a common signature across all systems, so there might be some limits there.

@Cyan4973
Copy link
Contributor

Should I squash all the 5 commits in a single one?

Yes,
when some intermediate steps tend to break CI, it's better to remove them from the commit history, otherwise it will make future debugging bisect operations more problematic to manage.

@Adenilson Adenilson closed this Jul 1, 2024
@Adenilson Adenilson deleted the fix01 branch July 1, 2024 19:39
@Adenilson
Copy link
Author

Github was refusing to accept a push of my local + rebased changes to remote, so I was forced to delete the remote branch to be able to push and as a result the merge request was purged.

The new one is at:
#4086

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants