Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[zstd][dict] Ensure that dictionary training functions are fully reen… #4078

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions lib/common/zstd_deps.h
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,13 @@
#ifndef ZSTD_DEPS_COMMON
#define ZSTD_DEPS_COMMON

/* qsort_r is an extension. */
#if defined(__linux) || defined(__linux__) || defined(linux) || defined(__gnu_linux__) || defined(__CYGWIN__) || defined(__MSYS__)
#if !defined(_GNU_SOURCE)
#define _GNU_SOURCE
#endif
#endif

#include <limits.h>
#include <stddef.h>
#include <string.h>
Expand Down
58 changes: 46 additions & 12 deletions lib/dictBuilder/cover.c
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,16 @@
/*-*************************************
* Dependencies
***************************************/
/* qsort_r is an extension. */
#if defined(__linux) || defined(__linux__) || defined(linux) || defined(__gnu_linux__) || defined(__CYGWIN__) || defined(__MSYS__)
#if !defined(_GNU_SOURCE)
#define _GNU_SOURCE
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only define it if not already defined.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#endif
#endif

#include <stdio.h> /* fprintf */
#include <stdlib.h> /* malloc, free, qsort */
#include <stdlib.h> /* malloc, free, qsort_r */

#include <string.h> /* memset */
#include <time.h> /* clock */

Expand Down Expand Up @@ -232,8 +240,10 @@ typedef struct {
unsigned d;
} COVER_ctx_t;

/* We need a global context for qsort... */
#if !defined(_GNU_SOURCE) && !defined(__APPLE__) && !defined(_MSC_VER)
/* C90 only offers qsort() that needs a global context. */
static COVER_ctx_t *g_coverCtx = NULL;
#endif

/*-*************************************
* Helper functions
Expand Down Expand Up @@ -276,11 +286,17 @@ static int COVER_cmp8(COVER_ctx_t *ctx, const void *lp, const void *rp) {

/**
* Same as COVER_cmp() except ties are broken by pointer value
* NOTE: g_coverCtx must be set to call this function. A global is required because
* qsort doesn't take an opaque pointer.
*/
#if defined(_WIN32) && defined(_MSC_VER)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way COVER_strict_cmp() (and below COVER_strict_cmp8()) are implemented "feels" unusual : they share the same (short) body but feature different signatures depending on target system, controlled through preprocessor branching.

Given that the function signature is a mandated parameter of the qsort_r() function, and qsort_r() (of course!) varies depending on target system, I don't have a better solution to propose. Rather, I just forward some minor comments :
- WIN_CDECL is used even in non-Windows context. Granted, it's reduced to nothing in non-Windows environments, but seeing it listed for systems specifically tested as non Windows feels irrelevant
- _MSC_VER and __APPLE__ environments seem to share the same signature, so maybe they could be merged (?)

Ultimately, this is a comment focused on "readability". The functionality itself looks good.

static int WIN_CDECL COVER_strict_cmp(void* g_coverCtx, const void* lp, const void* rp) {
#elif defined(__APPLE__)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got test if treating *BSD as APPLE could do the trick...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't. Newer FreeBSD has the same syntax as GNU, but older has the same syntax as APPLE.

I will let go of FreeBSD for now and revisit the issue in a follow up patch.

static int WIN_CDECL COVER_strict_cmp(void *g_coverCtx, const void *lp, const void *rp) {
#elif defined(_GNU_SOURCE)
static int WIN_CDECL COVER_strict_cmp(const void *lp, const void *rp, void *g_coverCtx) {
#else /* C90 fallback.*/
static int WIN_CDECL COVER_strict_cmp(const void *lp, const void *rp) {
int result = COVER_cmp(g_coverCtx, lp, rp);
#endif
int result = COVER_cmp((COVER_ctx_t*)g_coverCtx, lp, rp);
if (result == 0) {
result = lp < rp ? -1 : 1;
}
Expand All @@ -289,8 +305,16 @@ static int WIN_CDECL COVER_strict_cmp(const void *lp, const void *rp) {
/**
* Faster version for d <= 8.
*/
#if defined(_WIN32) && defined(_MSC_VER)
static int WIN_CDECL COVER_strict_cmp8(void* g_coverCtx, const void* lp, const void* rp) {
#elif defined(__APPLE__)
static int WIN_CDECL COVER_strict_cmp8(void *g_coverCtx, const void *lp, const void *rp) {
#elif defined(_GNU_SOURCE)
static int WIN_CDECL COVER_strict_cmp8(const void *lp, const void *rp, void *g_coverCtx) {
#else /* C90 fallback.*/
static int WIN_CDECL COVER_strict_cmp8(const void *lp, const void *rp) {
int result = COVER_cmp8(g_coverCtx, lp, rp);
#endif
int result = COVER_cmp8((COVER_ctx_t*)g_coverCtx, lp, rp);
if (result == 0) {
result = lp < rp ? -1 : 1;
}
Expand Down Expand Up @@ -620,14 +644,24 @@ static size_t COVER_ctx_init(COVER_ctx_t *ctx, const void *samplesBuffer,
for (i = 0; i < ctx->suffixSize; ++i) {
ctx->suffix[i] = i;
}
/* qsort doesn't take an opaque pointer, so pass as a global.
* On OpenBSD qsort() is not guaranteed to be stable, their mergesort() is.
*/
g_coverCtx = ctx;
#if defined(__OpenBSD__)
#if defined(__APPLE__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qsort_r() is invoked directly from within in COVER_ctx_init(), resulting in some #if ... #elif ... #else ... preprocessor logic in the function's body, because qsort_r() is different depending on target system.
In a bid to lower preprocessor logic within COVER_ctx_init(), I was wondering if this design could be simplified by an invocation to a single function signature, and this function would be implemented differently depending on target system.
Such a design requires a common signature across all systems, so there might be some limits there.

qsort_r(ctx->suffix, ctx->suffixSize, sizeof(U32),
ctx,
(ctx->d <= 8 ? &COVER_strict_cmp8 : &COVER_strict_cmp));
#elif defined(_GNU_SOURCE)
qsort_r(ctx->suffix, ctx->suffixSize, sizeof(U32),
(ctx->d <= 8 ? &COVER_strict_cmp8 : &COVER_strict_cmp),
ctx);
#elif defined(_WIN32) && defined(_MSC_VER)
qsort_s(ctx->suffix, ctx->suffixSize, sizeof(U32),
(ctx->d <= 8 ? &COVER_strict_cmp8 : &COVER_strict_cmp),
ctx);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we will have to add some kind of fallback for C90.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The fallback for no is the same code as is being shipped by zstd.

In a follow up patch we can try to provide a reentrant qsort_r() for platforms that don't have it.

#elif defined(__OpenBSD__)
mergesort(ctx->suffix, ctx->suffixSize, sizeof(U32),
(ctx->d <= 8 ? &COVER_strict_cmp8 : &COVER_strict_cmp));
#else
#else /* C90 fallback.*/
g_coverCtx = ctx;
/* TODO(cavalcanti): implement a reentrant qsort() when is not available. */
qsort(ctx->suffix, ctx->suffixSize, sizeof(U32),
(ctx->d <= 8 ? &COVER_strict_cmp8 : &COVER_strict_cmp));
#endif
Expand Down