This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
[v1.7.x] Backport #17177 to 1.7.x (Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator) #18147
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Backport #17177 to the branch for v1.7.x.
Currently, many operators utilize
std::stod
to convert strings into floating point numbers. This causes incorrect calculations (#17140, #16134) when the C locale is set to a locale which uses commas (,
) as decimal separators.This pull request replaces calls to
std::stod
andstd::stof
to the locale-invariantdmlc::stod
anddmlc::stof
respectively.The scope of this patch
This patch should fix a large portion of interactions through Python and JVM frontends, since they use locale-invariant serialization in order to pass parameters into MXNet's C API.
However, frontends which utilize C locale-aware serialization (i.e. call
sprintf
or similar) may break when using locales which don't use.
as the decimal separator. They will have to be fixed in a separate patch. I also suspect that they are already broken, because operators utilising dmlc-core parameter parsing were already using locale-invariant serialization.Further steps
STL streams are heavily used within the codebase, both for serialization and for forming user-friendly messages. Fortunately, they don't seem to be affected by the C standard library locale settings. However, if someone sets the STL locale by calling std::locale::global, MXNet's API will be broken. In order to ensure that this doesn't happen, all streams which are used for serialization will have to be imbued with the "C" locale (not the locale set in the C standard library).
It would be nice to see a more principled approach to serialization in MXNet 2.0, e.g. using a binary format for communication between the frontend and the backend. In addition to solving locale-related issues, this would probably result in a smaller invocation overhead.
Locale-invariant serialization vs locale-aware serialization
As a side-note, using locale-aware serialization is not an option, simply because using
,
as the decimal separator adds ambiguities to tuple serialization, e.g.(4,4,3)
can be a tuple of 3 integers, or a tuple of 2 floats.Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
std::stod
withdmlc::stod
andstd::stof with
dmlc::stof`.Comments
,
as the decimal separator. However, since there was no consistency between the various operators before, it is not unlikely that the code was already broken.