Parse JSON number to double in full-precision. #137

miloyip · 2014-09-06T03:02:57Z

Use kParseFullPrecision to turn on this option at compile time. The implementation of the new option should have no performance impact if the flag is not used.
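A minimal sketch of why a compile-time flag can be free when it is off: the flag is a template argument, so each instantiation contains only the path it needs. The names `SketchParseFlag`, `kSketchFullPrecision` and `ParseNumberSketch` are hypothetical and are not RapidJSON's API. The normal-precision branch is a deliberately naive stand-in that handles only `[-]digits[.digits]`.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical flag values for illustration only (not RapidJSON's definitions).
enum SketchParseFlag : unsigned {
    kSketchNoFlags = 0,
    kSketchFullPrecision = 1
};

// The flag is a template parameter, so the condition below is a compile-time
// constant and the compiler can drop the unused branch in each instantiation.
template <unsigned parseFlags>
double ParseNumberSketch(const char* text) {
    assert(text != nullptr);
    if (parseFlags & kSketchFullPrecision) {
        // Full-precision stand-in: defer to the C library conversion.
        return std::strtod(text, nullptr);
    }
    // Normal-precision stand-in: naive digit accumulation, no exponent part.
    bool negative = (*text == '-');
    if (negative) ++text;
    double value = 0.0;
    for (; *text >= '0' && *text <= '9'; ++text)
        value = value * 10.0 + (*text - '0');
    if (*text == '.') {
        double scale = 0.1;
        for (++text; *text >= '0' && *text <= '9'; ++text, scale *= 0.1)
            value += (*text - '0') * scale;
    }
    return negative ? -value : value;
}
```

A caller would pick the path with `ParseNumberSketch<kSketchFullPrecision>(text)` or `ParseNumberSketch<kSketchNoFlags>(text)`.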

Implementation details

The full-precision path tries to use the fast path if possible. If the criteria for the fast path cannot be met, it falls back to strtod() to convert the string.
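The idea can be sketched as follows. This is not the PR's code: `ParseDoubleSketch` and `kPow10Sketch` are hypothetical names, the input is assumed to be an already validated `[-]digits[.digits]` string without an exponent part, and the fast-path criteria used here are the commonly cited ones (significand no larger than 2^53 and a power of ten no larger than 10^22, both exactly representable as double).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Powers of ten that are exactly representable as double (10^0 .. 10^22).
static const double kPow10Sketch[] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

// Returns the parsed value; *usedFastPath (optional) reports which path ran.
double ParseDoubleSketch(const char* text, bool* usedFastPath = nullptr) {
    assert(text != nullptr);
    const std::uint64_t kMaxExact = 9007199254740992ULL;  // 2^53

    const char* p = text;
    bool negative = (*p == '-');
    if (negative) ++p;

    std::uint64_t significand = 0;  // all digits with the decimal point removed
    int fractionDigits = 0;
    bool seenPoint = false;
    bool fastPathOk = true;
    for (;; ++p) {
        if (*p == '.' && !seenPoint) { seenPoint = true; continue; }
        if (*p < '0' || *p > '9') break;
        const std::uint64_t digit = static_cast<std::uint64_t>(*p - '0');
        // Would significand * 10 + digit exceed 2^53? Then it is not exact.
        if (significand > (kMaxExact - digit) / 10) { fastPathOk = false; break; }
        significand = significand * 10 + digit;
        if (seenPoint) ++fractionDigits;
    }

    if (fastPathOk && fractionDigits <= 22) {
        // Both operands are exact, so one IEEE division rounds correctly.
        if (usedFastPath) *usedFastPath = true;
        double d = static_cast<double>(significand) / kPow10Sketch[fractionDigits];
        return negative ? -d : d;
    }
    // Fallback: let the C library convert the original text.
    if (usedFastPath) *usedFastPath = false;
    return std::strtod(text, nullptr);
}
```

With this sketch, a short number such as `43.420273` takes the fast path, while a 17-digit number such as `-65.613616999999977` exceeds 2^53 as an integer significand and goes through `strtod()`.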

Note that the parser still verifies the JSON syntax of the number, as in the normal-precision path.

To fulfill the above requirement, the parser needs to back up the correctly parsed characters from the stream (as some streams cannot be read back). A helper template class GenericReader::NumberStream is designed for this. If full precision is set, a backup is required, and a specialized NumberStream backs up the characters during NumberStream::Take() into GenericReader::stack_. That stack_ was previously used only for storing the decoded characters in ParseString().
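A rough sketch of the backup idea, assuming a forward-only stream and using a `std::vector<char>` in place of the reader's internal stack. All names here (`ForwardStreamSketch`, `NumberStreamSketch`, `ConsumeDigitsSketch`) are hypothetical and do not mirror the actual class layout in the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical forward-only input: once a character is taken it cannot be re-read.
class ForwardStreamSketch {
public:
    explicit ForwardStreamSketch(const char* s) : p_(s) {}
    char Peek() const { return *p_; }
    char Take() { return *p_++; }
private:
    const char* p_;
};

// Primary template (backup == false): Take() only forwards, nothing is stored.
template <typename Stream, bool backup>
class NumberStreamSketch {
public:
    NumberStreamSketch(Stream& s, std::vector<char>&) : s_(s) {}
    char Peek() const { return s_.Peek(); }
    char Take() { return s_.Take(); }
private:
    Stream& s_;
};

// Specialization (backup == true): every taken character is pushed to the stack
// so the full number text is available later for a string-based conversion.
template <typename Stream>
class NumberStreamSketch<Stream, true> {
public:
    NumberStreamSketch(Stream& s, std::vector<char>& stack) : s_(s), stack_(stack) {}
    char Peek() const { return s_.Peek(); }
    char Take() {
        char c = s_.Take();
        stack_.push_back(c);
        return c;
    }
private:
    Stream& s_;
    std::vector<char>& stack_;
};

// Consumes leading digits and returns whatever ended up in the backup stack.
template <bool backup>
std::string ConsumeDigitsSketch(const char* json) {
    assert(json != nullptr);
    ForwardStreamSketch in(json);
    std::vector<char> stack;
    NumberStreamSketch<ForwardStreamSketch, backup> ns(in, stack);
    while (ns.Peek() >= '0' && ns.Peek() <= '9')
        ns.Take();
    return std::string(stack.begin(), stack.end());
}
```

Because the choice is a template argument, the non-backup variant carries no extra work in `Take()`, which matches the goal of not slowing down the normal-precision path.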

Unit test

Added random numbers to test more cases for integer types and double.

The experimental results show that full-precision parsing generates the exact representation (no error), while normal-precision parsing has a maximum error of 3 ULP.
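For reference, an error in ULP between two finite doubles can be measured by comparing their bit patterns. The sketch below is one common way to do it and is not taken from the PR's unit tests; `OrderedBitsSketch` and `UlpDistanceSketch` are hypothetical names, and NaN or infinity inputs are not handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a finite double to an integer that increases monotonically with the
// double's value. Negative doubles are stored as sign + magnitude, so they
// are mirrored to negative integers (-0.0 and +0.0 both map to 0).
std::int64_t OrderedBitsSketch(double d) {
    std::int64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 if they are equal).
std::uint64_t UlpDistanceSketch(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const std::int64_t x = OrderedBitsSketch(a);
    const std::int64_t y = OrderedBitsSketch(b);
    // Unsigned subtraction avoids signed overflow for operands of opposite sign.
    return x > y ? static_cast<std::uint64_t>(x) - static_cast<std::uint64_t>(y)
                 : static_cast<std::uint64_t>(y) - static_cast<std::uint64_t>(x);
}
```

A test harness could then compare the parser's result against a trusted conversion of the same text and track the largest distance seen.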

Denormal numbers are not tested, as the behavior varies among platforms. Implementations of strtod() in standard libraries may also simply flush denormals to zero.

Fix #120

This shall generate best possible precision (if strtod() is correctly implemented). Need more unit tests and performance tests. May add an option for accepting precision error. Otherwise LUT in Pow10() can be reduced.

Kosta-Github · 2014-09-08T16:27:56Z

Feedback: I was also running into the precision issues lately (I need to ensure roundtrip behavior for double values).

Your PR seems to fix the precision issues but has a severe performance impact: reading of my test datasets (~2800 files, >500MB) went up from 3 seconds (default precision) to >13 seconds (full precision)... :-(

miloyip · 2014-09-09T01:16:50Z

May I know which platform/compiler?
I will try to see if it is possible to make a small, fast custom strtod().

Kosta-Github · 2014-09-09T04:40:26Z

Windows 7; VisualStudio 2012 latest service-pack; 64-bit mode

pah · 2014-09-09T08:16:52Z

I have confirmed the 3x performance loss on Linux (32-bit, Clang 3.6) by running the nativejson-benchmark against the issue120floatprecision branch. Fortunately, there is no significant difference in the JSON files not containing double values, but for canada.json, it's 3.6x slower.

How do other parsers handle the precision corner cases? Most parsers I've seen so far go for a simple digit-based approach and should suffer from the same problems, right?

miloyip · 2014-09-09T14:31:48Z

I suspect that canada.json had some issues when it was generated.

// ...
"geometry": {"type":"Polygon","coordinates":[[[-65.613616999999977,43.420273000000009],
// ...

For example, I think the first two numbers should actually have been output as -65.613617 and 43.420273. If so, the fast path could handle them. (Currently these numbers have more than 15 digits.)

I suggest replacing canada.json by some JSONs with "more normal" numbers.

To @pah: I have not investigated the details of other parsers. Some of them seem to use even less precise conversion (well, previously RapidJSON did not use a proper fast path either). Some of them use strtod() or a similar C library function to do the conversion. I have thought of testing each library for conformance and precision. It just needs nativejson-benchmark's current Parse() and Stringify() interface, and some JSON files. But I think I will do that in the longer term.

To @Kosta-Github: I am investigating strtod() in double-conversion and some information like this. Implementing a custom strtod() can save some time by avoiding duplicated work, but I am not sure how much time can be saved. And it will certainly increase the code size.

Kosta-Github · 2014-09-09T15:12:02Z

@miloyip have a look here about the number of required digits to enable roundtrip behavior: http://en.cppreference.com/w/cpp/types/numeric_limits/max_digits10. For double values it is 17...
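The round-trip property can be checked with a small sketch like the one below. It relies only on the standard library (`std::numeric_limits<double>::max_digits10`, `std::snprintf`, `std::strtod`); `RoundTripsSketch` is a hypothetical helper name, and the check assumes the C library's formatting and parsing are correctly rounded.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>

// For IEEE 754 binary64, 17 significant decimal digits are enough to
// identify any double uniquely.
static_assert(std::numeric_limits<double>::max_digits10 == 17,
              "this sketch assumes IEEE 754 binary64 doubles");

// Prints value with the given number of significant digits, parses the text
// back, and reports whether the original double was recovered exactly.
bool RoundTripsSketch(double value, int digits) {
    assert(digits > 0 && digits <= 40);
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g", digits, value);
    return std::strtod(buffer, nullptr) == value;
}
```

For instance, `0.1 + 0.2` does not survive printing with 15 significant digits (the text becomes `0.3`), but it does with 17.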


miloyip · 2014-09-14T02:26:25Z

Intermediate Results

After working for a few days, I have implemented a custom strtod that correctly parses all doubles, including subnormals (strtod in the VC CRT does not work properly on subnormals), in https://github.com/miloyip/rapidjson/tree/issue120floatprecision_customstrtod . Currently it only works on VC2013 x64. It has not been optimized.

The following results are generated by nativejson-benchmark.

Normal Precision

          Parse canada.json          ...  9.664 ms  222.132 MB/s
          Parse citm_catalog.json    ...  5.317 ms  318.868 MB/s
          Parse twitter.json         ...  3.030 ms  203.652 MB/s

Full Precision

          Parse canada.json          ... 90.939 ms  23.607 MB/s
          Parse citm_catalog.json    ...  5.522 ms  307.010 MB/s
          Parse twitter.json         ...  3.087 ms  199.886 MB/s

Full Precision, custom strtod

          Parse canada.json          ... 38.407 ms  55.895 MB/s
          Parse citm_catalog.json    ...  5.545 ms  305.737 MB/s
          Parse twitter.json         ...  3.032 ms  203.498 MB/s

There is a performance improvement compared with the CRT's strtod, but canada.json is still about 4x slower than with normal precision. I will first make the code compatible with gcc/clang and then see if there are any optimization opportunities.


miloyip · 2014-11-23T14:26:58Z

This is the latest result of what I have been doing. I added another method (DiyFp) to try to parse the number. If it cannot handle the number correctly, it falls back to the BigInteger method. So basically, in full-precision mode, it tries FastPath -> DiyFp -> BigInteger. It should have better performance on average, at the cost of more code size.

Normal Precision

          Parse canada.json          ... 10.726 ms  200.145 MB/s
          Parse citm_catalog.json    ...  5.137 ms  330.021 MB/s
          Parse twitter.json         ...  3.013 ms  204.792 MB/s

Full Precision, custom strtod with DiyFp and BigInteger

          Parse canada.json          ... 23.561 ms  91.118 MB/s
          Parse citm_catalog.json    ...  5.626 ms  301.323 MB/s
          Parse twitter.json         ...  3.119 ms  197.848 MB/s
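For readers unfamiliar with the DiyFp stage mentioned above: "DiyFp" usually refers to the "do-it-yourself floating point" representation from the Grisu family of algorithms, a 64-bit significand paired with a binary exponent. The sketch below shows only the core multiplication of two such values, under the assumption that the PR's DiyFp method follows this general scheme. `DiyFpSketch` and `MultiplySketch` are hypothetical names, and the error analysis that decides when to fall back to BigInteger is not shown.

```cpp
#include <cassert>
#include <cstdint>

// "Do-it-yourself" floating point: value = f * 2^e with a 64-bit significand.
struct DiyFpSketch {
    std::uint64_t f;
    int e;
};

// Multiplies two values, keeping the upper 64 bits of the 128-bit product
// (rounded to nearest). The result is not renormalized.
DiyFpSketch MultiplySketch(const DiyFpSketch& a, const DiyFpSketch& b) {
    const std::uint64_t kMask32 = 0xFFFFFFFFu;
    const std::uint64_t aHi = a.f >> 32, aLo = a.f & kMask32;
    const std::uint64_t bHi = b.f >> 32, bLo = b.f & kMask32;

    // Four 32x32 -> 64-bit partial products.
    const std::uint64_t hiHi = aHi * bHi;
    const std::uint64_t hiLo = aHi * bLo;
    const std::uint64_t loHi = aLo * bHi;
    const std::uint64_t loLo = aLo * bLo;

    // Sum the middle 32-bit column and add half a unit for rounding.
    std::uint64_t mid = (loLo >> 32) + (hiLo & kMask32) + (loHi & kMask32);
    mid += 1u << 31;

    DiyFpSketch r;
    r.f = hiHi + (hiLo >> 32) + (loHi >> 32) + (mid >> 32);
    r.e = a.e + b.e + 64;  // the low 64 bits of the product were dropped
    assert(r.e == a.e + b.e + 64);
    return r;
}
```

In a parser, a multiplication like this would typically scale the decimal significand by a precomputed power of ten. The result is only trusted when the accumulated rounding error cannot change the final double, which is why a slower exact method is still needed as the last stage.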

I hope to resolve this #120 "bug" and continue to work on a 1.0 RC.

pah · 2014-11-28T12:53:05Z

The approach in issue120floatprecision_customstrtod and its performance look good to me. 👍

Can you update the branch issue120floatprecision in this pull-request to this improved implementation? I think, only the selection of the default (see this comment) is not done yet, right?

#120 (comment)

Parse JSON number to double in full-precision with custom strtod. Fix #120

miloyip added 11 commits August 28, 2014 23:03

Add test case

d8a51bf

Merge branch 'master' into issue120floatprecision

a46d152

Fallback strtod() when not able to do fast-path

0580d42

This shall generate best possible precision (if strtod() is correctly implemented). Need more unit tests and performance tests. May add an option for accepting precision error. Otherwise LUT in Pow10() can be reduced.

Add random tests for ParseNumber

818f6f1

Check "fast path cases in disguise" in strtod

b043691

Refactor ParseNumber for two modes (incomplete)

d875f16

Merge master and implement kParseFullPrecision

881c91d

Optimize ParseNumber()

a71f2e6

Fix ParseNumber_Integer test

c4a657d

Compute error statistics of normal precision

add0d9c

Update document for kParseFullPrecisionFlag

86d63ff

miloyip added 5 commits September 10, 2014 09:28

Fix VC2010 which don't have std::isnan() et al.

774a4aa

Prepare custom strtod data. (cannot pass unit test) [ci skip]

30ea2a3

Extract conversion code to strtod.h [ci skip]

359ebc7

Implementing custom strtod, fail on some cases [ci skip]

4bd240a

Make custom strtod work for denormal numbers and some boundary cases …

98dd0a0

…[ci skip]

miloyip added 7 commits September 14, 2014 10:52

Makes gcc x64 runnable, but failed on one case. [ci skip]

855da06

Add 32-bit support for custom strtod

4c21288

Fix a unit test warning and suppress a failing case

fa52a26

Fix normal-subnormal boundary and add more boundary cases in unit tests.

cbd7475

Remove unused BigInteger::operator+=(const BigInteger&)

bea4fa7

Limit significand to 17 digits for fast path

b29acfb

Should fix gcc debug error in Travis. May need further refactoring.

Fix round towards even

50fc3fe

miloyip added 11 commits September 16, 2014 10:52

Trimming leading/trailing zeros and correct underflow case

a425ad5

Minor code cleaning

4f99e25

Extract classes into various files.

74b81fa

Added missing files

299e9f1

Minor optimizations in BigInteger

5171775

Minor refactoring before optimization trial

475b242

Partial StrtodDiyFp implementation [ci skip]

faa877f

Temp commit

b4e2d58

Fixes StrtodDiyFp bugs

40852f4

Fix gcc/clang compilation errors and turn off exhaustive number test

22ca931

Merge remote-tracking branch 'origin/master' into issue120floatprecis…

57b9130

…ion_customstrtod Conflicts: include/rapidjson/internal/dtoa.h test/unittest/readertest.cpp

pah mentioned this pull request Nov 17, 2014

Cannot parse min normal positive double #197

Closed

miloyip added 3 commits November 23, 2014 08:48

Merge remote-tracking branch 'origin/master' into issue120floatprecis…

3679c28

…ion_customstrtod

Minor optimization of strtod

b855c3f

Fix namespace compilation errors

0a17e1a

miloyip added 3 commits November 30, 2014 18:52

Add RAPIDJSON_PARSE_DEFAULT_FLAGS for customizing kParseDefaultFlags

23b7a5e

#120 (comment)

Merge remote-tracking branch 'origin/master' into issue120floatprecision

92554b5

Merge remote-tracking branch 'origin/master' into issue120floatprecision

e62d537

miloyip added a commit that referenced this pull request Nov 30, 2014

Merge pull request #137 from miloyip/issue120floatprecision

454146b

Parse JSON number to double in full-precision with custom strtod. Fix #120

miloyip merged commit 454146b into master Nov 30, 2014

miloyip deleted the issue120floatprecision branch December 8, 2014 02:34

m7thon mentioned this pull request Jul 14, 2015

Issue when serializing double USCiLab/cereal#202

Open
