Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float point reading is lossy. #120

Closed
xpol opened this issue Aug 28, 2014 · 9 comments · Fixed by #137
Closed

Float point reading is lossy. #120

xpol opened this issue Aug 28, 2014 · 9 comments · Fixed by #137
Assignees
Labels
Milestone

Comments

@xpol
Copy link

xpol commented Aug 28, 2014

The float point value 0.9868011474609375 become 0.9868011474609376.

If it is a performance tradeoff, I think we should provide an option to user that prefers correctness over speed.

@miloyip miloyip added this to the v1.0 Beta milestone Sep 1, 2014
@miloyip miloyip added the bug label Sep 1, 2014
@miloyip miloyip self-assigned this Sep 1, 2014
@miloyip
Copy link
Collaborator

miloyip commented Sep 3, 2014

I recently tried an approach in this branch.

  1. It backup the valid characters into the stack (which was only used by ParseString() previously).
  2. If the value can be converted via fast-path for s * 10^p, 0 <= s <= 2^53 - 1 && -22 <= p <= 22, where s, p are integer.
  3. Otherwise, call strtod() with the backup string in stack.

More tests have been added and it can parse the numbers exactly (so with google test EXPECT_DOUBLE_EQ() can be changed to EXPECT_EQ()).

However, the downside is that, the backup process incur unnecessary overheads even for parsing integers. And also calling C++ strtod() may be slow when fast-path's criteria cannot be met.

It is now getting two questions:

  1. Whether to support the previous inexact solution which only uses fast path.
  2. If both solutions are supported, which should be default.

Besides, another issue is that, with the exact solution, the lookup table in Pow10() can reduce the range from [0..308] to [0..22].

@pah
Copy link
Contributor

pah commented Sep 4, 2014

My 3¢:

  • I would prefer to have both options available, especially as there is a cost for integer numbers as well.
  • We should try to encapsulate the differences somehow to avoid splilling #if(def)s everywhere.
  • Can you quantify the maximum error introduced by the fast path solution?

@miloyip
Copy link
Collaborator

miloyip commented Sep 4, 2014

To keep both options, possible implementations:

  1. Macro with #if everywhere in single function.
  2. Macro to switch between two different functions. Double source lines.
  3. New parse option (template parameter), dispatch to two functions. Double source lines. (single function with if is difficult, because some local variables are only used by an implementation).

Seems 3 is more flexible and easier to test. I think that the function is quite difficult to be refactored... But I can try.

I am not sure how to calculate maximum error analytically. I can try to get empirical result with random values.

@pah
Copy link
Contributor

pah commented Sep 4, 2014

I think if EXPECT_DOUBLE_EQ passes for all random values on the fast path, this would be sufficient for many use cases.

I dislike (1) and I even more dislike code duplication. The "local variable" problem can be solved by a similar technique as done for the local copy of the stream: Have a small template class wrapped around the local variable and providing its operations, doing nothing in case of the fast-path-only solution. This way, the cost for the fast path is a single word on the stack. This should be acceptable.

If we go for the template parameter, we'd need a way to set default parse flags globally to avoid cluttering the user code when choosing a non-default configuration.

@miloyip
Copy link
Collaborator

miloyip commented Sep 4, 2014

If we go for the template parameter, we'd need a way to set default parse flags globally to avoid cluttering the user code when choosing a non-default configuration.

Any suggestion? Using macro as parameter default value?

@pah
Copy link
Contributor

pah commented Sep 4, 2014

Using macro as parameter default value?

Yes, this could work as a simple solution:

#ifndef RAPIDJSON_DEFAULT_PARSEFLAGS
#define RAPIDJSON_DEFAULT_PARSEFLAGS kParseNoFlags
#endif
enum ParseFlag {
    kParseNoFlags = 0,
// ....
    kParseDefaultFlags = RAPIDJSON_DEFAULT_PARSEFLAGS

... or something like this.

@miloyip
Copy link
Collaborator

miloyip commented Sep 5, 2014

Today I have refactored ParseNumber() and added kParseFullPrecision option.

Regarding to @pah 's comment

I think if EXPECT_DOUBLE_EQ passes for all random values on the fast path, this would be sufficient for many use cases.

I have added an experiment with random generated double (excluding denormals). By using the same notation of ULP as in google test, the result is:

[ RUN      ] Reader.ParseNumber_NormalPrecisionError
ULP Average = 0.382924, Max = 3 
[       OK ] Reader.ParseNumber_NormalPrecisionError (547 ms)

Since EXPECT_DOUBLE_EQ() considers two doubles are equal if their ULP <= 4, the result of maximum 3 ULP certainly passes EXPECT_DOUBLE_EQ().

@pah
Copy link
Contributor

pah commented Sep 5, 2014

Nice work! 👍

Now only the "default parse flags" are missing, but this is a separate issue, I guess.

@spl
Copy link
Contributor

spl commented Sep 15, 2014

For me, numerical correctness is more important than speed, but I do like the focus you have on performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants