Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perfect accuracy float parsing #536

Closed
dtolnay opened this issue Apr 21, 2019 · 50 comments
Closed

Perfect accuracy float parsing #536

dtolnay opened this issue Apr 21, 2019 · 50 comments

Comments

@dtolnay
Copy link
Member

dtolnay commented Apr 21, 2019

Float parsing is currently implemented by calculating the significand as u64, casting to f64, and then multiplying or dividing by a nonnegative power of 10. For example the input 10.9876543210987654 would be parsed into the value 109876543210987654_u64 as f64 / 10e15.

This algorithm is sometimes correct, or else usually close to correct in practical usage. It matches how JSON parsers are implemented in other languages.

However, it can happen that the result from this algorithm is not the mathematically nearest 64-bit floating point value to the exact value of the input. A "correct" algorithm would always produce the mathematically nearest answer. This requires high precision big-integer arithmetic in the general case so there would be a large performance cost; if implemented, we would likely want this behind a cfg that is off by default, with the current approximate behavior as default. This way programs can opt in to the more expensive algorithm as required.

fn main() {
    let input = "10.9876543210987654";
    let n: f64 = serde_json::from_str(input).unwrap();

    // produces 10.9876543210987644982878919108770787715911865234375
    // which is low by 9.017e-16
    let current_algorithm = 109876543210987654_u64 as f64 / 10e15;
    println!("{}", precise::to_string(current_algorithm));
    assert_eq!(n, current_algorithm);

    // produces 10.98765432109876627464473131112754344940185546875
    // which is high by 8.746e-16 (closer)
    let correct_answer = 10.9876543210987654_f64;
    println!("{}", precise::to_string(correct_answer));
    assert_ne!(n, correct_answer);
}
@Diggsey
Copy link
Contributor

Diggsey commented Jul 17, 2019

The python json library does get this right (or at least gets it more right that serde_json currently does) as I recently discovered when trying to replace a python service with one written in rust 😢

@sjackman
Copy link

@dtolnay Any update on this PR? I just ran into this issue today.

@bluenote10
Copy link

A stupid question: Why not use the standard library parse? It seems to have an accurate implementation.

if implemented, we would likely want this behind a cfg that is off by default, with the current approximate behavior as default.

Is this what is already available under the arbitrary_precision feature flag, or is it referring to something else?

Just in case it is open for debate, I would vote for making the exact version the default and making the approximate/optimized logic opt-in. Defaulting to lossy serialization/deserialization round-trips is a source of real headaches.

As an example: rust-geo-booleanop is modeled after an algorithm written in JS, and currently we are trying to fix some issues in both implementations. Test cases are specified as JSON. In theory the implementations should do exactly the same, but I was debugging weird differences for days. Considering Rust striving for correctness in general, I suspected the differences were caused on JS side. I really didn't expect that the test case input data gets modified during parsing on Rust side.

@Diggsey
Copy link
Contributor

Diggsey commented Jan 19, 2020

making the approximate/optimized logic opt-in.

If the mechanism for changing this option is feature-flags, then it makes more sense to opt-in to the higher precision option, even though you might think that should be the default. Otherwise, any dependency could trigger lossy precision and unintentionally break your application.

@rw
Copy link

rw commented Feb 1, 2020

Would it be possible to have the best of both worlds, by falling back to 100% correct parsing when an inaccuracy is detected? Perhaps by reserializing with ryu? This feature is important to my use case.

@rw
Copy link

rw commented Feb 1, 2020

Adding to @bluenote10's comment. It seems that the standard library's parse function is straightforward and probably good for this use case.

In fact, @dtolnay has committed to the stdlib file containing parse_decimal recently :-)

Edit: And the JSON parser in Rust's libserialize uses parse_decimal.

@rw
Copy link

rw commented Feb 3, 2020

Adding some more detail about how important this is for me: what I'm working on will be the source of truth for our users' data. As a result, we need to know that the parsing is lossless. I'm happy to help out with a PR if there's consensus on how to approach this.

@sjackman
Copy link

sjackman commented Feb 6, 2020

Did you see this WIP PR #541 ?

@rw
Copy link

rw commented Feb 6, 2020

@sjackman IMHO, because that PR takes the feature flag approach, I find it problematic. I think correct parsing should be the default behavior.

@sjackman
Copy link

sjackman commented Feb 6, 2020

I think correct parsing should be the default behavior.

I agree.

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 6, 2020

As the author of rust-lexical, I would highly recommend not implementing your own correct float parser without exhaustive testing, since it's very easy to get it wrong.

I see one of 3 general approaches:
1). Pre-process the input to allow it to use Rust's dec2flt (str.parse::<F: Float>()).
2). Use a dedicated float parser that supports JSON floats (rust-lexical does this as of v0.6).
3). Copy a subset of Rust lexical's code (I would be willing to do this) to enable a slightly lighter solution, while ensuring you build on prior experience.

Any correct float parser requires:
1). Arbitrary-precision arithmetic (big-ints).
2). Fairly efficient slow-path routines.
3). Very fast routines in fast cases.
4). Complex error estimation for other situations.

It's easy to get this wrong, so I will gladly help others do this, if I can be of service.

To use the lexical approach, you would use the following code:

let format = lexical_core::NumberFormat::JSON;
let result = lexical_core::parse_format::<f64>("3.0e7", format);

@sjackman
Copy link

sjackman commented Feb 6, 2020

@Alexhuszagh Would you consider submitting a PR to use rust-lexical to parse floats? It could both help start a conversation, and give a SHA-1 for people to use that have an immediate need for correct parsing of floats in the interim. (I'm not a maintainer of serde-rs/json.)

@Alexhuszagh
Copy link

@ijl has already put a fair amount of work in both helping me implement JSON support and also submitted a PR for serde-json to use lexical in #541. I can check, but I believe this PR should suffice.

@rw
Copy link

rw commented Feb 6, 2020

@Alexhuszagh Are there differences between the byte-at-a-time approach and what #541 is doing? IIUC, using lexical as-is means keeping intermediate buffers around.

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 6, 2020

@rw I haven't checked exactly, but the byte-at-a-time approach would still require an intermediate buffer, it just wouldn't be visible and would allow us to implement it in terms of io::Read. That may not have been your assumption originally, so let me explain. Also, there is no good way to get around this.

Iterator Types

Using C++-like nomenclature, I'm going to briefly introduce iterator types:

  • Input iterators, which can only be used once and allow forward iteration.
  • Forward iterators, which can be used multiple times and allow forward iteration.

Examples of input iterators in Rust include io::Bytes, while example of forward iterators include linked-lists, hashmaps, deques, vectors, etc (most of those support much more than just forward-iteration, but they support at least forward iteration).

Algorithm Approach

All Algorithms
1). Read as many significant digits into a 64-bit integer before overflowing.
2). Read the exponent, if present, into a 16-bit (or larger) integer, returning max or min on overflow.
3). Store the index where the integer and fraction started in the string, in case we need to use a slow algorithm.

Fast Algorithm
1). Check if the significant digits use 53-bits or less (for a double-precision float).
2). Check if the exponent can be represented exactly as a double-precision float [1e-22, 1e22].
3). If both conditions are satisfied, convert the significant digits to a float and the exponent value to a float, and return the product (IEEE754 floats guarantee the use of guard digits, so we won't lose any precision from multiplying 2 exact-values a single time).

Moderate Algorithm
If the fast algorithm cannot be used, fallback to the moderate algorithm.
1). Represent the float as an extended-precision float (80-bits, 64-bits for the significant digits, 16-bits for the exponent).
2). Normalize the significant digits (make sure the significant digits have a 1 in the most-significant bit).
3). Get the first 64 significant digits and the binary exponent from the actual decimal exponent using pre-calculated tables.
4). Multiply the significant digits, calculating both the significant digits and the binary exponent.
5). Calculate the accuracy of our estimate, by estimating the error that could occur due to this approximation.
6). If the error would not affect the significant digits, return the value as a float.

Slow Algorithms
If the moderate algorithm isn't sufficient, we must use a slow, correct algorithm. This is the step that requires an intermediate buffer, since we must have at least 1 more pass over the input digits. We have the value correct to within 1 ULP (unit of lead-precision).
1). Using our extended-precision float, calculate b, which is always less than or equal to the actual value.
2). Calculate the value of b+h, which is exactly halfway between b and b+1.
3). Calculate the exact value of the actual digits using a big-integer representation.
4). The steps differ here, due to various optimizations, but effectively, compare the actual digits present to the theoretical digits of b+h.
5). If the present digits are above b+h, return b+1, Otherwise, return b.

Summary

As you can see, there is no way to remove iterating over the input twice in the worst-case, so we have to have an intermediate buffer at some level. This also coincides with how readers typically work: they almost always use buffered-reads, just the buffer isn't user-visible (imagine making syscalls for every byte read, or a network request per-byte).

That means at some level, a buffer needs to be present to implement a correct and efficient parser. In order to avoid iterating twice over the input we would need to:
1). Only the slow algorithm, however, it would be 10-100x slower than the moderate algorithm, and slower still than the fast algorithm.
2). Have an incorrect parser, which only uses the fast or moderate algorithms, and is accurate to within 1 ULP.

We do, however, have 1 advantage: we know that only 767 decimal digits can contribute to a binary number with 53-bits of significant digits. For the theory behind this claim, please see Section 2.7 on radix conversions. This means we have an upper limit on the number of bytes required for both the input buffer, and the big-integer, allowing us to stack-allocate both.

@Alexhuszagh
Copy link

In short, any approach accepting an input iterator or similar object like io::Read would require a temporary buffer behind the scenes, just with a lot of constraints on that (which makes our life a lot easier).

@Diggsey
Copy link
Contributor

Diggsey commented Feb 6, 2020

@Alexhuszagh thanks for the link and the great explanation! However, I think your statement about 767 decimal digits and buffer sizes is misleading. The handbook says that 767 digits may be required to perfectly represent a 64-bit float in base 10, but parsing algorithms are doing the opposite conversion, and way fewer digits than that are required to unambiguously identify a 64-bit float.

A rough estimate (53 * log(2)) shows that you shouldn't require much more than about 16 (decimal) significant figures to fully determine the floating-point mantissa, and you could just ignore any further digits in the input value. That's small enough that buffering it in memory shouldn't really be a concern (if you are smart about what you keep in the buffer - eg. ignore leading zeros).

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 6, 2020

@Diggsey You're doing it the wrong way, although you're getting fairly close (you have a very good intuition for this). Float to string only requires 17 digits to uniquely identify a float, but going string to float requires up to 767 digits because you can get close-to-halfway cases. However, this allows us to use a stack-allocated buffer regardless, since there is no dynamic memory allocation.

See some examples in here for examples of it (see C126, C127, C128 for this):
https://github.com/Alexhuszagh/rust-lexical/blob/master/data/test-parse-unittests/rust_parse_tests.toml

@Alexhuszagh
Copy link

Here are practical Python examples for those test cases, @Diggsey. As you can see, we need a max of 767 significant digits to parse a float, only 17 to write a float.

>>> x = "7.4109846876186981626485318930233205854758970392148714663837852375101326090531312779794975454245398856969484704316857659638998506553390969459816219401617281718945106978546710679176872575177347315553307795408549809608457500958111373034747658096871009590975442271004757307809711118935784838675653998783503015228055934046593739791790738723868299395818481660169122019456499931289798411362062484498678713572180352209017023903285791732520220528974020802906854021606612375549983402671300035812486479041385743401875520901590172592547146296175134159774938718574737870961645638908718119841271673056017045493004705269590165763776884908267986972573366521765567941072508764337560846003984904972149117463085539556354188641513168478436313080237596295773983001708984374999e-324"
>>> float(x)
5e-324

>>> y = "7.4109846876186981626485318930233205854758970392148714663837852375101326090531312779794975454245398856969484704316857659638998506553390969459816219401617281718945106978546710679176872575177347315553307795408549809608457500958111373034747658096871009590975442271004757307809711118935784838675653998783503015228055934046593739791790738723868299395818481660169122019456499931289798411362062484498678713572180352209017023903285791732520220528974020802906854021606612375549983402671300035812486479041385743401875520901590172592547146296175134159774938718574737870961645638908718119841271673056017045493004705269590165763776884908267986972573366521765567941072508764337560846003984904972149117463085539556354188641513168478436313080237596295773983001708984375e-324"
>>> float(y)
1e-323

>>> z = "7.4109846876186981626485318930233205854758970392148714663837852375101326090531312779794975454245398856969484704316857659638998506553390969459816219401617281718945106978546710679176872575177347315553307795408549809608457500958111373034747658096871009590975442271004757307809711118935784838675653998783503015228055934046593739791790738723868299395818481660169122019456499931289798411362062484498678713572180352209017023903285791732520220528974020802906854021606612375549983402671300035812486479041385743401875520901590172592547146296175134159774938718574737870961645638908718119841271673056017045493004705269590165763776884908267986972573366521765567941072508764337560846003984904972149117463085539556354188641513168478436313080237596295773983001708984375001e-324"
>>> float(z)
1e-323

@Diggsey
Copy link
Contributor

Diggsey commented Feb 6, 2020

Ah I was forgetting about such edge cases - I would be glad if serde_json would just losslessly round-trip all 64-bit floats, which seems like a much easier problem.

@dtolnay
Copy link
Member Author

dtolnay commented Feb 7, 2020

I agree with @Diggsey, I am more interested in round tripping correctly by default and a lot less interested in perfect parsing of arbitrarily long decimal representations by default, as I don't see that as a common practical requirement that would be worth paying for in compile time by default. We can have an option but it would be off by default if it doesn't come for free from perfect round tripping.

In particular I don't think I am on the same page as @Alexhuszagh right now, somewhere along the work in the perfect_float PR it appears to have gotten further away from correctly round tripping things (#541 (comment)).

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 7, 2020

@dtolnay See the comments addressed to that, I'm not entirely sure what went on in that example, but it doesn't reproduce for me anymore.

As for round-tripping 17-digit floats, it's definitely possible using only the fast and moderate paths described above in most cases (there are some edge-cases with exponents that will cause this to fail, actually). Unfortunately, arbitrary-precision arithmetic is pretty much required to fully round-trip every value, which also means it supports arbitrarily-long inputs.

@Alexhuszagh
Copy link

For example, the following 3 tests are good test-cases on why you need support for arbitrarily-long inputs necessarily:

>>> float("2.4703282292062326e-324")
0.0
>>> float("2.4703282292062327e-324")
0.0
>>> float("2.4703282292062328e-324")
5e-324

This is because although only 17 digits are present, which is enough to uniquely identify a float, the exponent being base-10 means we get approximation errors for close-to-halfway values. A side-effect is we need internally to use arbitrary-precision arithmetic, which also allows us to support arbitrarily long inputs. The two, sadly, are linked.

@rw
Copy link

rw commented Feb 7, 2020

@dtolnay said

...I don't see that as a common practical requirement that would be worth paying for in compile time.

Personally, I'd be happy to have a longer compile time if this was available. I guess it would be like a restricted version of the arbitrary-precision arithmetic feature?

@Alexhuszagh
Copy link

I do have code for 128-bit mantissa using the moderate path that should be able to handle round-trip conversions, however, it provided a fairly meagre performance benefit when used, and was a lot slower when the slow path was required. However, it should handle any case of 17-19 (likely, anything 30 or less) digits to fully round-trip.

@dtolnay
Copy link
Member Author

dtolnay commented Feb 7, 2020

I think we are still not on the same page. :( I don't see how the examples in #536 (comment) are relevant to round tripping f64 -> json -> f64. 2.4703282292062328e-324 is not a json number that you would get by serializing an f64; the nearest f64 serializes to 5e-324 which is far easier to parse. Parsing arbitrary 17-19 digit values is not a requirement right now, I would recommend focusing on f64 -> json -> f64 round trip and getting the best compile time performance there.

@Alexhuszagh
Copy link

If that's your goal, I can gift you code that will do this. The moderate path with be more than sufficient for this. I won't implement it personally, however.

@Alexhuszagh
Copy link

I can submit a PR that would do this. It will take me some time, but it's stable code, and it will round-trip clear representations of values. I don't personally like the approach (it will get certain inputs wrong, which may be present from decimals or other representations) so I'm not willing to modify lexical to do this.

Let me know if that works?

@dtolnay
Copy link
Member Author

dtolnay commented Feb 7, 2020

Thanks! That would be great. The things I would like to know to move this issue forward, from you or someone else, are:

  • If the requirement is just that f32 and f64 round trips always preserve the correct value,
    • how much is it going to cost us in compile time?
    • how much does it cost in runtime compared to the current approximate parsing, on some reasonable distribution of inputs, maybe a worse case and an expected case on random floats?
  • If the requirement is accurately parsing arbitrary digits, even ones that would not be emitted by serializing a float,
    • the same two things.

@Alexhuszagh
Copy link

@dtolnay Compile-time performance should be fairly minimal. Run-time performance should be fairly minimal as well. Almost all of the overhead comes from the arbitrary-precision work. Assuming you have the code already to parse the significant digits and the exponent in a float (which I'm assuming you do), then the remaining requirements are basically this:

1). You need an extended-precision float class.
2). You need an algorithm to round-nearest, tie-even (like 20 lines of simple code).
3). You need an algorithm to normalize the extended-precision float, and then multiple normalized extended-precision floats.
4). You need a fast-path algorithm (~20 lines of simple code).
5). You need some helpers and trait methods.

For the other requirement, a lot, lot more. Especially for parsing complex inputs. Lexical is not optimized for compile-times, and for good reason.

Let me get you a simple proof of concept, and I can show you performance is good in pretty much all cases. I can give you numbers right now, but it would likely be better-done in-context.

@dtolnay
Copy link
Member Author

dtolnay commented Feb 7, 2020

For the other requirement, a lot, lot more. Especially for parsing complex inputs. Lexical is not optimized for compile-times, and for good reason.

Thanks, this was my hunch, and why I insisted on distinguishing between the requirements even if lexical isn't interested in supporting the simpler one. I know that people who care about accuracy at all costs are overrepresented among those who choose to follow this thread, but in the broader serde_json user base compile time matters a whole lot.

@Alexhuszagh
Copy link

@dtolnay Fair. I'll have a working example shortly, it's not optimized for speed yet (I've stripped out every single inline, and just have stripped out almost every trait and assumption), which will be published here. I'll license this code however you wish, as long as you give me appropriate credit (just a blurb in the source file).

@dtolnay
Copy link
Member Author

dtolnay commented Feb 7, 2020

Sounds good, I'd be happy to give credit in the source file. Thanks for your help!

@Alexhuszagh
Copy link

@dtolnay I've pushed the initial commit. Everything is done through the parse_float method, but it should give you a pretty good idea of how minimal the requirements are:
https://github.com/Alexhuszagh/minimal-lexical

It's < 800 lines of code, no macros (except for trait impl). Let me add documentation so it makes some sense, and then a README, and a few working examples and I'll submit a PR to integrate it in.

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 7, 2020

Ok, it's live. Let me know if everything looks good, and if so, I'll submit a PR with this as the default float parser. I'll add more comprehensive tests in the actual composite, but every piece is working identically to its behavior in rust-lexical, which makes it unlikely any errors are present.

Please note this doesn't actual handle parsing the significant digits from value, or the exponent, but those require fairly trivial string-to-integer conversions (using checked_add and checked_mul to exit early on overflow while preserving the value), so it makes minimal assumptions (IE, it's very easy to integrate into a parser requiring io::Read). This just ensures an accurate float is generated from that value (given the conditions mentioned above), with fairly minimal overhead.

@dtolnay
Copy link
Member Author

dtolnay commented Feb 8, 2020

@Alexhuszagh Nice!

I ran the following loop as a smoke test. Is it fair to expect that this loop should run forever? I believe termination would indicate a bug in either ryu or minimal-lexical.

let mut rng = XorShiftRng::from_seed([0; 16]);
let mut buffer = ryu::Buffer::new();
loop {
    let input = f64::from_bits(rng.next_u64());
    if input.is_finite() {
        let printed = buffer.format_finite(input);
        let (output, rest) = parse_float::<f64>(printed.as_bytes());
        assert_eq!(output, input);
        assert_eq!(rest, b"");
    }
}

However it fails on the following value, and the failure reproduces using std::fmt instead of ryu, which suggests this is more likely a bug in minimal-lexical. Any idea what could be going on?

let expected = -0.04628372940652459_f64;
println!("expected: {}", expected);
let mantissa = 4628372940652459_u64;
let exponent = -17_i32;
let truncated = false;
let actual = -minimal_lexical::create_float::<f64>(mantissa, exponent, truncated);
println!("actual:   {}", actual);
assert_eq!(actual, expected);

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 8, 2020

Ok, this is patched, thank you. I forgot (hard-earned lessons) that powi is not stable for floats, even small powers that can be exactly represented. Rust-lexical uses hard-coded constants for these small powers (10^0 to 1^10 for f32, and 10^0 to 10^22 for f64). This has since been fixed. I'll run the smoke-screen, and get back to you once everything passes.

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 8, 2020

@dtolnay I am finding one real-world example where this round-trip fails, and this won't be solvable without the slow-path algorithm.

The failing test case is:

let expected = 2.638344616030823e-256_f64;
let mantissa = 2638344616030823_u64;
let exponent = -271_i32;
let truncated = false;
let actual = -minimal_lexical::create_float::<f64>(mantissa, exponent, truncated);
assert_eq!(actual, expected);

This value is actually a halfway point, which is why we can't differentiate it. Here are the examples printed with numerous extra digits, just to show the problem:

b  = 2.638344616030822852118090731545e-256
b1 = 2.638344616030823147881137273239e-256
# Estimated, since this cannot be a real float
bh = 2.638344616030822999999614002392e-256

A quick check shows the moderate path in rust-lexical fails here, because it's very close to a halfway value. To fix this, we would need a slow-path algorithm. Ultimately, this is your call. It works with the longer formatting printed out to 17-19 digits:

let mantissa = 2638344616030823147_u64;
let exponent = -274_i32;
let truncated = false;
let actual = -minimal_lexical::create_float::<f64>(mantissa, exponent, truncated);

However, any modern float formatting algorithm (like Ryu, Grisu3, etc.) will likely format it almost exactly on the halfway point, which means this is a real problem in practice.

@dtolnay
Copy link
Member Author

dtolnay commented Feb 8, 2020

What is the minimum compile time required to handle cases like 2.638344616030823e-256? Is it effectively equivalent to compiling lexical-core with "correct" feature enabled? On my machine that corresponds to a 45% increase in CPU-seconds to compile serde_json which is not going to be something that we would do by default.

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 8, 2020

I can provide a much smaller subset, so no, it won't be that large. A majority of the compile-time bloat from lexical is the presence of features to parse non-decimal strings, non-float-parsing conversions, and also support for arbitrary formats including decimal separators, etc. I can add in the slow-path algorithms and the compile-time overhead will be much lower than lexical itself, however, the actual extraction of the significant float digits and validation of the float format will have to be done by the client.

Also, I don't have to use all my slow path algorithms either, which will cut down compile-time even further. I currently have 3 slow-path algorithms, 2 of which are used for inputs <= 1000 characters, and one which is used for >= 1000 characters (non-decimal floats only). Since we only need some of these, it cuts back even further.

A sample signature would be:

fn parse_float<'a, F, Iter1, Iter2>(integer: Iter1, fraction: Iter2, exponent: i32) -> F
    where F: Float,
          Iter1: Iterator<Item=&'a u8>,
          Iter2: Iterator<Item=&'a u8>;

This way, all the parsing logic and validation of the float format can be handled with minimal overhead by the library using the minimal version.

If this is the case, I would likely spin out minimal-lexical into a published crate, rather than a stub of one, because the amount of code will be a fair bit larger, and I would want to ensure any regressions get fixed immediately.

@Alexhuszagh
Copy link

Ok, @dtolnay, if you want to test minimal-lexical and the parse times, the updated version is now up.
https://github.com/Alexhuszagh/minimal-lexical

The API has somewhat changed, but see the examples provided for how to use it. I've also integrated your smoke test in, which seems to be passing now. It's <3,000 lines of code (the full version of lexical is ~40,000, because supporting different float formats is a complete and utter nightmare, so compile-times are slow, even if the binary is fairly small and fast), compiles quickly, and all my simple stress tests seem to pass.

I'm going to add in Rust's tests for comprehensive float parsing and well-known edge-cases in float parsers:
https://github.com/Alexhuszagh/rust-lexical/tree/master/data

If all that passes, and the smoke test runs for more than an hour, I'll submit a PR to integrate this in.

@rw
Copy link

rw commented Feb 8, 2020

@Alexhuszagh Would it be useful to exhaustively test all f32 values?

@Alexhuszagh
Copy link

@rw If you want, sure. The unittests are quite comprehensive, and I'm mostly looking for regressions introduced in extracting the code from the greater lexical project, but more coverage is always good.

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 9, 2020

Ok, everything works:
1). The smoke test has run for > 30 minutes, without any failures.
2). We've exhaustively tested conversion to/from f32 using ryu and minimal-lexical.
3). The comprehensive-float-test from Rust's libcore works.
4). We've tested the code works on numerous different architectures and OSes.

I'll wait for @dtolnay to play with it a bit, see if he can break it, see if the compile times are fast enough, and then I'll submit a PR.

Finally, the build times are quite nice now:
build_time

Unit Total Codegen Features
1. minimal-lexical v0.1.0 1.1s 0.2s (20%) default, std
2. arrayvec v0.4.12 1.0s 0.0s (2%) array-sizes-33-128, default, std
3. arrayvec v0.4.12 custom-build 0.7s 0.0s (0%) array-sizes-33-128, default, std
4. minimal-lexical v0.1.0 custom-build 0.4s 0.0s (0%) default, std
5. nodrop v0.1.14 0.2s 0.0s (15%)
6. cfg-if v0.1.10 0.1s 0.0s (24%)
7. arrayvec v0.4.12 custom-build (run) 0.1s 0.0s (0%) array-sizes-33-128, default, std
8. minimal-lexical v0.1.0 custom-build (run) 0.0s 0.0s (1%) default, std

@dtolnay
Copy link
Member Author

dtolnay commented Feb 9, 2020

Could you see if it's possible to strip this down further to bring down the compile time? I know you said the crate this is mostly coming from was never optimized for compile time, but now would be the time to do that. In particular I am not excited about bringing in new transitive dependencies on nodrop, arrayvec, and cfg-if.

@Alexhuszagh
Copy link

Alexhuszagh commented Feb 9, 2020

Would you be fine with dynamic allocation for the big-integer backing store? Otherwise, arrayvec will be a requirement (vec or arrayvec is a necessity). I did have a previous crate, which was essentially a drop-in replacement but I removed it because arrayvec was better maintained.

cfg_if can be trivially replaced, but it's lightweight, so I doubt that will significantly change compile times. However, from a dependency management standpoint, I get it, and can remove it. As for nodrop, it's a requirement for arrayvec until v0.5, which unfortunately only supports Rustc 1.37+.

@Alexhuszagh
Copy link

Ok, I've removed both dependencies by default. Arrayvec can be re-enabled by the feature no_alloc, which then means the entire crate will not allocate. As a result, it uses Vec as a backing store by default now.

@rw
Copy link

rw commented Mar 26, 2020

@Alexhuszagh @dtolnay Is there anything I can do to help?

@japaric
Copy link
Contributor

japaric commented May 13, 2020

I have opened a new PR (#671) that uses the latest version of minimal-lexical to fix this issue. The PR description has quite a few measurements but the summary is that I'm (only) seeing an increase in compile times of ~2-3% (# instructions as reported by perf) or ~0.3s in wall time (out of an initial ~15s) for builds from scratch of the serde-json crate.

@dtolnay
Copy link
Member Author

dtolnay commented Jun 9, 2020

The new float parser landed behind a feature in 1.0.54 -- https://github.com/serde-rs/json/blob/v1.0.54/Cargo.toml#L58-L65. It's possible it will be default in a future version.

serde_json = { version = "1.0.54", features = ["float_roundtrip"] }

@dtolnay dtolnay closed this as completed Jun 9, 2020
@serde-rs serde-rs locked and limited conversation to collaborators May 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging a pull request may close this issue.

7 participants