10-70x faster $float, roundtrip + correct rounding via dragonbox #18008

timotheecour · 2021-05-14T01:48:27Z

fixes

when true: # D20210513T184615
  from math import round
  let a = round(9.779999999999999, 2)
  doAssert a == 9.78
  echo a # before PR: 9.779999999999999, after PR: 9.78, like in python3

it also makes round actually useful for the purpose of formatting

dragonbox

replaces $float, addFloat with dragonbox algorithm
works in c, cpp, js (rt+vm) [2]
closes stdlib needs ryu for compact and performant float to string #13365 (dragonbox is faster than ryu [1])
10x-70x faster than current $float depending on float distribution, see nim r -d:danger tests/benchmarks/tstrfloats_bench
use -d:nimLegacyAddFloat for legacy addFloat/$, or use toStringSprintf directly
new low level module std/strfloats
zero impact on CT performance thanks to caching the object file (unless --forceBuild is passed)

new mechanism for managing cacheable dependencies

new (private, for now) module std/private/dependency_utils which allows specifying dependencies at CT, eg:

import private/dependency_utils
static: addDependency("dragonbox")
  # this will trigger compiler to build a dependency (or use a cached version)
  # and bundle it to the binary we're building, similar to `{.compile: "foo.c".}` but more flexible

this will allow in future work to add more "batteries" to stdlib without having to recompile them, and still allowing them to be used in low-level modules (compiler takes care of installation code which can be arbitrary); in particular for things like bigints (Add bigints to standard library #14696) for which I intend to use the same approach

design goals

no compromise on performance, no compromise on compile times
leverage existing software, don't re-invent the wheel (easily swappable to alternative libraries in future work (or even a pure nim re-implementation), or extensible to other libraries eg bigints)
transparent change for users (besides the changes in float formatting)

note

a key part of this PR is that it copies (with minor adjustments) https://github.com/abolz/Drachennest/blob/master/src/dragonbox.cc from this repo: https://github.com/abolz/Drachennest, which uses Boost Software License 1.0 : https://github.com/abolz/Drachennest/blob/master/LICENSE; that's the only caveat with this PR, but I believe this license should be fine. If not, we could provide compiler options to make this dependency optional (possibly even downloading the sources depending on compiler flags if we must avoid bundling it with nim repo); note that we already have a precedent for that though, with linenoise (bsd2), plus other dependencies past or present.

links

[1] see dragonbox vs ryu vs other benchmarks in https://github.com/jk-jeon/dragonbox and https://github.com/abolz/Drachennest
[2] for js at RT, node/browser already output a minimal representation, before this PR

Araq · 2021-05-14T07:05:37Z

While I don't mind the "new mechanism for managing cacheable dependencies" mechanism, in fact, I recently had similar ideas, but so that the Nim compiler can use Nimble packages, skimming over dragonbox_impl.cc -- c2nim eats this file for breakfast and then we would have a native Nim implementation. In the past adding C/C++ code didn't work well for us, not just because of the lack of dependency tracking.

timotheecour · 2021-05-14T07:58:40Z

While I don't mind the "new mechanism for managing cacheable dependencies" mechanism,

great, i think that's a real enabler to add runtime functionality without adding import/CT bloat;

in fact, I recently had similar ideas, but so that the Nim compiler can use Nimble packages,

yes, I'm also thinking in that direction; it's one way to break the cyclic dependency issue

skimming over dragonbox_impl.cc -- c2nim eats this file for breakfast

how?

nimble install c2nim
c2nim -v
0.9.14
c2nim --cpp --debug -o:build/dragonbox_impl_tmp.nim lib/std/private/dragonbox_impl.cc # IndexDefect

and from a local clone:

$nim_D/c2nim/c2nim --cpp --debug -o:build/dragonbox_impl_tmp.nim lib/std/private/dragonbox_impl.cc

/Users/timothee/git_clone/nim/c2nim/c2nim.nim(192) c2nim
/Users/timothee/git_clone/nim/c2nim/c2nim.nim(141) main
/Users/timothee/git_clone/nim/c2nim/c2nim.nim(76) parse
/Users/timothee/git_clone/nim/c2nim/cparse.nim(2737) parseUnit
/Users/timothee/git_clone/nim/c2nim/cparse.nim(2702) statement
/Users/timothee/git_clone/nim/c2nim/cparse.nim(2636) fullTemplate
/Users/timothee/git_clone/nim/c2nim/cparse.nim(1469) declaration
/Users/timothee/git_clone/nim/c2nim/cparse.nim(2192) compoundStatement
/Users/timothee/git_clone/nim/c2nim/cparse.nim(2712) statement
/Users/timothee/git_clone/nim/c2nim/cparse.nim(2028) declarationOrStatement
/Users/timothee/git_clone/nim/c2nim/cparse.nim(179) backtrackContext
/Users/timothee/git_clone/nim/Nim_devel/lib/system.nim(1888) pop
/Users/timothee/git_clone/nim/Nim_devel/lib/system/fatal.nim(53) sysFatal
Error: unhandled exception: index out of bounds, the container is empty [IndexDefect]

and then we would have a native Nim implementation. In the past adding C/C++ code didn't work well for us, not just because of the lack of dependency tracking.

I'm not against it if there's an automatic source to source translation (that doesn't incur performance loss), but note that even in that case, the addDependency would be the way to go IMO, to avoid having to recompile lots of code that should just be compiled once (like in lots of other languages which have a link time dependencies); it's great that nim doesn't need it for most things, but for things like this it's justified, and opens the door for similar future external dependencies.

Araq · 2021-05-14T09:18:25Z

how?

Well, it needs some c2nim bugfixes but it shouldn't be too hard.

planetis-m · 2021-05-14T10:54:27Z

Why not use the official repo: https://github.com/jk-jeon/dragonbox it has a header only version too. Does it work with float32?

timotheecour · 2021-05-15T05:50:19Z

Why not use the official repo: jk-jeon/dragonbox it has a header only version too. Does it work with float32?

the main reason is that it's a bit slower for float both in my experience [1] and according to @jk-jeon himself [2]
a 2nd reason is it requires C++17 instead of C++11 (or perhaps even just C++03, haven't tried with that), but in future c++17 requirement might be relaxed (tracked in C++11? jk-jeon/dragonbox#8)
a minor reason is that jk-jeon/dragonbox is more complex (in part because it has more features)

Note though that jk-jeon/dragonbox supports more options, and we can wrap it in future work to allow customizing (see policy options in jk-jeon/dragonbox, as well as float32 direct support); in fact I've done exactly that, in timotheecour#732 which wraps both, but we can discuss that after this PR is merged; the good news is that with the logic from this PR, adding new dependencies is easy, more on this later.

[1] see timotheecour#732 in which I wrap both jk-jeon/dragonbox and abolz/Drachennest; here's my benchmark which shows jk-jeon/dragonbox (aka toStringDragonbox0) is slower for double but faster for float32 compared to abolz/Drachennest (aka toStringDragonbox):

("toStringSprintf", "genFloatCast", 11.872696000000001)
("toStringSprintf", "genFloatConf", 1.5924309999999995)
("toStringSprintf", "genFloat32Cast", 5.178619000000001)
("toStringSprintf", "genFloat32Conf", 1.5908620000000013)
("toStringDragonbox", "genFloatCast", 0.16581199999999896)
("toStringDragonbox", "genFloatConf", 0.15630000000000166)
("toStringDragonbox", "genFloat32Cast", 0.22685300000000197)
("toStringDragonbox", "genFloat32Conf", 0.14983400000000202)
("toStringDragonbox0", "genFloatCast", 0.20322699999999827)
("toStringDragonbox0", "genFloatConf", 0.23087699999999955)
("toStringDragonbox0", "genFloat32Cast", 0.14654799999999923)
("toStringDragonbox0", "genFloat32Conf", 0.1509040000000006)

[2] see jk-jeon/dragonbox#8 (comment),

FYI, @abolz has reimplemented Dragonbox, and AFAIK this implementation does not have lots of compiler-torturing nonsenses mine has, so it is probably C++11 or C++03. He also has an arguably better decimal-to-string conversion routine, which requires a little bit more preconditions but has greater performance. You might be interested to check out the repository.

Well, it needs some c2nim bugfixes but it shouldn't be too hard.

I honestly doubt it, there'll always be C++ constructs it can't handle; while abolz/Drachennest is C++11 (or c++03), other projects we may want to depend on in future (eg bigint, etc) might have more complex requirements (eg jk-jeon/dragonbox uses C++17). Writing a C++ parser that works with real life libraries is not only incredibly difficult, but also entirely un-necessary, cling and clang's libtooling is what should be used for that. But that can be discussed elsewhere.

lib/system.nim

tests/benchmarks/tstrfloats_bench.nim

tests/stdlib/tjsonutils.nim

tests/stdlib/tstrfloats.nim

timotheecour · 2021-05-15T06:47:13Z

compiler/builddeps.nim

+      let objFile = dir / ("$1nimdragonbox.o" % prefix)
+      if optForceFullMake in conf.globalOptions or not objFile.fileExists:
+        # xxx
+        # let cppExe = getCompilerExe(c.config; compiler: TSystemCC; cfile: AbsoluteFile): string =


I'm currently hardcoding the C++ compiler; followup PR will improve this to use extccomp.getCompilerExe instead so it honors user configs (deferring this to future PR as it requires a few other changes to allow this and this PR is already quite big)

We should c2nim the code, as I said.

as I wrote above, I've tried that already; c2nim can't handle dragonbox.cc, it lacks a lot of functionality to support C++ files like dragonbox.cc and chokes on many different parts of the code (see also https://github.com/nim-lang/c2nim/issues).
Compiling dragonbox.cc directly as in this PR is much simpler than fixing c2nim to parse C++ properly and convert it to nim (only to be later converted back to C or C++ during cgen); fixing c2nim could easily represent a multi-year dedicated effort, time that could be better spent elsewhere.

planetis-m · 2021-05-15T09:00:50Z

Anyway if people start converting it to Nim like the last time with ryu, we should probable coordinate and split tasks.

abolz · 2021-05-15T10:47:11Z

Thanks for trying Drachennest! :-) Nice to see that this might actually be useful for someone else.

Just a few remarks:

The Dragonbox implementation in Drachennest currently only works correctly for double-precision (float64) numbers. float32 is actually not supported yet. (The double-precision implementation might produce more digits than necessary for single-precision numbers.)

So you should actually use Junekey Jeon's implementation (which currently requires C++20). Or the simplified implementation of Dragonbox in the fmt formatting library here (contributed by Junekey Jeon), which works with C++11, produces output similar to printf("%g"), is widely used and well tested.

Alternatively you might want to try out the Schubfach implementation (float32 version, float64 version). Dragonbox is an optimized version of this algorithm. It might be slightly slower than Dragonbox, but the implementation is less complex. (You may tweak the output using the MinFixedDecimalPoint/MaxFixedDecimalPoint constants in the ToChars methods to better match the output of printf.) If you want to try this out, I may try to simplify the code such that (the current version of) c2nim is able to translate it.

HTH

timotheecour · 2021-05-15T17:35:56Z

Thanks for trying Drachennest!

and thanks for writing this library!

float32 is actually not supported yet

indeed, as can be seen with: let a = 1.1'f32: when converting this to float64 first you'd get 1.100000023841858, whereas when using jk-jeon/dragonbox (eg via timotheecour#732) you'd get 1.1

But note that this isn't a regression, because nim currently renders float32's by first converting it to float64 (and then using sprintf, until this PR), so with this PR you still get all the benefits of dragonbox for float/float64 (arguably more common in nim code than float32), and you get possibly more digits than needed for float32's, but still better (or at least no worse) even in that case compared to before this PR.

Future work can add support for float32's, which will require other changes to nim anyways to avoid the pre-exixting conversions from float32 to float64 done in a few places.
(and will probably use https://github.com/abolz/Drachennest/blob/e6714a39ad331b4489d0b6aaf3968635bff4eb5e/src/schubfach_32.cc for the float32 case, and the code from this PR for the float64 case)

Anyway if people start converting it to Nim like the last time with ryu, we should probable coordinate and split tasks.

I don't see the point in doing that, you wouldn't get the benefits of future improvements from upstream libraries, and it'd have to be compiled in to object code anyways (as done in this PR) to avoid adding ever more import/cyclic dependencies just to get a working $float. If a suitable library written in nim comes along in the future that has no worse performance than the ones used in this PR, we'd always be able to swap it in at that time (as pre-compiled dependency), but until then I'd very much like to benefit from what those libraries offer without having to wait for this hypothetical rewrite with possibly worse performance.

timotheecour · 2021-05-16T01:58:56Z

@Araq PTAL

I have another branch (which depends on this PR) where it also applies dragonbox algorithm to float32 (using Drachennest-schubfach_32) in addition to float64 (still using Drachennest-dragonbox), along with performance benchmarks justifying it; it builds on this PR's addDependency logic, and gives the fastest performance possible + roundtrip + correct rounding in all scenarios: (c,cpp,js), (rt,ct), (float, float32), and only needs C++11 (possibly C++03) for compilation of the cached dependency (but nim c continues to work even though C++11 is used for the dependency).

Araq · 2021-05-22T09:11:09Z

c2nim can now parse dragonbox.cc and the way you did the "new mechanism for managing cacheable dependencies" via yet another compiler magic is not acceptable. We need an external "bundler" tool that is available at boot time (maybe by generating C sources for it).

timotheecour · 2021-05-25T22:59:29Z

@c2nim can now parse dragonbox.cc

(EDIT with --keepBodies) here's the output of c2nim --cpp --keepBodies --debug -o:output.nim lib/vendor/drachennest/dragonbox.cc as of latest c2nim, 98af60a3484696ab1b138789abc2f82c2ba733d0:

https://gist.github.com/timotheecour/8f45eddc5b3533f5e166b305fde537f9

definitely better than before.

But right now, c2nim doesn't produce a valid nim file, nim c output.nim complains, and in after trying to manually fix ~20 different kinds of errors i stopped because i had no idea how many more errors there would be, nor whether the produced nim file would work in the end.

if c2nim ever becomes capable of generating correct nim code (without performance loss) for dragonbox.cc, that's great, we'll always have the option to replace dragonbox.cc with c2nim_dragonbox.nim at that point, but until then, I don't see anything wrong with using dragonbox.cc directly as I did in this PR, so we can benefit from this PR's speedup and correctness improvements without waiting for c2nim.

and the way you did the "new mechanism for managing cacheable dependencies" via yet another compiler magic is not acceptable. We need an external "bundler" tool that is available at boot time (maybe by generating C sources for it).

That was my first thought, but doing this at boot time is strictly less flexible, and prevents optional, on-demand dependencies (bigint etc, for which i have a PR in the work using the same approach leveraging existing libraries).

addDependency("dragonbox") is analog to {.compile: "dragonbox.cc".} and {.link: "libdragonbox.so".}, and allowing it in user code instead of a separate pre-processing step gives you flexibility. In the future, once addDependency API is stabilized with more use cases (bigint etc), we can also expose it to user code in a more granular way, allowing to specify how to build a dependency.

yet another compiler magic

There are plenty of magics that could be replaced by regular procs (eg abs, etc), but this is a case where using a magic makes sense, as it allows using full power of stdlib without causing import dependencies for user code (hence usable in low level code like strmantle).

kunitoki · 2021-05-30T07:50:23Z

I agree with Araq we should port those algos to nim, so there is no need to rush a new dependency injection system in the compiler. It also has the benefit that you can use them both at runtime, compile time and in nimscript, while using the wrapper to dragonbox.cc you get different results in different contexts.

And i add that we should have both float32 and 64 versions.

timotheecour · 2021-05-30T09:42:04Z

I agree with Araq we should port those algos to nim

then please write the port and submit a PR after making sure the performance is comparable or better to what this PR offers? I don't see the point in re-inventing the wheel, this is NIH.

It also has the benefit that you can use them both at runtime, compile time and in nimscript

so, just like this PR then?

while using the wrapper to dragonbox.cc you get different results in different contexts.

how so?

And i add that we should have both float32 and 64 versions.

I have a followup branch that supports float32 as mentioned in #18008 (comment)

kunitoki · 2021-05-30T18:01:35Z

I'm not sure, it seems the usage of dragonbox is disabled for nimscript https://github.com/nim-lang/Nim/pull/18008/files#diff-f056834522efcf8e1e24c2c0cb2408cb1b1e6da5ac7a28129f514b89d3134394R17 but maybe it's just me being a newbie in nim compiler code...

I didn't want to sound harsh, but just wanted to say that having proper nim support could lead to simplified internals and better control on the compiler code.

timotheecour · 2021-05-30T18:10:06Z

I'm not sure, it seems the usage of dragonbox is disabled for nimscript #18008 (files) but maybe it's just me being a newbie in nim compiler code...

see for yourself: nim --eval:'var a = 0.1; echo (a+0.2)' shows 0.30000000000000004 (or try with a nimscript file) after this PR

simplified internals

nim should be able to handle external dependencies (and already does with pcre and all the existing wrappers); re-implementing existing libraries in nim doesn't reduce the overload workload/maintenance, the opposite is true.

kunitoki · 2021-05-30T18:16:53Z

Then what const useDragonbox = ... and not defined(nimscript) means ?

timotheecour · 2021-05-30T23:57:59Z

Then what const useDragonbox = ... and not defined(nimscript) means ?

it's a vmops, see registerCallback c, "stdlib.strfloats.addFloat", proc(a: VmArgs) =

* use dragonbox algorithm; alternative to #18008 * removed unsafe code

timotheecour · 2021-06-02T01:47:14Z

closing this because #18139 was merged, but see #18139 (comment) which shows a 1.4X performance drop compared to this PR

* use dragonbox algorithm; alternative to nim-lang#18008 * removed unsafe code

timotheecour force-pushed the pr_dragonbox branch from acbdb1b to e764bce Compare May 14, 2021 16:15

timotheecour mentioned this pull request May 14, 2021

nim cpp: tcpp_imported_exc, temitlist.nim, tempty_generic_obj.nim, recently fail on windows #18011

Closed

timotheecour force-pushed the pr_dragonbox branch 2 times, most recently from b1c5c7a to 5024b89 Compare May 14, 2021 22:37

timotheecour marked this pull request as ready for review May 15, 2021 05:50

timotheecour mentioned this pull request May 15, 2021

wrap dragonbox from https://github.com/jk-jeon/dragonbox timotheecour/Nim#732

Draft