Implement the Dragonbox algorithm for `Float#to_s` #10913

HertzDevil · 2021-07-09T05:34:01Z

This is a direct Crystal port of jk-jeon/dragonbox@b5b4f65 (with the full cache for Float64). It is faster than Grisu3, provides the shortest round-trip guarantee for all inputs, and never depends on LibC.snprintf. For example, Dragonbox prints 4.91e-6, whereas Grisu3 fails on that value and the LibC.snprintf fallback yields 4.9099999999999996e-06. Closes #8441 (according to the author of the paper, Dragonbox is even faster than Ryu).
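
As a minimal sketch of the round-trip property mentioned above: printing a float and parsing the result should give back the same value. The helper name `round_trips?` is hypothetical and only used for illustration; it is not part of the standard library or of this PR.

```crystal
# Hypothetical helper (an assumption for illustration, not stdlib API):
# checks that printing a float and parsing the string yields the same value.
def round_trips?(x : Float64) : Bool
  x.to_s.to_f64 == x
end

# The two values below are the "grisu failure cases" from the specs.
puts 4.91e-6.to_s
puts round_trips?(4.91e-6)
puts round_trips?(3.5844466002796428e+298)
```

The round-trip check alone does not prove the output is the shortest possible; that part is covered by comparing against expected strings in the specs.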

Simple benchmarks:

```crystal
require "benchmark"

N = ENV["N"].to_i

struct Float
  def to_s_grisu : String
    # `#to_s` before this PR
  end

  def to_s_grisu(io : IO) : Nil
    # `#to_s(io)` before this PR
  end
end

macro test_case(alg, float)
  b.report("{{ alg.id }}({{ float }}, String)") do
    (N // 100).times do
      x = {{ float }}.new!(Random.rand)
      100.times do
        x.{{ alg == "Dragonbox" ? "to_s".id : "to_s_grisu".id }}
      end
    end
  end
end

macro test_case_io(alg, float)
  b.report("{{ alg.id }}({{ float }}, IO)") do
    io = IO::Memory.new
    (N // 100).times do
      x = {{ float }}.new!(Random.rand)
      100.times do
        x.{{ alg == "Dragonbox" ? "to_s".id : "to_s_grisu".id }}(io)
      end
    end
  end
end

Benchmark.ips do |b|
  test_case "Dragonbox", Float64
  test_case "Grisu3", Float64
  test_case_io "Dragonbox", Float64
  test_case_io "Grisu3", Float64
  test_case "Dragonbox", Float32
  test_case "Grisu3", Float32
  test_case_io "Dragonbox", Float32
  test_case_io "Grisu3", Float32
end
```

```
$ N=500000 bin/crystal run --release bm.cr
Dragonbox(Float64, String)  10.75  ( 93.02ms) (± 5.01%)  68.7MB/op   2.86× slower
   Grisu3(Float64, String)   7.24  (138.05ms) (± 2.73%)  68.7MB/op   4.24× slower
    Dragonbox(Float64, IO)  18.56  ( 53.88ms) (± 3.92%)  32.0MB/op   1.66× slower
       Grisu3(Float64, IO)  11.11  ( 90.05ms) (± 3.90%)  32.0MB/op   2.77× slower
Dragonbox(Float32, String)  12.59  ( 79.43ms) (± 4.81%)  83.0MB/op   2.44× slower
   Grisu3(Float32, String)   6.96  (143.59ms) (± 2.73%)  83.0MB/op   4.41× slower
    Dragonbox(Float32, IO)  30.72  ( 32.55ms) (± 4.39%)  16.0MB/op        fastest
       Grisu3(Float32, IO)  11.43  ( 87.47ms) (± 3.00%)  16.0MB/op   2.69× slower
```

These benchmarks are not very representative of real string conversion scenarios, ~~so more comprehensive testing is needed. Also specs~~.

All the existing Grisu3-related modules (except Float::Printer::IEEE) are marked as deprecated. They are already :nodoc: though, since Float::Printer is. I don't imagine anyone would use those modules directly.

HertzDevil · 2021-07-09T06:08:34Z

spec/std/float_printer_spec.cr

```diff
-  pending_win32 "failure case" do
-    # grisu cannot do this number, so it should fall back to libc
+  it "grisu failure cases" do
+    test_pair 4.91e-6, "4.91e-6"
     test_pair 3.5844466002796428e+298, "3.5844466002796428e+298"
   end
```


Since Dragonbox does not depend on LibC, this ought to pass on Windows too. (If our sprintf also depends on Dragonbox, then those strings in turn would become platform-independent as well.)

HertzDevil · 2021-07-09T20:44:57Z

The test suite is now ported over from Microsoft's C++ STL. For simplicity there is a helper method that parses hexadecimal floating-point literals in the form of 0x1a2b.3c4dp+56.
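
For illustration, a parser for such literals could look roughly like the sketch below. This is not the helper used in the spec suite; the name `parse_hexfloat` is hypothetical, and the sketch assumes a non-negative, well-formed literal whose hexadecimal digits fit in 53 bits, so that the significand is exactly representable as a Float64.

```crystal
# Hypothetical helper (not the one from the spec suite): parses a
# non-negative hexadecimal floating-point literal such as "0x1.8p+1".
# Assumption: the hex digits fit in 53 bits, so no rounding is needed
# when converting the significand to Float64.
def parse_hexfloat(str : String) : Float64
  match = str.match(/\A0x([0-9a-f]+)(?:\.([0-9a-f]*))?p([+-]?\d+)\z/i)
  raise ArgumentError.new("Invalid hexfloat: #{str}") unless match

  int_digits = match[1]
  frac_digits = match[2]? || ""
  exponent = match[3].to_i

  # Read all hex digits as a single integer, then compensate for the
  # fractional digits by lowering the binary exponent by 4 bits each.
  significand = (int_digits + frac_digits).to_u64(16)
  exponent -= 4 * frac_digits.size

  # ldexp scales by a power of two: significand * 2^exponent.
  Math.ldexp(significand.to_f64, exponent)
end

puts parse_hexfloat("0x1.8p+1")        # 1.5 * 2^1
puts parse_hexfloat("0x1a2b.3c4dp+56") # the form quoted above
```

Hexadecimal literals are convenient in specs because they name an exact bit pattern, so the expected float does not depend on any decimal parsing.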

HertzDevil · 2021-07-10T07:24:47Z

Here is a larger benchmark based on miloyip/dtoa-benchmark. It:

- Generates 1000 floats each for every number of significant digits from 1 to 17, by producing a uniformly random UInt64, getting a reference string with LibC.snprintf, then converting it back to a Float64 with String#to_f64.
- Converts those 1000 floats to strings, using Dragonbox and Grisu3.
- Writes those 1000 floats to an IO::Memory, using Dragonbox and Grisu3.
- Obtains the 1000 significands' decimal digits as UInt8 ASCII character sequences (these are the "raw" results). Dragonbox raw removes the trailing zeros (I didn't port this part from the C++ reference implementation), Dragonbox raw 2 doesn't. For Grisu3 raw the interface method already fills up the given StaticArray argument without any trailing zeros.
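
The sampling step in the first item can be sketched as follows. This is an approximation for illustration, not the benchmark's actual code: the method name `random_float_with_digits` is hypothetical, and `String#%` stands in for the direct LibC.snprintf call.

```crystal
# Hypothetical sketch of the sampling step (names are illustrative).
# Returns a finite Float64 that has at most `digits` significant
# decimal digits.
def random_float_with_digits(digits : Int32) : Float64
  loop do
    # Reinterpret 64 uniformly random bits as a Float64.
    value = Random.rand(UInt64).unsafe_as(Float64)
    next unless value.finite? # skip NaN and infinities

    # "%.<n>e" prints one digit before the point and n after it, so
    # n = digits - 1 gives `digits` significant digits in total.
    reference = "%.#{digits - 1}e" % value

    # Rounding near the largest finite value may overflow; retry then.
    rounded = reference.to_f64?
    return rounded if rounded && rounded.finite?
  end
end

(1..17).each do |digits|
  puts "#{digits}: #{random_float_with_digits(digits)}"
end
```

Because the bits are uniform, the exponents are spread roughly evenly over the whole Float64 range rather than clustered around 1.0, which is quite different from the `Random.rand` inputs of the simple benchmark.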

The times shown are the mean times taken to convert one Float64. The raw results highlight the algorithmic improvements; any differences between them and the String / IO results are the overheads associated with the shared code that actually gets a usable string to the user. We should optimize this shared part in Float::Printer#internal some time later.

The numbers here should not be compared directly to those in the linked benchmark: that benchmark simply dumps the result string to a UInt8[256], which is not replicated here.

Also bear in mind none of this will affect sprintf and String#%, which currently always delegate to LibC.snprintf.

HertzDevil added 2 commits July 9, 2021 13:31

Implement the Dragonbox algorithm for Float#to_s

36bbeca

fix some specs

2b1f41e

straight-shoota added kind:feature topic:stdlib:numeric labels Jul 9, 2021

HertzDevil added 2 commits July 10, 2021 04:37

bug fixes

0870769

rewrite float printer specs

0551371

make all internal modules private, move cache to own file

37d8d28

HertzDevil marked this pull request as ready for review July 10, 2021 07:24

test also IO overloads

6094255

HertzDevil mentioned this pull request Jul 14, 2021

Fix Float#humanize for values outside 1e-4...1e15 #10881

Merged

HertzDevil added 9 commits August 12, 2021 20:34

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

199065a

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

db7f191

use assert_prints

0275c08

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

e0babe7

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

ee7e6a4

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

99ba2bf

use updated hexfloat parser

0df87d6

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

39e2b8d

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

22c44fe

HertzDevil mentioned this pull request Mar 31, 2022

Functions that depend on the current C locale #11952

Open

Merge remote-tracking branch 'upstream/master' into perf/dragonbox

b0f9326

straight-shoota reviewed Aug 29, 2022

Review comment on src/float/printer.cr (outdated, resolved)

fixup

1a19faa

straight-shoota approved these changes Aug 30, 2022

straight-shoota added this to the 1.6.0 milestone Aug 30, 2022

straight-shoota merged commit d192a97 into crystal-lang:master Aug 31, 2022

HertzDevil deleted the perf/dragonbox branch September 1, 2022 02:10
