Implement the Dragonbox algorithm for Float#to_s #10913

Merged (17 commits) on Aug 31, 2022

HertzDevil commented Jul 9, 2021

This is a direct Crystal port of jk-jeon/dragonbox@b5b4f65 (with the full cache for Float64). It is faster than Grisu3, provides the shortest round-trip guarantee for all inputs, and never depends on LibC.snprintf. For example, 4.91e-6 is a value that Grisu3 cannot handle: Dragonbox prints 4.91e-6, whereas the previous output was 4.9099999999999996e-06. Closes #8441 (according to the paper's author, Dragonbox is even faster than Ryu).
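The round-trip guarantee can be checked with a few lines of Crystal. This is only a sketch, assuming a Crystal version that includes this PR (1.6.0 or later); the sample values are the two "grisu failure cases" from the specs plus 0.1, and the variable names are illustrative.

```crystal
# Illustrative check, assuming Crystal 1.6.0 or later (where this PR landed).
samples = [4.91e-6, 3.5844466002796428e+298, 0.1]

samples.each do |x|
  str = x.to_s
  # Round-trip: parsing the printed string must recover the exact same value.
  raise "round-trip failed for #{str}" unless str.to_f64 == x
  puts str
end
```

Checking `str.to_f64 == x` only verifies the round-trip half of the guarantee; that the string is also the shortest such representation is what the ported test suite covers.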

Simple benchmarks:

```crystal
require "benchmark"

N = ENV["N"].to_i

struct Float
  def to_s_grisu : String
    # `#to_s` before this PR
  end

  def to_s_grisu(io : IO) : Nil
    # `#to_s(io)` before this PR
  end
end

macro test_case(alg, float)
  b.report("{{ alg.id }}({{ float }}, String)") do
    (N // 100).times do
      x = {{ float }}.new!(Random.rand)
      100.times do
        x.{{ alg == "Dragonbox" ? "to_s".id : "to_s_grisu".id }}
      end
    end
  end
end

macro test_case_io(alg, float)
  b.report("{{ alg.id }}({{ float }}, IO)") do
    io = IO::Memory.new
    (N // 100).times do
      x = {{ float }}.new!(Random.rand)
      100.times do
        x.{{ alg == "Dragonbox" ? "to_s".id : "to_s_grisu".id }}(io)
      end
    end
  end
end

Benchmark.ips do |b|
  test_case "Dragonbox", Float64
  test_case "Grisu3", Float64
  test_case_io "Dragonbox", Float64
  test_case_io "Grisu3", Float64
  test_case "Dragonbox", Float32
  test_case "Grisu3", Float32
  test_case_io "Dragonbox", Float32
  test_case_io "Grisu3", Float32
end
```
```
$ N=500000 bin/crystal run --release bm.cr
Dragonbox(Float64, String)  10.75  ( 93.02ms) (± 5.01%)  68.7MB/op   2.86× slower
   Grisu3(Float64, String)   7.24  (138.05ms) (± 2.73%)  68.7MB/op   4.24× slower
    Dragonbox(Float64, IO)  18.56  ( 53.88ms) (± 3.92%)  32.0MB/op   1.66× slower
       Grisu3(Float64, IO)  11.11  ( 90.05ms) (± 3.90%)  32.0MB/op   2.77× slower
Dragonbox(Float32, String)  12.59  ( 79.43ms) (± 4.81%)  83.0MB/op   2.44× slower
   Grisu3(Float32, String)   6.96  (143.59ms) (± 2.73%)  83.0MB/op   4.41× slower
    Dragonbox(Float32, IO)  30.72  ( 32.55ms) (± 4.39%)  16.0MB/op        fastest
       Grisu3(Float32, IO)  11.43  ( 87.47ms) (± 3.00%)  16.0MB/op   2.69× slower
```

These benchmarks are not very representative of real string conversion scenarios, so more comprehensive testing is needed. Specs are also still to be written.

All the existing Grisu3-related modules (except Float::Printer::IEEE) are marked as deprecated. They are already :nodoc: though, since Float::Printer is. I don't imagine anyone would use those modules directly.

Comment on lines 110 to 113
pending_win32 "failure case" do
# grisu cannot do this number, so it should fall back to libc
it "grisu failure cases" do
test_pair 4.91e-6, "4.91e-6"
test_pair 3.5844466002796428e+298, "3.5844466002796428e+298"
end

@HertzDevil HertzDevil Jul 9, 2021

Since Dragonbox does not depend on LibC, this ought to pass on Windows too. (If our sprintf also depends on Dragonbox, then those strings in turn would become platform-independent as well.)

@HertzDevil

The test suite is now ported over from Microsoft's C++ STL. For simplicity there is a helper method that parses hexadecimal floating-point literals in the form of 0x1a2b.3c4dp+56.
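Hexadecimal literals are convenient in such specs because they pin down a float's exact bits. As a rough sketch of what a parser for this form could look like (this is not the helper from the PR; the name `parse_hexfloat` and its error handling are assumptions for illustration):

```crystal
# Hypothetical helper, not the one used in the PR's specs.
# Parses strings such as "0x1a2b.3c4dp+56" into a Float64.
def parse_hexfloat(str : String) : Float64
  m = str.match(/\A(-)?0x([0-9a-f]*)(?:\.([0-9a-f]*))?p([+-]?\d+)\z/i)
  raise ArgumentError.new("invalid hexfloat: #{str}") unless m

  int_digits = m[2]
  frac_digits = m[3]? || ""
  digits = int_digits + frac_digits
  raise ArgumentError.new("no hex digits in: #{str}") if digits.empty?

  # Treat all hex digits as one integer, then compensate in the exponent:
  # each fractional hex digit is worth 4 binary places.
  mantissa = digits.to_u64(16)
  exponent = m[4].to_i - 4 * frac_digits.size

  # Simplification: exact only while the mantissa fits in 53 bits and
  # 2.0 ** exponent neither overflows nor underflows.
  value = mantissa.to_f64 * (2.0 ** exponent)
  m[1]? ? -value : value
end

puts parse_hexfloat("0x1.8p+1")
```

A production-quality helper would also need to handle mantissas wider than 64 bits and subnormal results, which this sketch deliberately ignores.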

HertzDevil commented Jul 10, 2021

Here is a larger benchmark based on miloyip/dtoa-benchmark. It:

  • Generates 1000 floats each for every number of significant digits from 1 to 17, by producing a uniformly random UInt64, getting a reference string with LibC.snprintf, then converting it back to a Float64 with String#to_f64.
  • Converts those 1000 floats to strings, using Dragonbox and Grisu3.
  • Writes those 1000 floats to an IO::Memory, using Dragonbox and Grisu3.
  • Obtains the 1000 significands' decimal digits as UInt8 ASCII character sequences (these are the "raw" results). Dragonbox raw removes the trailing zeros (I didn't port this part from the C++ reference implementation), Dragonbox raw 2 doesn't. For Grisu3 raw the interface method already fills up the given StaticArray argument without any trailing zeros.
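The input generation in the first step can be sketched roughly as follows. This is not the benchmark's actual code: `random_float_with_digits` is a hypothetical name, `String#%` stands in for the direct `LibC.snprintf` call, and reinterpreting the random bits with `unsafe_as` is an assumption about how the UInt64 becomes a float.

```crystal
# Hypothetical generator; the name is not taken from the PR or the benchmark.
def random_float_with_digits(digits : Int32, rng : Random = Random.new) : Float64
  loop do
    # Reinterpret 64 random bits as a Float64; skip NaN and infinities.
    x = rng.rand(UInt64).unsafe_as(Float64)
    next unless x.finite?

    # Round to `digits` significant digits via a "%.Ne" reference string,
    # then parse that string back into a Float64.
    reference = "%.#{digits - 1}e" % x
    y = reference.to_f64?
    return y if y && y.finite?
  end
end

# 1000 inputs for each digit count from 1 to 17.
inputs = (1..17).flat_map { |d| Array.new(1000) { random_float_with_digits(d) } }
puts inputs.size
```

Going through a decimal reference string is what bounds the number of significant digits a shortest-output algorithm has to emit for each input.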

The times shown are the mean times taken to convert one Float64. The raw results highlight the algorithmic improvements; any differences between them and the String / IO results are the overheads associated with the shared code that actually gets a usable string to the user. We should optimize this shared part in Float::Printer#internal some time later.

The numbers here should not be compared directly to those in the linked benchmark: that benchmark simply dumps each result string into a UInt8[256] buffer, which is not replicated here.

[Image: dtoa benchmark results]

Also bear in mind none of this will affect sprintf and String#%, which currently always delegate to LibC.snprintf.

@HertzDevil HertzDevil marked this pull request as ready for review July 10, 2021 07:24
@straight-shoota straight-shoota added this to the 1.6.0 milestone Aug 30, 2022
@straight-shoota straight-shoota merged commit d192a97 into crystal-lang:master Aug 31, 2022
@HertzDevil HertzDevil deleted the perf/dragonbox branch September 1, 2022 02:10
Successfully merging this pull request may close these issues.

Use Ryu algorithm for floating point to string conversation