-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
write_* improvements #432
write_* improvements #432
Conversation
Don't convert numeric to character, use the max amount of precision nessesary to roundtrip doubles.
Sample output from grisu3 is identical to current and readr::write_tsv(head(mtcars), "/dev/stdout")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#> 21 6 160 110 3.9 2.875 17.02 0 1 4 4
#> 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
#> 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
#> 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1 vs setting the precision manually readr::write_tsv(head(mtcars), "/dev/stdout")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 21 6 160 110 3.8999999999999999 2.6200000000000001 16.460000000000001 0 1 4 4
#> 21 6 160 110 3.8999999999999999 2.875 17.02 0 1 4 4
#> 22.800000000000001 4 108 93 3.8500000000000001 2.3199999999999998 18.609999999999999 1 1 4 1
#> 21.399999999999999 6 258 110 3.0800000000000001 3.2149999999999999 19.440000000000001 1 0 3 1
#> 18.699999999999999 8 360 175 3.1499999999999999 3.4399999999999999 17.02 0 0 3 2
#> 18.100000000000001 6 225 105 2.7599999999999998 3.46 20.219999999999999 1 0 3 1 |
One possible issue is mentioned in the grisu3 paper, which states the following (emphasis mine)
I need to look a this implementation and see what happens if the input is rejected, but it is reassuring it did not fail with the test inputs, which are random although only between 0-1. Edit |
We may also want to incorporate the changes from https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/portable/grisu3_print.h#L207-L228 (also under Apache 2.0) which prefer 'unscientific' notation at the same length and always append a 0 on decimals. |
@@ -0,0 +1,361 @@ | |||
/* This file is part of an implementation of the "grisu3" double to string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include the license too? Might need to copy and paste from somewhere else
Also need to update Authors@R |
You mean C-level formatting or R-level formatting? But I'm happy with that performance - we don't need to be as fast as |
C-level formatting is what I meant (after the above changes). |
Apart from the authorship/license stuff (and news bullet), LGTM. Feel free to merge when you've done those bits. |
Added the license and authors to the DESCRIPTION. PTAL briefly to make sure it looks OK and then I can merge this. |
Looks good - I confirmed that Apache license is compatible with GPL3. |
The previous implementation as we know was very slow because of the character conversion, on par with
write.csv()
.9c28645 just does a little cleanup and turns off converting numeric to character first. This produces valid round trip-able results and is quite a bit faster than converting to character, however all numeric numbers are printed with the maximum amount of precision.
a67c1d5 uses the grisu3 implementation found at https://github.com/juj/MathGeoLib/blob/master/src/Math/grisu3.c. It is under the Apache 2 license so is safe for us to use. This actually gives us quite a bit better performance than the naive approach.
However
data.table:fwrite()
is still faster than any of these methods.I did some profile sampling with
R -q -d "valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes"
and the vast majority of our computational time is doing the formatting, so I am not sure how much more room there is to improve.Fixes #387