better sin_sum #88
Conversation
Probably also worth switching out Lines 40 to 52 in a94bb7f. I haven't tested thoroughly whether that section is the cause or not, but this is related to #37.
Codecov Report
Patch coverage:
Additional details and impacted files:
@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
- Coverage   97.30%   96.12%   -1.18%
==========================================
  Files          20       20
  Lines        2297     2299       +2
==========================================
- Hits         2235     2210      -25
- Misses         62       89      +27
To do that properly I would have to restructure this a little bit to make it possible to get the sin and cos sum simultaneously.
Ya, I think the error is coming before that anyway, but it's been a while since I've looked at that function. This is great; in my tests the accuracy was similar and it was faster. I do think this function is generally useful enough to be in Base, because so many people will naively compute sin(x + y), which loses lots of precision... but I'm obviously biased 😄
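A quick way to see that loss (hypothetical values of my own, checked against a BigFloat reference; not taken from the PR):
julia> x, y = 1.0e10, 0.1;                      # hypothetical inputs; doubles near 1e10 are spaced ~2e-6 apart
julia> naive = sin(x + y);                       # the argument is rounded before the sine is taken
julia> exact = Float64(sin(big(x) + big(y)));    # reference computed in BigFloat
julia> abs(naive - exact)                        # far larger than eps(exact); the argument rounding dominates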
I understand where you're coming from, but the problem of errors not being composable isn't really solvable by adding random functions to Base.
P.S. Totally fine to merge whenever you are ready!
I want to do some benchmarks on the 24 to 1e15 region first to make sure I haven't regressed anything.
I also think I can get rid of the two separate inputs, which would be a bit cleaner.
I've done the benchmarking and cleanup that I wanted, so I think this is now good to go. The only thing I'm not 100% happy with in this version is that
OK, I'm finally relatively happy with the Float32 version of the reduction. It still can lose some precision, but it's a lot better than before while maintaining relatively decent speed when you don't have large arguments. |
I noticed that the Float32 implementations still use p = p * cos(xn + x). Should this be switched over to the better sum formula now?
Possibly. It will be a bit slower, so it probably needs a benchmark and accuracy plot to determine.
I can check it out tomorrow! This weekend was busy with family, but I have time to look more carefully tomorrow.
Yeesh - the besselj0 implementation was some of my first Julia code; it has been improved since, but the tests aren't very robust.
julia> Bessels.besselj0(328049.34f0)
-0.0013240778f0
julia> Bessels.besselj0(Float64(328049.34f0))
-0.0013258623831875669
I'm imagining the better sin sum here could help.
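For a rough sense of scale, the relative error of the Float32 result above against the Float64 evaluation (computed from the two numbers already printed):
julia> abs((-0.0013240778) - (-0.0013258623831875669)) / abs(-0.0013258623831875669)   # ≈ 1.3e-3, versus eps(Float32) ≈ 1.2e-7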
LGTM. I posted an issue at #90 to track that in a separate PR.
@oscardssmith I was fixing the Float32 case (which drastically improves accuracy) but was noticing that the garbage collector was running.
# master
julia> @benchmark besselj0(x) setup=(x=rand()*50 + 15.0)
BenchmarkTools.Trial: 10000 samples with 995 evaluations.
Range (min … max): 28.635 ns … 1.378 μs ┊ GC (min … max): 0.00% … 94.85%
Time (median): 65.963 ns ┊ GC (median): 0.00%
Time (mean ± σ): 60.570 ns ± 44.076 ns ┊ GC (mean ± σ): 2.31% ± 3.27%
▇ ▄ ▂▇█▅▂ ▁▁▁ ▁▁▁ ▂
█▁▆▄▁█▁▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█████▇▆▃▃▁▁▄▁▁▇███▇▆▆▄▅████ █
28.6 ns Histogram: log(frequency) by time 93.2 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Then in the Float32 case I was actually getting allocations:
# on my test branch
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*50 + 15.0))
BenchmarkTools.Trial: 10000 samples with 982 evaluations.
Range (min … max): 60.338 ns … 1.372 μs ┊ GC (min … max): 0.00% … 95.17%
Time (median): 63.087 ns ┊ GC (median): 0.00%
Time (mean ± σ): 66.210 ns ± 48.384 ns ┊ GC (mean ± σ): 3.00% ± 3.91%
▁▄▅▇██▇▆▄▂▁ ▁▁ ▁▁▁ ▂
████████████▇▅▃▄▁▁▁▁▁▃▁▁▁▁▁▁▃▃▁▅█████▇▆▆▄▅▅▅▄▃▁▁▁▅▆▆██████▇ █
60.3 ns Histogram: log(frequency) by time 90.4 ns <
Memory estimate: 80 bytes, allocs estimate: 4.
The benchmarks on the tagged version:
# Bessels v0.2.8
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*50 + 15.0))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
Range (min … max): 15.738 ns … 28.151 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 16.074 ns ┊ GC (median): 0.00%
Time (mean ± σ): 16.054 ns ± 0.319 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▄▂ ▁▆█▄
▁▁▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▃▅███▅▃▁▁▁▁▁▁▂▂▂▃▄▅▆▇████▆▃▂▂▃▄▅▆▆▆▄▃▂ ▃
15.7 ns Histogram: frequency by time 16.2 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark besselj0(x) setup=(x=rand()*50 + 15.0)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
Range (min … max): 16.520 ns … 32.577 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 16.916 ns ┊ GC (median): 0.00%
Time (mean ± σ): 17.940 ns ± 1.976 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▅ ▂ ▁ ▆
██▂█▇▁▁▁█▆▂▁▂▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▃▂▁▃ ▃
16.5 ns Histogram: frequency by time 22.6 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Any idea what's going on?
Presumably you introduced a type instability somewhere.
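One generic way to chase that down (my sketch, not from the thread) is to check inference and measure allocations after a warm-up:
julia> x = Float32(rand()*50 + 15.0);
julia> @code_warntype Bessels.besselj0(x)    # ::Any or red Union-typed slots point at the instability
julia> Bessels.besselj0(x);                  # warm up so compilation isn't counted
julia> @allocated Bessels.besselj0(x)        # nonzero here confirms runtime allocations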
I'll look closer, but I get the allocation when pulling out the following:
using Base.Math: sin_kernel, cos_kernel, sincos_kernel, rem_pio2_kernel, DoubleFloat64, DoubleFloat32
function rem_pio2_sum(xs::Vararg{Float64})
    n = 0                      # total quadrant count (multiples of pi/2 removed)
    hi, lo = 0.0, 0.0          # double-double accumulator for the reduced sum
    for x in xs
        if abs(x) <= pi/4
            # small argument: add directly with a compensated (error-tracking) sum
            s = x + hi
            lo += (x - (s - hi))
        else
            # large argument: reduce it mod pi/2 first, then accumulate its hi/lo parts
            n_i, y = rem_pio2_kernel(x)
            n += n_i
            s = y.hi + hi
            lo += (y.hi - (s - hi)) + y.lo
        end
        hi = s
    end
    # fold the accumulated value back into [-pi/4, pi/4], tracking quadrants;
    # 6.123233995736766e-17 is the low word of pi/2 in double-double form
    while hi > pi/4
        hi -= pi/2
        lo -= 6.123233995736766e-17
        n += 1
    end
    while hi < -pi/4
        hi += pi/2
        lo += 6.123233995736766e-17
        n -= 1
    end
    return n, DoubleFloat64(hi, lo)
end
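For context, a small usage sketch with example values of my own (not from the PR): the reduction returns a quadrant count plus the same DoubleFloat64 payload that Base's sin_kernel/cos_kernel accept.
julia> n, y = rem_pio2_sum(1.0e10, 0.75, 0.25);   # reduce the sum of several arguments at once
julia> (n & 3, y.hi)                               # quadrant of the summed argument and the high word of the remainder
julia> sin_kernel(y)                               # sine of the reduced argument; the quadrant maps it back to sin/cos of the full sum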
This makes sin_sum work for summing an arbitrary number of elements and retain accuracy for large inputs (>1e15), and uses this improvement to remove the edge cases in bessely and besselj. Performance and accuracy information to come.
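A minimal sketch of how such a variadic sine-of-a-sum could be assembled from the rem_pio2_sum shown in the comment above; the name sin_sum_sketch and the quadrant dispatch are my assumptions, not the PR's actual sin_sum:
using Base.Math: sin_kernel, cos_kernel

# Hypothetical sketch: evaluate sin(x1 + x2 + ... + xn) by reducing the exact sum
# modulo pi/2 (via rem_pio2_sum from the earlier comment) and then dispatching on
# the quadrant, the same way Base's sin/cos implementations do.
function sin_sum_sketch(xs::Vararg{Float64})
    n, y = rem_pio2_sum(xs...)   # quadrant count and reduced argument as a DoubleFloat64
    q = n & 3
    if q == 0
        return sin_kernel(y)
    elseif q == 1
        return cos_kernel(y)
    elseif q == 2
        return -sin_kernel(y)
    else
        return -cos_kernel(y)
    end
end
Called as, say, sin_sum_sketch(1.0e10, pi/4, 0.3), this should avoid the precision loss of sin(x + y) because the terms are never rounded into a single Float64 argument before reduction.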