Improve HSV/HSL/HSI conversions (Fixes #378, #379) #407
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master     #407   +/- ##
==========================================
+ Coverage   80.38%   80.58%   +0.19%
==========================================
  Files          11       11
  Lines         877      891      +14
==========================================
+ Hits          705      718      +13
- Misses        172      173       +1

Continue to review full report at Codecov.
Force-pushed from 5115134 to 43db7df
I haven't been able to get the full performance out of the CPU yet, but there are no fatal performance regressions. So, after merging PR #406, we will be ready to merge this.
Force-pushed from 1401264 to be89a41
Amazing work. I'm especially impressed that you engineered the cost estimates for inlining; presumably you've discovered this already, but if not you're someone who would appreciate https://docs.julialang.org/en/latest/devdocs/inference/#The-inlining-algorithm-(inline_worthy)-1. For the remaining challenges, how much is due to me failing to finish JuliaLang/julia#30222?
src/utilities.jl
Outdated
@@ -1,13 +1,28 @@
# Helper data for CIE observer functions
include("cie_data.jl")

# for optimization
if Sys.ARCH !== :i686
Does this really cover us? `fma` is really slow when implemented in software.
Perhaps we're OK, though, since it's an IEEE 754-2008 requirement, and 2008 was some time ago. https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Support
But I think that's around when i686 was discontinued, so I'd be surprised if this comes up very often.
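For context on the distinction being discussed (a hedged sketch, not part of the PR): `fma(x, y, z)` computes `x*y + z` with a single rounding, falling back to a slow software routine on hardware without an FMA unit, while an unfused multiply-add rounds twice, so the two can differ in the last bit:

```julia
# Sketch: single-rounding fma vs. two-rounding multiply-add on Float32.
x = 90.0f0
a = Float32(1/960)          # inexact in binary32
b = x * Float32(0x1p-6)     # 90/64 = 1.40625, exactly representable

fused   = fma(x, a, b)      # one rounding (software fallback without FMA hardware)
twostep = x * a + b         # two roundings

println(fused)
println(twostep)
println(fused === twostep)  # may be false: the results can differ by 1 ulp
```

Both results are within a few ulps of 1.5, which is why bit-exact comparisons of such expressions are fragile across architectures.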
In my opinion, people who use x86 processors which do not support FMA are probably not interested in speed. I have not surveyed ARM enough, but I think the high-end models support FMA.
Since Julia 1.5 is still ahead, it is possible to use `muladd` at this time.
I will also look for workarounds other than `fma`.
I checked briefly, and while I don't know anything about ARM it does seem plausible that all recent models (meaning, less than 8 years old) support it. I'm good with this as the default for now, and if we get complaints we can always add new architectures.
Note that the original purpose of this `if` block is not to check whether FMA is supported, but #379 (comment).
Gotcha. I'm happy with this!
The alternative workaround below does not work as expected:
_div60(x::T) where T = muladd(x, T(1/960), x * T(0x1p-6))
if _div60(90.0f0) == 1.5f0
    div60(x::T) where T <: Union{Float32, Float64} = _div60(x)
else
    # force two-step multiplication
    div60(x::T) where T <: Union{Float32, Float64} = x * T(1/960) + x * T(0x1p-6)
end
even though, in the REPL:
julia> VERSION
v"1.5.0-DEV.376"
julia> _div60(90.0f0)
1.5000001f0
julia> div60(90.0f0)
1.5f0
AFAIK, it is only on v1.5.0-DEV (Edit: and v1.4.0-rc2) that `_div60` does not work as expected. However, it is not only on v1.5.0-DEV that the `if` block does not work as expected. (cf. JuliaMath/FixedPointNumbers.jl#131 (comment))
Perhaps the cause is that the stages of the constant propagation and the (LLVM IR and native) code generation are different.
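One hedged way around that mismatch (an illustration of the idea, not necessarily the best fix) is to keep the probe value out of reach of constant propagation, e.g. behind a non-constant `Ref`, so the top-level check exercises the same generated code that runs later:

```julia
_div60(x::T) where T = muladd(x, T(1/960), x * T(0x1p-6))

# Hiding the probe value behind a non-const Ref keeps the compiler from
# folding _div60(90.0f0) at inference time, so the branch taken here
# reflects the code that will actually execute at runtime.
probe = Ref(90.0f0)
if _div60(probe[]) == 1.5f0
    div60(x::T) where T <: Union{Float32, Float64} = _div60(x)
else
    # force two-step multiplication
    div60(x::T) where T <: Union{Float32, Float64} = x * T(1/960) + x * T(0x1p-6)
end
```

The workaround the PR actually lands achieves a similar effect with `reduce(max, _div60.((90.0f0,)))`, which likewise defeats constant folding of the probe; either way, `div60(90.0f0)` comes out as `1.5f0`.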
I don't think it is a bug in the Julia compiler that `_div60` is so aggressively optimized. However, it seems that Julia drives it (note the `contract` fast-math flags on the `muladd` below).
julia> @code_llvm _div60(90.0f0)
; @ REPL[1]:1 within `_div60'
; Function Attrs: uwtable
define float @julia__div60_17301(float) #0 {
top:
; ┌ @ float.jl:404 within `*'
%1 = fmul float %0, 1.562500e-02
; └
; ┌ @ float.jl:409 within `muladd'
%2 = fmul contract float %0, 0x3F51111120000000
%3 = fadd contract float %2, %1
; └
ret float %3
}
julia> @code_llvm div60(90.0f0)
; @ REPL[2]:5 within `div60'
; Function Attrs: uwtable
define float @julia_div60_17307(float) #0 {
top:
; ┌ @ float.jl:404 within `*'
%1 = fmul float %0, 0x3F51111120000000
%2 = fmul float %0, 1.562500e-02
; └
; ┌ @ float.jl:400 within `+'
%3 = fadd float %1, %2
; └
ret float %3
}
I don't know whether this is the expected behavior.
_div60(x::T) where T = muladd(x, T(1/960), x * T(0x1p-6))
if reduce(max, _div60.((90.0f0,))) == 1.5f0
    div60(x::T) where T <: Union{Float32, Float64} = _div60(x)
else
    # force two-step multiplication
    div60(x::T) where T <: Union{Float32, Float64} = x * T(0x1p-6) + x * T(1/960)
end
I changed the workaround for the `muladd` problem to an ad-hoc measure. This works as expected "for now".
The test may fail on future nightly builds; I'll think about that at that time. As long as the cause is clear, `===` in the tests can be replaced with `≈`.
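As a hedged illustration of that relaxation (hypothetical test code, not the PR's actual test suite): `===` demands bit-identical results, while `isapprox` (`≈`) tolerates the 1-ulp difference that FMA contraction can introduce:

```julia
using Test

# Two ways of computing 90/60 that may differ in the last bit,
# depending on whether muladd is contracted into an FMA.
fused   = muladd(90.0f0, Float32(1/960), 90.0f0 * Float32(0x1p-6))
twostep = 90.0f0 * Float32(1/960) + 90.0f0 * Float32(0x1p-6)

# `@test fused === twostep` may fail when muladd is fused,
# but `≈` passes either way (default rtol is √eps(Float32)).
@test fused ≈ twostep
@test fused ≈ 1.5f0
```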
If there are no other problems, I will merge this PR.
Sounds good to me!
I am very grateful for the sample code. I think it would be nice if a macro like …
BTW, I think the cost …

julia> f(x, mask) = (x & mask) != zero(x);
julia> tt = Tuple{UInt8, UInt8};
julia> mi = Base.method_instances(f, tt)[1];
julia> ci = code_typed(f, tt)[1][1]
CodeInfo(
1 ─ %1 = Base.and_int(x, mask)::UInt8
│ %2 = (%1 === 0x00)::Bool
│ %3 = Base.not_int(%2)::Bool
└── return %3
)
julia> opt = Core.Compiler.OptimizationState(mi, Core.Compiler.Params(typemax(UInt)));
julia> cost(stmt::Expr) = Core.Compiler.statement_cost(stmt, -1, ci, opt.sptypes, opt.slottypes, opt.params);
julia> cost(stmt) = 0;
julia> for c in ci.code; @show cost(c), c;end
(cost(c), c) = (1, :(Base.and_int(_2, _3)))
(cost(c), c) = (1, :(%1 === 0x00))
(cost(c), c) = (1, :(Base.not_int(%2)))
(cost(c), c) = (0, :(return %3))
I like the idea of a macro... maybe …
Also, I'm glad for your input about the costs themselves. The precise numbers I picked, while within stated ranges, were pretty arbitrary. Perhaps a better strategy would be an architecture-dependent tuning process, but I didn't go to that kind of effort. If there's a clear case for fixing some of them, we should just do that.
This is off-topic, but I implemented the …
I'm working on benchmarking and tuning, so this is a draft PR now. I want to merge #406 first to check HSx-->RGB conversions. (Edit: Done)
This fixes #378 and fixes #379.
This adds the clamping and hue normalization for sources to HSx-->RGB conversions. This also adds the clamping for destinations to HSI-->RGB conversion.
Despite the additional mechanism, this speeds up the conversions for some parameter combinations, at the expense of some accuracy.
Note the following change (cf. #379 (comment)):
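A minimal sketch of the kind of hue normalization and clamping described above (illustrative helper names, not the PR's actual implementation):

```julia
# Map any input hue (in degrees) into [0, 360); `mod` returns a
# nonnegative result for a positive divisor, so negative hues wrap.
normalize_hue(h::Real) = mod(h, 360)

# Clamp saturation/value-style components into [0, 1].
clamp01(x::Real) = clamp(x, zero(x), one(x))

normalize_hue(-30.0)   # 330.0
normalize_hue(480.0f0) # 120.0f0
clamp01(1.2)           # 1.0
```

Normalizing the source hue and clamping out-of-gamut components before an HSx-->RGB conversion keeps the conversion well-defined for arbitrary inputs.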