Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve HSV/HSL/HSI conversions (Fixes #378, #379) #407

Merged
merged 1 commit into from
Mar 4, 2020

Conversation

kimikage
Copy link
Collaborator

@kimikage kimikage commented Feb 28, 2020

I'm working on benchmarking and tuning. So, this is a draft PR now. I want to merge #406 first to check HSx-->RGB conversions. (Edit: Done)

This fixes #378 and fixes #379.
This adds the clamping and hue normalization for sources to HSx-->RGB conversions. This also adds the clamping for destinations to HSI-->RGB conversion.

Despite the additional mechanism, at the expense of some accuracy, this speeds ​​up the conversions with some parameter combinations.

Note the following change (cf. #379 (comment)):

julia> RGB(1,0,0)|>HSV
HSV{Float32}(0.0f0,1.0f0,1.0f0)

@codecov
Copy link

codecov bot commented Feb 28, 2020

Codecov Report

Merging #407 into master will increase coverage by 0.19%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #407      +/-   ##
==========================================
+ Coverage   80.38%   80.58%   +0.19%     
==========================================
  Files          11       11              
  Lines         877      891      +14     
==========================================
+ Hits          705      718      +13     
- Misses        172      173       +1     

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c08ccb...ea196bb. Read the comment docs.

@kimikage kimikage force-pushed the hsx branch 2 times, most recently from 5115134 to 43db7df Compare February 28, 2020 14:55
@kimikage kimikage changed the title [WIP] Improve HSV/HSL/HSI conversions (Fixes #378, #379) Improve HSV/HSL/HSI conversions (Fixes #378, #379) Feb 28, 2020
@kimikage
Copy link
Collaborator Author

I haven't been able to get the full performance of the CPU for now, but there are no fatal performance regressions. So, after merging PR #406, we will be ready to merge this.

@kimikage kimikage force-pushed the hsx branch 4 times, most recently from 1401264 to be89a41 Compare February 29, 2020 18:55
@kimikage kimikage marked this pull request as ready for review February 29, 2020 18:56
Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work. I'm especially impressed that you engineered the cost estimates for inlining; presumably you've discovered this already, but if not you're someone who would appreciate https://docs.julialang.org/en/latest/devdocs/inference/#The-inlining-algorithm-(inline_worthy)-1. For the remaining challenges, how much is due to me failing to finish JuliaLang/julia#30222?

src/utilities.jl Outdated
@@ -1,13 +1,28 @@
# Helper data for CIE observer functions
include("cie_data.jl")

# for optimization
if Sys.ARCH !== :i686
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really cover us? fma is really slow when implemented in software.
Perhaps we're OK though since it's a IEEE 754-2008 requirement, and 2008 was some time ago. https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Support
But I think that's when i686 was discontinued, I think, so I'd be surprised if this comes up very often.

Copy link
Collaborator Author

@kimikage kimikage Mar 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, people who use x86 processors which do not support FMA are probably not interested in speed. I have not done enough survey on ARM, but I think the high-end models support FMA.

Since Julia 1.5 is still ahead, it is possible to use muladd at this time.
I will also look for workarounds other than fma.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked briefly, and while I don't know anything about ARM it does seem plausible that all recent models (meaning, less than 8 years old) support it. I'm good with this as the default for now, and if we get complaints we can always add new architectures.

Copy link
Collaborator Author

@kimikage kimikage Mar 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the original purpose of this if block is not the checking whether FMA is supported or not, but #379 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I'm happy with this!

Copy link
Collaborator Author

@kimikage kimikage Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative workaround below does not work as expected:

_div60(x::T) where T = muladd(x, T(1/960), x * T(0x1p-6))
if _div60(90.0f0) == 1.5f0
    div60(x::T) where T <: Union{Float32, Float64} = _div60(x)
else
    # force two-step multiplication
    div60(x::T) where T <: Union{Float32, Float64} = x * T(1/960) + x * T(0x1p-6)
end

even though, in REPL,

julia> VERSION
v"1.5.0-DEV.376"

julia> _div60(90.0f0)
1.5000001f0

julia> div60(90.0f0)
1.5f0

AFAIK, only on v1.5.0-DEV (Edit: and v1.4.0-rc2) _div60 does not work as expected. However, it is not only on v1.5.0-DEV that the if block does not work as expected.(cf. JuliaMath/FixedPointNumbers.jl#131 (comment))

Perhaps the cause is that the stages of the constant propagation and the (LLVM IR and native) code generation are different.

Copy link
Collaborator Author

@kimikage kimikage Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a bug of Julia compiler that _div60 is drastically optimized. However, it seems that Julia drives that.

julia> @code_llvm _div60(90.0f0)

;  @ REPL[1]:1 within `_div60'
; Function Attrs: uwtable
define float @julia__div60_17301(float) #0 {
top:
; ┌ @ float.jl:404 within `*'
   %1 = fmul float %0, 1.562500e-02
; └
; ┌ @ float.jl:409 within `muladd'
   %2 = fmul contract float %0, 0x3F51111120000000
   %3 = fadd contract float %2, %1
; └
  ret float %3
}

julia> @code_llvm div60(90.0f0)

;  @ REPL[2]:5 within `div60'
; Function Attrs: uwtable
define float @julia_div60_17307(float) #0 {
top:
; ┌ @ float.jl:404 within `*'
   %1 = fmul float %0, 0x3F51111120000000
   %2 = fmul float %0, 1.562500e-02
; └
; ┌ @ float.jl:400 within `+'
   %3 = fadd float %1, %2
; └
  ret float %3
}

I don't know whether this is the expected behavior.

This adds the clamping and hue normalization for sources to HSx-->RGB conversions.
This also adds the clamping for destinations to HSI-->RGB conversion.
Comment on lines +6 to +12
_div60(x::T) where T = muladd(x, T(1/960), x * T(0x1p-6))
if reduce(max, _div60.((90.0f0,))) == 1.5f0
div60(x::T) where T <: Union{Float32, Float64} = _div60(x)
else
# force two-step multiplication
div60(x::T) where T <: Union{Float32, Float64} = x * T(0x1p-6) + x * T(1/960)
end
Copy link
Collaborator Author

@kimikage kimikage Mar 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the workaround for the muladd problem with an ad-hoc measure. This works as expected "for now".
In the nightly build, the test may fail in the future. I'll think about that at that time. As long as the cause is clear, === in tests can be replaced with .

If there are no other problems, I will merge this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

@kimikage
Copy link
Collaborator Author

kimikage commented Mar 4, 2020

https://docs.julialang.org/en/latest/devdocs/inference/#The-inlining-algorithm-(inline_worthy)-1.

I am very grateful for the sample code. I think it would be nice if a macro like @code_inlining was officially provided as a variation of @code_typed. It may be a niche and not flexible, though.

BTW, I think the cost 1 of Base.not_int(::Bool) is obviously too expensive. It is usually required in conditional expressions, and in most instruction sets there are instructions or special operands for negative conditions (e.g. "not equal", "not less than"). This is the answer of the quiz.:smile:

julia> f(x, mask) = (x & mask) != zero(x);

julia> tt = Tuple{UInt8, UInt8};

julia> mi = Base.method_instances(f, tt)[1];

julia> ci = code_typed(f, tt)[1][1]
CodeInfo(
1%1 = Base.and_int(x, mask)::UInt8%2 = (%1 === 0x00)::Bool%3 = Base.not_int(%2)::Bool
└──      return %3
)

julia> opt = Core.Compiler.OptimizationState(mi, Core.Compiler.Params(typemax(UInt)));

julia> cost(stmt::Expr) = Core.Compiler.statement_cost(stmt, -1, ci, opt.sptypes, opt.slottypes, opt.params);

julia> cost(stmt) = 0;

julia> for c in ci.code; @show cost(c), c;end
(cost(c), c) = (1, :(Base.and_int(_2, _3)))
(cost(c), c) = (1, :(%1 === 0x00))
(cost(c), c) = (1, :(Base.not_int(%2)))
(cost(c), c) = (0, :(return %3))

@timholy
Copy link
Member

timholy commented Mar 4, 2020

I like the idea of a macro...maybe @code_cost?

Also, I'm glad for your input about the costs themselves. The precise numbers I picked, while within stated ranges, were pretty arbitrary. Perhaps a better strategy would be to have an architecture-dependent tuning process, but I didn't go to that kind of effort. If there's a clear case for fixing some of them we should just do that.

@kimikage kimikage merged commit 558bcd2 into JuliaGraphics:master Mar 4, 2020
@kimikage kimikage deleted the hsx branch March 4, 2020 09:08
@kimikage
Copy link
Collaborator Author

kimikage commented Mar 7, 2020

This is off-topic, but I implemented the @code_costs. There is no sense of unity in the naming. (@code_llvm is using a "noun", though.)
https://github.com/kimikage/CodeCosts.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistency in handling hues HSI->RGB conversion may return a value out of [0,1]
2 participants