Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add image rotation #553

Merged
merged 32 commits into from
Jan 29, 2024
Merged

Add image rotation #553

merged 32 commits into from
Jan 29, 2024

Conversation

roflmaostc
Copy link
Contributor

@roflmaostc roflmaostc commented Dec 17, 2023

Hi,

based on DiffImageRotation.jl and as discussed in #552 I tried to add image rotation into NNlib.jl.

Some comments:
I always assumed a 4D array. In principle, the results match the ones from ImageTransformations.jl closely. But, the boundary handling is different and sometimes rounding artifacts occur.
I compared this a little but I think there is space for interpretation how to do it. There is some code left, to change the boundary handling at the cost of ~20% performance.

PR Checklist

  • Tests are added
  • Documentation -> comes later

Benchmarks

@maxfreu mentioned something about different parallelization schemes for CPU vs GPU.

Benchmarking between my implementation and the parallelized elementwise application shows that the timings are roughly equal. So not sure if we should make the code much more verbose and complicated? But please share your experience with me! Especially, if we can squeeze out more performance out of CUDA, that would be great! But on CPU we already beat existing code:

Some benchmarking:

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.223.2

CUDA libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.223.2

Julia packages: 
- CUDA: 5.1.1
- CUDA_Driver_jll: 0.7.0+0
- CUDA_Runtime_jll: 0.10.1+0

Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6

1 device:
  0: NVIDIA GeForce RTX 3060 (sm_86, 11.369 GiB / 11.749 GiB available)


julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 5600X 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 12 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 12

Comparison to ImageTransformations:

# multithreaded
julia> arr_2D = randn((2000, 3000));

julia> @btime NNlib.imrotate(reshape($arr_2D, 2000, 3000, 1, 1), deg2rad(13));
  24.449 ms (153 allocations: 45.79 MiB)

julia> @btime ImageTransformations.imrotate($arr_2D, deg2rad(13));
  81.811 ms (3 allocations: 67.55 MiB)

# julia -t 1
julia> @btime NNlib.imrotate(reshape($arr_2D, 2000, 3000, 1, 1), deg2rad(13));
  45.397 ms (29 allocations: 45.78 MiB)

julia> @btime ImageTransformations.imrotate($arr_2D, deg2rad(13));
  95.694 ms (3 allocations: 67.55 MiB)
julia> using CUDA, NNLib, BenchmarkTools, Tullio

julia> arr = randn(Float32, (256, 256, 3, 32));

julia> arrc = CuArray(arr);

julia> @btime NNlib.imrotate($arr, deg2rad(37));
  8.020 ms (147 allocations: 24.01 MiB)

julia> @btime @tullio out := sin.($arr[i, j, c, b]);
  8.358 ms (196 allocations: 10.77 KiB)

julia> @btime CUDA.@sync NNlib.imrotate($arrc, deg2rad(37));
  510.106 μs (99 allocations: 5.39 KiB)

julia> @btime CUDA.@sync sin.($arrc);
  291.288 μs (37 allocations: 2.36 KiB)

Would be great if someone reviews this and we can merge it! Not urgent for me, since DiffImageRotation.jl is registered and I anyway use this currently.

@ToucheSir
Copy link
Member

Thanks! I'll leave it to @maxfreu to comment on the algorithmic piece, but the biggest outstanding addition here would be making the test suite device-agnostic so it can be tested on GPU CI too. Have a look at the upsample tests for how this is done if you haven't already.

@roflmaostc
Copy link
Contributor Author

ImageTransformations.jl doesn't work on GPUs but we can probably wrap it in Arrays.

But yeah, I see and can fix it!

Copy link
Contributor

@maxfreu maxfreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I reviewed the code and made some remarks. Maybe someone else should take a look, too. One last point: Please check the output of @code_warntype for the different functions you wrote and make sure everything is blue (:

src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
src/rotation.jl Outdated Show resolved Hide resolved
test/rotation.jl Outdated Show resolved Hide resolved
@roflmaostc
Copy link
Contributor Author

Thanks for the comments.
I plan to incorporate it in a couple of weeks.
But for Julia 1.10 we need to wait for this PR to update ImageTransformations.jl because otherwise some dependencies were held back at non 1.10 compatible versions.

@roflmaostc
Copy link
Contributor Author

There is an issue with FillArrays.jl which I don't know how to solve. It seems to fail catch the backend with FillArrays.
FillArrays is not an explicit dependency otherwise I would do a dispatch in case of that.

Test gradients: Error During Test at /home/fxw/.julia/dev/NNlib.jl/test/rotation.jl:78
  Got exception outside of a @test
  StackOverflowError:
  Stacktrace:
   [1] get_backend(A::FillArrays.Fill{Float32, 4, NTuple{4, Base.OneTo{Int64}}}) (repeats 79984 times)
     @ KernelAbstractions ~/.julia/packages/KernelAbstractions/GCOhX/src/KernelAbstractions.jl:466

src/rotation.jl Outdated Show resolved Hide resolved
@ToucheSir
Copy link
Member

The StackOverflow is JuliaGPU/KernelAbstractions.jl#366 manifesting again. https://github.com/JuliaGPU/KernelAbstractions.jl/blob/96b89f994e08b48b0a3142d7db4b1c97be054b41/src/KernelAbstractions.jl#L466 should probably check if parent(A) === A before recursing. I would recommending bumping that issue or opening a new one.

@roflmaostc
Copy link
Contributor Author

But would KA know that the desired array is a CuArray and not an Array? FillArrays don't have hardware backends, right? So I guess we would need to fill a real array with the FillArray?

src/rotation.jl Outdated Show resolved Hide resolved
@roflmaostc
Copy link
Contributor Author

Thanks, I fixed it and added documentation!

@roflmaostc
Copy link
Contributor Author

Some GPU test fail but also in parts I didn't touch. Maybe I should relax the .\approx tests with a relative error?

@roflmaostc
Copy link
Contributor Author

Another question regardin KA.jl: should we include a KernelAbstractions.synchronize(backend)?

src/rotation.jl Outdated Show resolved Hide resolved
test/rotation.jl Outdated Show resolved Hide resolved
@roflmaostc
Copy link
Contributor Author

I think the only open point is the slight disagreement for nearest neighbor for some specific angles to ImageTransformations.
But I believe those are rounding issues and don't change the results significantly. At least, I don't see what would be wrong here.

In fact, ImageTransformations is also not tested for that many angles to some reference implementation.

@maxfreu
Copy link
Contributor

maxfreu commented Jan 20, 2024

Ok, now the tests are working - except for one gradient test. That should be investigated and if it's just about tolerance, that can be relaxed a bit. But now somehow 300+ other tests fail. Is this related to your changes or the latest merge by @ToucheSir ?

@roflmaostc
Copy link
Contributor Author

I can check later or tomorrow for the failing gradient test.

@ToucheSir
Copy link
Member

It may also be a regression caused by changes in a different package like Zygote or ChainRules(Core). The error looks like some method ambiguity issues have shown up again...

@roflmaostc
Copy link
Contributor Author

I lowered the accuracy for the gradtest. But locally it was running fine with 1e-6 too.

@maxfreu
Copy link
Contributor

maxfreu commented Jan 21, 2024

I just checked the test logs more carefully and indeed, the test failure is unrelated to your changes and a victim of the same error that broke the other tests. 1e-6 should be ok, sorry for the noise.

@ToucheSir ToucheSir closed this Jan 24, 2024
@ToucheSir ToucheSir reopened this Jan 24, 2024
ToucheSir added a commit that referenced this pull request Jan 24, 2024
@ToucheSir ToucheSir mentioned this pull request Jan 24, 2024
2 tasks
@ToucheSir
Copy link
Member

I see, JuliaDiff/ChainRules.jl#762 is causing Zygote to be significantly downgraded and breaking. Normally I would be comfortable ignoring that if none of the functionality this PR adds is affected, but there do seem to be tests that downgrade causes errors in that obscure the actual error. For now, we may have to park this until the ChainRules PR is merged.

src/rotation.jl Outdated Show resolved Hide resolved
@roflmaostc
Copy link
Contributor Author

I think ChainRules got merged

@maxfreu
Copy link
Contributor

maxfreu commented Jan 29, 2024

I think this PR is finally ready 🍕

@roflmaostc
Copy link
Contributor Author

The CI still was complaining, wasn't it?

Copy link
Member

@ToucheSir ToucheSir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@ToucheSir ToucheSir merged commit b61b0b8 into FluxML:master Jan 29, 2024
1 check was pending
@ToucheSir
Copy link
Member

CI failures are known and unrelated. Thanks @roflmaostc for the great work!

@roflmaostc
Copy link
Contributor Author

Thanks for your work!

Does NNlib.jl actually have more methods with preallocated buffers?
In DiffImageRotation.jl I had this special function, to save some memory allocation.

This helps if you rotate a lot of images and want to add them into the same buffer each.

@ToucheSir
Copy link
Member

Yes, see functions like conv! or scatter!. Traditionally most of those have not been differentiable, but now many have Enzyme AD rules and I think that's the best path going forwards for differentiable operations in NNlib.

@roflmaostc
Copy link
Contributor Author

Ok, looks good. I might tackle this in future then!

@maxfreu
Copy link
Contributor

maxfreu commented Jan 29, 2024

Thanks for your work also from my side! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants