Releases: JuliaGPU/CUDA.jl

v3.3.5

09 Aug 10:51
de6fd81

CUDA v3.3.5

Diff since v3.3.4

Closed issues:

  • Integer division error for the product of sparse times empty matrices (#962)
  • Bad conversion from QR to CuArray (#969)
  • Errors during installation test (#1004)
  • Be explicit about imports (#1028)
  • Exponentiation with constants can produce bad GPU code compared to the CPU (#1031)
  • rem uses wrong intrinsic (#1040)
  • test Cuda fails on gpuarrays\reductions/minimum maximum (#1043)
  • Broadcasted type conversion on literal value doesn't work (#1044)
  • CUDA overrides somehow screwing up customized printing? (#1055)
  • Is it possible to copy any data into GPU via recursive CuDeviceArray construction? (#1057)
  • CUDA doesn't compile after upgrade to Julia 1.6.2 (#1065)
  • Timing discrepancy between CUDA.@time and BenchmarkTools for Flux model (#1067)
  • cannot convert range to CuArray (#1070)
  • Thread safety issue with gemv! (#1072)
  • CuSparseMatrixCSC conversion errors (#1075)
  • cublasHgemmStridedBatched (#1076)
  • ERROR: UndefKeywordError: keyword argument elements not assigned (#1077)
  • Support for generating Float16 random numbers (#1081)
  • Illegal memory access during complex exponential with large imaginary part as exponent (#1085)
  • "Error: CUDA.jl does not yet support CUDA with ptxas 11.3.109" when using "JULIA_CUDA_USE_BINARYBUILDER=false" (#1089)

Merged pull requests:

v3.3.4

15 Jul 19:54
414ed0a

CUDA v3.3.4

Diff since v3.3.3

Closed issues:

  • Cholesky on 1.8 doesn't dispatch correctly (#1046)

Merged pull requests:

v3.3.3

09 Jul 11:42
4c87c96

CUDA v3.3.3

Diff since v3.3.2

Merged pull requests:

v3.3.2

02 Jul 21:22
1e76295

CUDA v3.3.2

Diff since v3.3.1

Closed issues:

  • Missing artifacts errors (#1003)
  • Relax restriction on types allowed in kernels? (#1005)
  • PPC: Atomic{Float64} is not supported (#1008)
  • Unexpected result in combination with Zygote.gradient() (#1019)
  • Both ExprTools and LLVM export "parameters"; uses of it in module CUDA must be qualified (#1025)

Merged pull requests:

v3.3.1

22 Jun 07:46
d14e0cb

CUDA v3.3.1

Diff since v3.3.0

Closed issues:

  • Reclaim with stream-ordered allocator (#952)
  • possible hanging with CUDA.@profile? (#961)
  • Upgrading from v3.2.1 to v3.3.0 broke my installation (#970)
  • Calls to has_cudnn running on wrong CuDevice? (#978)
  • Test does not run on MIT Supercloud after upgrading to 3.3.0 (#980)
  • Performance issue with complicated loops in function (#984)
  • Is it possible to set cache config in CUDA.jl? (#988)
  • @atomic should perform type conversions (#989)
  • Compatible NVIDIA driver but still got compatibility warning (#1001)

Merged pull requests:

  • Update manifest (#971) (@github-actions[bot])
  • Fix disambiguation of CUDA 11.1 using CUSOLVER. (#972) (@maleadt)
  • Simplify initialization helper macro. (#973) (@maleadt)
  • Move at-typed_ccall to LLVM.jl. (#976) (@maleadt)
  • Replace workspace macro with function (#981) (@maleadt)
  • Implement and improve reclaim for the stream-ordered allocator (#983) (@maleadt)
  • Bump GPUCompiler to fix WMMA test issue. (#985) (@maleadt)
  • Rework memoization (#986) (@maleadt)
  • Fixes for CUBLAS/CUDNN logging (#987) (@maleadt)
  • Perform type conversions in at-atomic. (#990) (@maleadt)
  • Don't initialize the API when setting log callbacks. (#992) (@maleadt)
  • Create a helper for lazy, thread-safe initialization. (#993) (@maleadt)
  • Optimize library handles (#996) (@maleadt)
  • Optimize PerDevice for abstract element types. (#997) (@maleadt)
  • Update manifest (#999) (@github-actions[bot])
  • Replace PerDevice with context-keyed dictionaries. (#1000) (@maleadt)
  • Improve launch latency (#1002) (@maleadt)

v3.3.0

11 Jun 16:12
5243ffb

CUDA v3.3.0

Diff since v3.2.1

Closed issues:

  • PTX code missing DWARF debug information (#72)
  • Suggestion - Disable AbstractArray indexing fallback by default (#178)
  • Support isbits Union Arrays (#103)
  • Missing norm(x, p) kernel (#84)
  • CUDA enhanced compatibility (#832)
  • Support for CuSparseMatrixCSC{Float16} x CuVector{Float16} (#849)
  • CuArray to zeroth power returns Matrix (#897)
  • Fatal errors during sorting tests (#916)
  • Error when computing reductions into a view with reduce_blocks > 1 (#919)
  • CUDA FFT plan application runs Out of Memory in Pluto (#926)
  • has_cuda() errors in CPU-only environments on master (#928)
  • Race condition when computing mean! of large arrays? (#929)
  • Supporting union bits types (#934)
  • test failing in device/intrinsics (#942)
  • Memory allocation fails for multi-GPU (#943)
  • Scalar operations when using output of cu(::OffsetArray) (#954)
  • Quicksort kernel does not cope with reduced threads (#955)
  • CUDA.jl cannot find installed CUPTI libraries with local installation on linux (#956)
  • Error for complex sparse-dense Matrix-vector multiplication (#958)
  • "using CUDA" gives error in type inference of Ref{Bool} (#965)

Merged pull requests:

v3.2.1

13 May 16:10

CUDA v3.2.1

Diff since v3.2.0

Closed issues:

  • adding constant to an array: performance regression compared to CUDAdrv (#838)
  • CUDA.abs() on vector input: performance regression compared to CUDAdrv (#839)
  • CUDA.@sync seems to be using a lot of CPU while waiting (#893)
  • Memory leaks with repeated use of fft of a CUDA Array (#894)
  • CUDA.jl v3.2 seems to download wrong version of CUDNN and CUTENSOR (#899)

Merged pull requests:

v3.2.0

10 May 13:56

CUDA v3.2.0

Diff since v3.1.0

Closed issues:

  • Explore CUDA graph API (#65)
  • Runtime functions are missing debug information (#53)
  • Native RNGs do not pass SmallCrush (#803)
  • Remaining threads/FFT/mult-gpu error (#876)

Merged pull requests:

  • Add wrappers for the CUDA graph API. (#877) (@maleadt)
  • Use the profiler API to start capture. (#878) (@maleadt)
  • Duplicate RNG state across block to avoid need for synchronization (#879) (@maleadt)
  • Support for printing tuples. (#880) (@maleadt)
  • Support unsigned inputs to integer intrinsics. (#881) (@maleadt)
  • Switch to Philox2x32 for device-side RNG (#882) (@maleadt)
  • Update manifest (#884) (@github-actions[bot])
  • Treat CartesianIndices in views as scalars. (#886) (@maleadt)
  • Robustly get variables from the environment during init. (#887) (@maleadt)
  • Move Statistics functionality to GPUArrays. (#888) (@maleadt)
  • Update artifacts and use sources from unified JLLs. (#889) (@maleadt)
  • Lazy initialization of CUDNN and CUTENSOR (#890) (@maleadt)
  • Update manifest (#895) (@github-actions[bot])

v3.1.0

28 Apr 16:34
4549cb7

CUDA v3.1.0

Diff since v3.0.3

Closed issues:

  • GPU Implementation of partialsort! (#93)
  • Document associativity requirements of scan/reduce operators (#819)
  • Problem in reduce_block? (#843)
  • CUDNN convolution incorrect for small images (#848)
  • Newly-spawned tasks should re-set the device (#851)
  • sort!(CUDA.zeros(2^25)) throws invalid configuration argument (code 9, cudaErrorInvalidConfiguration) (#852)
  • Type-preserving upload about cu in doc may be wrong (#855)
  • Memory corruption / segfault with Threads.@async and planned FFTs (#859)
  • Don't call nvmlErrorString (during init?) to prevent crashes on WSL (#860)
  • unsafe_copy3d! does not work with stream-ordered allocations (#863)
  • CUDA3 seems to have memory leak (#866)

Merged pull requests:

v3.0.3

15 Apr 14:23
632f960

CUDA v3.0.3

Diff since v3.0.2

Closed issues:

  • CUDA.jl init error in the REPL without using a CUDA feature (#841)

Merged pull requests:

  • Only synchronize the REPL when CUDA is configured. (#840) (@maleadt)