Releases: JuliaGPU/CUDA.jl

v3.6.4

09 Jan 00:27
9c77666

CUDA v3.6.4

Diff since v3.6.3

Closed issues:

  • Artifacts.toml has bad git-tree-sha1 values (#1309)

Merged pull requests:

v3.6.3

07 Jan 07:25

CUDA v3.6.3

Diff since v3.6.2

Closed issues:

  • CUDA.@atomic deadlocks when overwriting NaN (#1299)
  • Unreasonably slow copy kernel (#1301)
  • Passing a LogicalIndex(::CuArray) fails (#1304)

Merged pull requests:

  • Allow sorting of tuples of numbers (#1196) (@mcabbott)
  • Use === for generic atomic updates with compare-and-swap (#1300) (@guyvdbroeck)
  • Update manifest (#1302) (@github-actions[bot])
  • Store the array length next to its dimensions. (#1303) (@maleadt)
  • Disallow calling CUDA device array intrinsics on the host. (#1305) (@maleadt)
  • Support logical indexing with CPU sources. (#1306) (@maleadt)
  • Activate a context when calling device!. (#1307) (@maleadt)

v3.6.2

29 Dec 09:13
fff6087

CUDA v3.6.2

Diff since v3.6.1

Closed issues:

  • Norm of complex-typed CuArray is not real (#1290)
  • Calling @show on Symmetric of a CuArray triggers Scalar Indexing (#1294)
  • CUSPARSE Error when solving a linear system (#1296)

Merged pull requests:

  • Correctly handle missing cached_memory. (#1295) (@maleadt)
  • Update manifest (#1297) (@github-actions[bot])

v3.6.1

23 Dec 11:56
7778e34

CUDA v3.6.1

Diff since v3.6.0

Closed issues:

  • reduce_block error on Complex type (#1289)
  • cudnn_cnn_infer64_8 could not be loaded (#1291)
  • Support to find the first k eigenvalues of a sparse matrix (#1292)

Merged pull requests:

v3.6.0

22 Dec 07:02
2a6bfa6

CUDA v3.6.0

Diff since v3.5.0

Closed issues:

  • Conversion issue (#157)
  • Extend new RNG to Complex numbers & normal distributions (#726)
  • Fatal errors during sorting tests (#916)
  • deepcopy failing (#1202)
  • Kernel compilation fails when specifying shared memory array size as a tuple consisting of block dimension and kernel argument (#1205)
  • ERROR: LoadError: The artifact at C:\Users\name.julia\artifacts\58bd87695e9ccdb508cb38be1ab717315ecc9152 is empty. (#1209)
  • InvalidIRError when displaying a model which is on the GPU (#1212)
  • CUDA.jl tries to load CUDA compat loaded via jll even though system package is installed (#1216)
  • Synchronizing over blocks (#1220)
  • assignment changes random seed (#1226)
  • accumulate gives wrong answer when init != 0 (#1227)
  • Generic dot kernel: use multiple kernels instead of atomics (#1244)
  • integer division error creating CuVector of missing and nothing (#1251)
  • unsupported dynamic function invocation with union type of more than 2 elements (#1252)
  • three CUDA.@atomic in a row result in out-of-bounds error (#1254)
  • Float16 CAS cannot use atom.cas.b16.global on sm_61 (#1258)
  • cu(::SVector) gives SVector, cu(::MVector) gives CuArray (#1262)
  • Get back unsafe_copyto! methods for unified<-unified and unified<->device (#1263)
  • Passing and using a FFT plan in a CUDA kernel seems impossible (#1266)
  • Inplace Complex FFT and Threads (#1268)
  • sort returns nothing (#1270)
  • Release a new version (#1276)
  • __init_driver__ not called in 3.5 (#1280)
  • Shared memory does not support isbits unions. (#1281)
  • NVIDIA Nsight Systems and CUDA.@profile error (#1282)
  • nvprof with using CUDA crashes julia (#1283)

Merged pull requests:

v3.5.0

11 Oct 06:03

CUDA v3.5.0

Diff since v3.4.2

Closed issues:

  • Illegal memory access on 3.3 (#975)
  • Forward compatibility (#1071)
  • ambiguous sparse constructor (#1088)
  • Map reduce with float 16 (#1124)
  • Allow invalid GPU pointers not allowed in unsafe_wrap (#1125)
  • Scalar Indexing error in the Introduction docs (#1127)
  • stackoverflow when printing a custom subtype of AbstractCuSparseMatrix (#1128)
  • missing rand methods (#1138)
  • Error mapreducing over a 0 dimensional array (#1141)
  • seed! is not thread safe (#1158)
  • Simplify Int32-based indices (#1160)
  • Concatenating a scalar to a CuArray gives an Array (#1162)
  • Calling byte_perm with Int32 values inserts sign checks (#1165)
  • sum! does not compile for large arrays (#1169)
  • Same random sequence on GPU and CPU? (#1170)
  • Specifying eltype and buffer type when adapting to CuArray? (#1171)
  • Inefficient lop3.lut instructions generated (#1172)
  • Writing temporary PTX files can fail (#1173)
  • Switching devices doesn't switch the REPL's output task (#1175)
  • GC is not working for CuSparseMatrixCSR (#1178)
  • sparse*dense operations shouldn't drop sparseness (#1188)
  • Raises illegal memory access error randomly (#1189)

Merged pull requests:

v3.4.2

27 Aug 20:25
be43480

CUDA v3.4.2

Diff since v3.4.1

Closed issues:

  • Broadcasting a datatype does not work (#261)
  • CUDA error: invalid argument during Zygote/Flux gradient computation (#1107)
  • EXCEPTION_ACCESS_VIOLATION when using shared memory allocations. (#1116)

Merged pull requests:

  • add symmetric support for mul (#217) (@Roger-luo)
  • adds a device array type for CuSparseMatrixCSR to support using it in kernel functions (#1106) (@Roger-luo)
  • Update manifest (#1108) (@github-actions[bot])
  • Specialize Ref{<:Type} for GPU compatibility. (#1109) (@maleadt)
  • Use the documented version of the enable_finalizers API. (#1111) (@maleadt)
  • Don't embed the method table in the AST. (#1112) (@maleadt)
  • Remove the hacky unique'ing of shmem GVs. (#1114) (@maleadt)
  • Introduce a macro for marking multiple functions as device-only. (#1117) (@maleadt)
  • Simplify library loading. (#1121) (@maleadt)
  • Backports for 3.4.2 (#1122) (@maleadt)

v3.4.1

17 Aug 15:02
c3ce593

CUDA v3.4.1

Diff since v3.4.0

Closed issues:

  • cudnnFindConvolutionAlgorithmWorkspaceSize uses removed function cached_memory (#1101)

Merged pull requests:

v3.4.0

13 Aug 19:05
6758fca

CUDA v3.4.0

Diff since v3.3.6

Merged pull requests:

v3.3.6

13 Aug 15:37
964893c

CUDA v3.3.6

Diff since v3.3.5

Closed issues:

  • LinearAlgebra.mul! with scalar arguments triggers scalar iteration (#790)
  • Kernel fails if input is struct with function (#1094)
  • cusparse: sparse matrix - matrix multiplication broken with transpose operation (#1095)

Merged pull requests: