Releases: JuliaGPU/CUDA.jl
v3.3.5
CUDA v3.3.5
Closed issues:
- Integer division error for the product of sparse times empty matrices (#962)
- Bad conversion from QR to CuArray (#969)
- Errors during installation test (#1004)
- Be explicit about imports (#1028)
- Exponentiation with constants can produce bad GPU code compared to the CPU (#1031)
- `rem` uses wrong intrinsic (#1040)
- test CUDA fails on gpuarrays\reductions/minimum maximum (#1043)
- Broadcasted type conversion on literal value doesn't work (#1044)
- CUDA overrides somehow screwing up customized printing? (#1055)
- Is it possible to copy any data into GPU via recursive `CuDeviceArray` construction? (#1057)
- CUDA doesn't compile after upgrade to Julia 1.6.2 (#1065)
- Timing discrepancy between CUDA.@time and BenchmarkTools for Flux model (#1067)
- cannot convert range to CuArray (#1070)
- Thread safety issue with `gemv!` (#1072)
- CuSparseMatrixCSC conversion errors (#1075)
- cublasHgemmStridedBatched (#1076)
- ERROR: UndefKeywordError: keyword argument elements not assigned (#1077)
- Support for generating Float16 random numbers (#1081)
- Illegal memory access during complex exponential with large imaginary part as exponent (#1085)
- "Error: CUDA.jl does not yet support CUDA with ptxas 11.3.109" when using "JULIA_CUDA_USE_BINARYBUILDER=false" (#1089)
Merged pull requests:
- Add support for unified arrays. (#1023) (@maleadt)
- Look for libcuda in more places. (#1030) (@maleadt)
- Detect common integer exponentiations and handle them directly. (#1033) (@maleadt)
- Allow strided inputs to various library functions. (#1038) (@maleadt)
- Use correct intrinsics for rem (#1041) (@simonbyrne)
- update Package Manager link (#1052) (@ehgus)
- Update manifest (#1054) (@github-actions[bot])
- Add test for math_mode (#1056) (@kshyatt)
- Streamline atomics. (#1059) (@maleadt)
- Add support for device capability-dependent code. (#1060) (@maleadt)
- Adapt to GPUArrays changes. (#1061) (@maleadt)
- Add special constructors to work around Base AbstractQ size weirdness. (#1063) (@maleadt)
- Update manifest (#1064) (@github-actions[bot])
- Small allocator improvements (#1068) (@maleadt)
- Latency improvements (bis) (#1069) (@maleadt)
- lib: cusparse: fix #962 (#1073) (@thazhemadam)
- Make handle cache thread-safe. (#1074) (@maleadt)
- Bump GPUCompiler. (#1079) (@maleadt)
- add support for half-precision gemm (#1080) (@bjarthur)
- Extend and switch to the new CUDA RNG (#1082) (@maleadt)
- cusparse: fix conversion from sparse matrix to dense matrix (#1083) (@maleadt)
- Support/bump for CUDA 11.4.1 and CUDNN 8.2.2 (#1084) (@maleadt)
- Use sincos from libdevice to perform illegal global load. (#1086) (@maleadt)
- Bump GPUCompiler; use our own opt pipeline. (#1087) (@maleadt)
- Update manifest (#1090) (@github-actions[bot])
- Backports for 3.3.5 (#1091) (@maleadt)
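
Several issues fixed in this release (e.g. the CUDA.@time vs. BenchmarkTools discrepancy in #1067) stem from GPU operations executing asynchronously. A minimal sketch of the pitfall, assuming a CUDA-capable device is available:

```julia
using CUDA, BenchmarkTools

x = CUDA.rand(1024, 1024)

# GPU kernels launch asynchronously, so without synchronization @btime
# mostly measures launch overhead rather than actual execution time.
@btime sin.($x)

# Forcing synchronization yields wall-clock timings comparable to
# CUDA.@time, which synchronizes (and reports GPU allocations) itself.
@btime CUDA.@sync sin.($x)
CUDA.@time sin.(x)
```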
v3.3.4
v3.3.3
CUDA v3.3.3
Merged pull requests:
- Adapt to LLVM changes. (#1022) (@maleadt)
- Update manifest (#1029) (@github-actions[bot])
- just some simple printing tests (#1032) (@kshyatt)
- Test for is_capturing (#1034) (@kshyatt)
- Tests for buffer printing (#1035) (@kshyatt)
- Make it possible to change the pool alloc and handle types. (#1036) (@maleadt)
- Backports for 3.3 (#1037) (@maleadt)
v3.3.2
CUDA v3.3.2
Closed issues:
- Missing artifacts errors (#1003)
- Relax restriction on types allowed in kernels? (#1005)
- PPC: Atomic{Float64} is not supported (#1008)
- Unexpected result in combination with Zygote.gradient() (#1019)
- Both ExprTools and LLVM export "parameters"; uses of it in module CUDA must be qualified (#1025)
Merged pull requests:
- Fixes for artifact loading. (#1006) (@maleadt)
- dlopen CUBLAS before CUTENSOR. (#1007) (@maleadt)
- Use a plain integer to keep track of pool last use time. (#1009) (@maleadt)
- More fixes to artifact discovery. (#1010) (@maleadt)
- add custom structs tutorial (#1011) (@jw3126)
- big mapreduce performance (#1012) (@xaellison)
- Fixes for Julia 1.7 (#1013) (@maleadt)
- Update manifest (#1014) (@github-actions[bot])
- Remove memory pools (#1015) (@maleadt)
- Move refcounting to an array storage type (#1016) (@maleadt)
- Remove unneeded disambiguation method. (#1017) (@maleadt)
- Simplify context validity check. (#1018) (@maleadt)
- Improve LazyInitialized (#1020) (@maleadt)
- More allocator clean-ups (#1021) (@maleadt)
- CUDA 11.4 (#1024) (@maleadt)
- Only import from ExprTools what we need. (#1026) (@maleadt)
- Backports release 3.3 (#1027) (@maleadt)
v3.3.1
CUDA v3.3.1
Closed issues:
- Reclaim with stream-ordered allocator (#952)
- possible hanging with `CUDA.@profile`? (#961)
- Upgrading from v3.2.1 to v3.3.0 broke my installation (#970)
- Calls to `has_cudnn` running on wrong `CuDevice`? (#978)
- Test does not run on MIT Supercloud after upgrading to 3.3.0 (#980)
- Performance issue with complicated loops in function (#984)
- Is it possible to set cache config in CUDA.jl? (#988)
- @atomic should perform type conversions (#989)
- Compatible NVIDIA driver but still got compatibility warning (#1001)
Merged pull requests:
- Update manifest (#971) (@github-actions[bot])
- Fix disambiguation of CUDA 11.1 using CUSOLVER. (#972) (@maleadt)
- Simplify initialization helper macro. (#973) (@maleadt)
- Move at-typed_ccall to LLVM.jl. (#976) (@maleadt)
- Replace workspace macro with function (#981) (@maleadt)
- Implement and improve reclaim for the stream-ordered allocator (#983) (@maleadt)
- Bump GPUCompiler to fix WMMA test issue. (#985) (@maleadt)
- Rework memoization (#986) (@maleadt)
- Fixes for CUBLAS/CUDNN logging (#987) (@maleadt)
- Perform type conversions in at-atomic. (#990) (@maleadt)
- Don't initialize the API when setting log callbacks. (#992) (@maleadt)
- Create a helper for lazy, thread-safe initialization. (#993) (@maleadt)
- Optimize library handles (#996) (@maleadt)
- Optimize PerDevice for abstract element types. (#997) (@maleadt)
- Update manifest (#999) (@github-actions[bot])
- Replace PerDevice with context-keyed dictionaries. (#1000) (@maleadt)
- Improve launch latency (#1002) (@maleadt)
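
The `@atomic` conversion change (#989, fixed by #990) lets atomic updates accept values whose type differs from the array's element type. A hedged sketch of the behavior, assuming a CUDA-capable device:

```julia
using CUDA

function increment!(acc)
    # The literal 1 is an Int; with the #990 fix, CUDA.@atomic converts
    # it to the accumulator's element type (Float32) instead of erroring.
    CUDA.@atomic acc[1] += 1
    return
end

acc = CUDA.zeros(Float32, 1)
@cuda threads=32 increment!(acc)
# All 32 threads increment atomically, so acc[1] ends up as 32.0f0.
```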
v3.3.0
CUDA v3.3.0
Closed issues:
- PTX code missing DWARF debug information (#72)
- Suggestion - Disable AbstractArray indexing fallback by default (#178)
- Support isbits Union Arrays (#103)
- Missing norm(x, p) kernel (#84)
- CUDA enhanced compatibility (#832)
- Support for CuSparseMatrixCSC{Float16} x CuVector{Float16} (#849)
- CuArray to zeroth power returns Matrix (#897)
- Fatal errors during sorting tests (#916)
- Error when computing reductions into a view with `reduce_blocks > 1` (#919)
- CUDA FFT plan application runs Out of Memory in Pluto (#926)
- `has_cuda()` errors in CPU-only environments on master (#928)
- Race condition when computing `mean!` of large arrays? (#929)
- Supporting union bits types (#934)
- test failing in device/intrinsics (#942)
- Memory allocation fails for multi-GPU (#943)
- Scalar operations when using output of cu(::OffsetArray) (#954)
- Quicksort kernel does not cope with reduced threads (#955)
- CUDA.jl cannot find installed CUPTI libraries with local installation on linux (#956)
- Error for complex sparse-dense Matrix-vector multiplication (#958)
- "using CUDA" gives error in type inference of Ref{Bool} (#965)
Merged pull requests:
- Override outlined throw functions. (#874) (@maleadt)
- Enable location and debug info. (#891) (@maleadt)
- Compile using the toolkit, not the driver. (#892) (@maleadt)
- Rework timings (#898) (@maleadt)
- Fix #849, allow CUSPARSE to use F16 (#904) (@kshyatt)
- Add Windows CI. (#907) (@maleadt)
- Split test for better parallelization. (#908) (@maleadt)
- Update manifest (#909) (@github-actions[bot])
- Improve package latency. (#910) (@maleadt)
- Just some missing tests for CUBLAS (#911) (@kshyatt)
- Fix bug and add tests for iamax/iamin (#913) (@kshyatt)
- Fix profiler initialization and exception handling. (#914) (@maleadt)
- Add a show method for devices(). (#915) (@maleadt)
- Fix update of CUFFT handle. (#921) (@maleadt)
- Update manifest (#922) (@github-actions[bot])
- Reinstate compatibility with Kepler GPUs. (#923) (@maleadt)
- Use multiple GPUs on CI when available. (#924) (@maleadt)
- Fix two-step mapreduce with wrapped output. (#925) (@maleadt)
- Eagerly free the CUFFT workspace when generating a new one. (#927) (@maleadt)
- Fix CUDA.function without throwing. (#930) (@maleadt)
- Fix the REPL synchronization hook. (#931) (@maleadt)
- Re-initialize the random seed every time. (#932) (@maleadt)
- Protect against race in iterating compute processes. (#933) (@maleadt)
- Helper function to get the device given a cu ptr. (#935) (@akashkgarg)
- Implement CUDA's Enhanced Compatibility when selecting a toolkit. (#936) (@maleadt)
- Update manifest (#939) (@github-actions[bot])
- Re-introduce specialization of cufunction. (#940) (@maleadt)
- Support isbits union element types with CuArray. (#941) (@maleadt)
- Try generating code with unreachable control flow. (#944) (@maleadt)
- Upgrade to CUDA 11.3 Update 1. (#945) (@maleadt)
- Always use exit instead of trap. (#947) (@maleadt)
- Select devices without NVML. (#948) (@maleadt)
- Fixes for Julia 1.7. (#949) (@maleadt)
- Query the CUBLAS version without requiring a handle. (#951) (@maleadt)
- Improve CUBLAS and CUDNN logging. (#953) (@maleadt)
- Update manifest (#957) (@github-actions[bot])
- Enable sorting with reduced block sizes (#959) (@xaellison)
- Adapt to GPUCompiler changes, bump GPUArrays. (#963) (@maleadt)
- Adapt to change in allowscalar. (#964) (@maleadt)
- Don't disable the CUDNN log callback on Windows. (#966) (@maleadt)
- Use released dependencies. (#968) (@maleadt)
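
Among the changes above, #941 adds support for isbits-union element types in `CuArray`. A hedged sketch of what this enables, assuming a CUDA-capable device:

```julia
using CUDA

# CuArray can now store isbits unions such as Union{Nothing, Int32},
# useful for representing missing or sentinel values on the GPU.
xs = CuArray{Union{Nothing, Int32}}([Int32(1), nothing, Int32(3)])

# Broadcast-style operations work over the union-typed elements.
ys = map(x -> x === nothing ? Int32(0) : Int32(2) * x, xs)
```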
v3.2.1
CUDA v3.2.1
Closed issues:
- adding constant to an array: performance regression compared to CUDAdrv (#838)
- CUDA.abs() on vector input: performance regression compared to CUDAdrv (#839)
- CUDA.@sync seems to be using a lot of CPU while waiting (#893)
- Memory leaks with repeated use of `fft` of a CUDA Array (#894)
- CUDA.jl v3.2 seems to download wrong version of CUDNN and CUTENSOR (#899)
Merged pull requests:
v3.2.0
CUDA v3.2.0
Closed issues:
- Explore CUDA graph API (#65)
- Runtime functions are missing debug information (#53)
- Native RNGs do not pass SmallCrush (#803)
- Remaining threads/FFT/mult-gpu error (#876)
Merged pull requests:
- Add wrappers for the CUDA graph API. (#877) (@maleadt)
- Use the profiler API to start capture. (#878) (@maleadt)
- Duplicate RNG state across block to avoid need for synchronization (#879) (@maleadt)
- Support for printing tuples. (#880) (@maleadt)
- Support unsigned inputs to integer intrinsics. (#881) (@maleadt)
- Switch to Philox2x32 for device-side RNG (#882) (@maleadt)
- Update manifest (#884) (@github-actions[bot])
- Treat CartesianIndices in views as scalars. (#886) (@maleadt)
- Robustly get variables from the environment during init. (#887) (@maleadt)
- Move Statistics functionality to GPUArrays. (#888) (@maleadt)
- Update artifacts and use sources from unified JLLs. (#889) (@maleadt)
- Lazy initialization of CUDNN and CUTENSOR (#890) (@maleadt)
- Update manifest (#895) (@github-actions[bot])
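
#879 and #882 rework the device-side RNG, switching to a counter-based Philox2x32 generator that avoids cross-thread synchronization. A hedged sketch of calling `rand` from device code, assuming a CUDA-capable device:

```julia
using CUDA

function fill_random!(out)
    i = threadIdx().x
    # rand() is usable from within kernels; as of #882 it is backed by
    # the Philox2x32 counter-based generator.
    out[i] = rand(Float32)
    return
end

out = CUDA.zeros(Float32, 32)
@cuda threads=32 fill_random!(out)
```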
v3.1.0
CUDA v3.1.0
Closed issues:
- GPU Implementation of partialsort! (#93)
- Document associativity requirements of scan/reduce operators (#819)
- Problem in reduce_block? (#843)
- CUDNN convolution incorrect for small images (#848)
- Newly-spawned tasks should re-set the device (#851)
- sort!(CUDA.zeros(2^25)) throws invalid configuration argument (code 9, cudaErrorInvalidConfiguration) (#852)
- Type-preserving upload about cu in doc may be wrong (#855)
- Memory corruption / segfault with Threads.@async and planned FFTs (#859)
- Don't call nvmlErrorString (during init?) to prevent crashes on WSL (#860)
- unsafe_copy3d! does not work with stream-ordered allocations (#863)
- CUDA3 seems to have memory leak (#866)
Merged pull requests:
- Implement statistics functions: correlation and covariance (#509) (@berquist)
- @atomic support * and / (#842) (@yuehhua)
- CUDNN docstring revisions. (#844) (@GunnarFarneback)
- Sorting perf (again) (#845) (@xaellison)
- Update manifest (#846) (@github-actions[bot])
- Remove extraneous apostrophe (#847) (@kshyatt)
- reduce_block fixes. (#853) (@maleadt)
- Fix sorting large arrays. (#854) (@maleadt)
- Remove unsupported config launch keyword. (#856) (@maleadt)
- Identify the buffer during unsafe_wrap to support unified free. (#857) (@maleadt)
- Add support for CUDA 11.3. (#858) (@maleadt)
- Work around buggy NVML initialization on WSL (#861) (@maleadt)
- ae/partialsort (#864) (@xaellison)
- Update manifest (#865) (@github-actions[bot])
- Improve multitasking with CUFFT. (#867) (@maleadt)
- Introduce a HandleCache type. (#868) (@maleadt)
- Improve multitasking with CURAND (#869) (@maleadt)
- Document associativity requirement of accumulate (#870) (@HenriDeh)
- Half-Precision Intrinsics (#871) (@iyaja)
- Work around offset calculation bug in cuMemcpy3DAsync. (#872) (@maleadt)
- fix #848: CUDNN convolution incorrect for small images (#873) (@denizyuret)
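
PR #864 brings `partialsort!` to the GPU, closing the long-standing #93. A hedged sketch of the Base-compatible interface, assuming a CUDA-capable device:

```julia
using CUDA

xs = CUDA.rand(10_000)

# Mirrors Base.partialsort!: after the call, xs[100] holds the value that
# would sit at index 100 in a fully sorted array, and that value is returned.
v = partialsort!(xs, 100)
```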
v3.0.3
CUDA v3.0.3
Closed issues:
- CUDA.jl init error in the REPL without using a CUDA feature (#841)
Merged pull requests: