Releases · JuliaGPU/CUDA.jl

26 Jun 20:29

github-actions

v4.4.0

315c80e

CUDA v4.4.0

Diff since v4.3.2

Closed issues:

Unreachable control flow leads to illegal divergent barriers (#1746)
CUBLAS fails on new CUDA.jl v4 (#1852)
Sort fails on Lovelace (sm8.9) GPUs (#1874)
gesvd! crashes on Pascal and v12.0 (#1932)
No effect for calling "nsys launch" (#1938)
Basic math operations with nested adjoint and transpose (#1940)
CPU and GPU implementations return results at dissimilar scales, even in double-precision arithmetic (#1950)
Failed CUDA.jl initialization breaks Flux? (#1952)
Recent mul! changes break multiplication with matrices that have StaticArray elements (#1953)
Test infrastructure: define test groups (#1961)
Strange rand errors when sampling large matrices (#1963)
Add aqua tests (#1964)
Support of Orin GPU from Nvidia? (#1966)
Crash in LLVM (#1971)
Warning cuDNN Convolution (#1972)
Strange behaviour when installed at system level (#1973)

Merged pull requests:

Update benchmarks for 1.8 and 1.9 (#1933) (@maleadt)
CUSOLVER: Explicitly pass NULL when not requesting svd outputs. (#1934) (@maleadt)
Detect and complain about loading system libraries. (#1935) (@maleadt)
Update manifest (#1936) (@github-actions[bot])
Avoid stack overflow with early OOM reporting. (#1937) (@maleadt)
[CUSPARSE] Improved support for UniformScaling and Diagonal (#1941) (@albertomercurio)
Update manifest (#1949) (@github-actions[bot])
Update GPUCompiler to fix unreachable control flow. (#1951) (@maleadt)
Allow StaticArray eltype in matmat{vec,mul} (#1954) (@lcw)
Bump CUDNN to v8.9. (#1959) (@maleadt)
Bump CUTENSOR to v1.7. (#1960) (@maleadt)
Add and fix some aqua tests (#1965) (@charleskawczynski)
Fix compatibility of CUDA 11.4 to support Orin. (#1967) (@maleadt)
Don't use Int32 indices in rand kernels. (#1969) (@maleadt)
CI simplifications (#1970) (@maleadt)
Use Base.pkgversion on 1.9. (#1974) (@maleadt)
Update to LLVM.jl 6. (#1976) (@maleadt)
fix launch config bug in bitonic sort (#1979) (@xaellison)
Update manifest (#1980) (@github-actions[bot])

Contributors

lcw, maleadt, and 3 other contributors

02 Jun 05:55

github-actions

v4.3.2

acd245e

CUDA v4.3.2

Diff since v4.3.1

Merged pull requests:

Reduce load time by shifting mul! definition (#1904) (@dkarrasch)

Contributors

dkarrasch

31 May 19:40

github-actions

v4.3.1

b7420f8

CUDA v4.3.1

Diff since v4.3.0

Closed issues:

Array testsuite compiles kernel with large types (#1902)
CUDA.jl v4 installs CUDA runtime despite version=local (#1922)
Occasional "CUSOLVERError: an internal operation failed (code 7, CUSOLVER_STATUS_INTERNAL_ERROR)" (#1924)
Does cuDNN@v1.0.4 need CUDA@v4.3? (#1929)

Merged pull requests:

Simplify libdevice linking. (#1927) (@maleadt)
Add a show method for kernel objects. (#1928) (@maleadt)
Update manifest (#1930) (@github-actions[bot])
Pass a higher capability to ptxas. (#1931) (@maleadt)

Contributors

maleadt

23 May 18:34

github-actions

v4.3.0

d3b1363

CUDA v4.3.0

Diff since v4.2.0

Closed issues:

Multidimensional reverse (#1126)
Test errors on master (#1866)
Integer overflow error with svd for large matrix (#1880)
Erratic behaviour of CUDA.jl if used in the REPL of VSCode. (#1892)
QR decomposition requires scalar indexing (#1893)
BSOD during package tests (#1898)
Insufficient coverage of CuArrays in the documentation (#1901)
Failed to compile with Julia v1.9 on PowerPC (#1911)
CUDA test failed in wmma.jl (#1914)
Fix deprecation warnings (#1920)

Merged pull requests:

CUSOLVER: Fix workspace size passing. (#1890) (@maleadt)
Lovelace fixes (#1894) (@maleadt)
Update manifest (#1897) (@github-actions[bot])
Reverse with multiple dimensions (#1899) (@RainerHeintzmann)
Restrict number of test jobs based on available memory. (#1900) (@maleadt)
Avoid unneeded macros to cut down on generated code (#1905) (@maleadt)
Avoid unneeded macros to cut down on generated code (#1906) (@maleadt)
Update manifest (#1907) (@github-actions[bot])
Bump GPUCompiler. (#1908) (@maleadt)
Don't use Float64 atomics on unsupported platforms. (#1912) (@maleadt)
Report package versions as part of versioninfo(). (#1913) (@maleadt)
Align variables in constant memory by 256 bit (#1915) (@Zentrik)
Add norm functions for 3 floats (#1916) (@Zentrik)
cuDNN: only choose conv algorithms if they match descriptor mathType (#1917) (@ToucheSir)
Update manifest (#1918) (@github-actions[bot])
Skip Integer WMMA tests on older devices. (#1919) (@maleadt)

Contributors

maleadt, ToucheSir, and 2 other contributors

02 May 13:25

github-actions

v4.2.0

af65a44

CUDA v4.2.0

Diff since v4.1.4

Closed issues:

NVTX: consider using Start/End for ranges (#1485)
Limitations of CuIterator (#1768)
Testing fails on unsupported devices. (#1815)
Local runtime discovery does not work for external libraries (CUDNN, CUTENSOR) (#1850)
Passing tests using Github CI workflow errors with libcuda not defined (#1867)
Cannot precompile GPU code with SnoopPrecompile (#1870)
Incorrect kernel execution with bounds checking using Julia 1.9.0-rc2 (#1875)
Fake CUDA library (#1879)
Error thrown when launching Julia with Nsight systems or compute. (#1886)
Cannot construct CuDeviceArray (#1887)
Incorrect colVal array when using CuSparseMatrixCSR command on sparse matrix (#1888)

Merged pull requests:

Use adapt symmetrically in CuIterator (#1769) (@mcabbott)
Allow but warn when testing on not fully-supported devices. (#1818) (@maleadt)
Support runtime discovery for non-toolkit libraries (CUTENSOR, CUDNN, CUQUANTUM) (#1858) (@mloubout)
Add KernelAbstractions.jl unsafe_free! (#1863) (@pxl-th)
Allow precompiling CUDA code. (#1865) (@maleadt)
Assert CUDA.jl is functional when creating the TLS. (#1868) (@maleadt)
Update manifest (#1871) (@github-actions[bot])
Don't collect AbstractQ objects in tests (#1872) (@dkarrasch)
Add compatibility entry for Lovelace (#1873) (@xaellison)
remove some type-piracy from cusparse (#1876) (@vtjnash)
Remove more unneeded ndims methods. (#1878) (@maleadt)
Guard the initialization-time CUDA driver check in a try/catch. (#1881) (@maleadt)
Update manifest (#1882) (@github-actions[bot])
Update CUDA 12.1 to 12.1.1. (#1883) (@maleadt)
Use atomics for allocation statistics. (#1884) (@maleadt)
Fix atomic increment of alloc stats. (#1885) (@maleadt)
Update manifest (#1889) (@github-actions[bot])

Contributors

vtjnash, maleadt, and 5 other contributors

13 Apr 15:31

github-actions

v4.1.4

7e86df8

CUDA v4.1.4

Diff since v4.1.3

Closed issues:

Buggy precompilation of init-defined symbols can break CUDA_Driver_jll initialization (#1798)
Calling CUDA.set_runtime_version!() with float parameter makes CUDA.jl unusable. (#1831)
Unexpected memory allocation when using randn! (#1856)
The memory copy speed seems to exceed the hardware limit (#1860)
PCG produces different output on GPU (via Krylov.jl) (#1864)

Merged pull requests:

Fix system_driver_version on platforms not supported by CUDA_Driver_jll. (#1854) (@maleadt)
Update manifest (#1861) (@github-actions[bot])

Contributors

maleadt

31 Mar 16:08

github-actions

v4.1.3

4e8f45b

CUDA v4.1.3

Diff since v4.1.2

Closed issues:

CUDA.versioninfo() triggers download of lazy artifacts (#1844)

Merged pull requests:

Choose parallel tests based on CPUs, not threads. (#1842) (@maleadt)
Adapt to LLVM.jl 5 and GPUCompiler.jl 0.19. (#1847) (@maleadt)

Contributors

maleadt

29 Mar 08:23

github-actions

v4.1.2

1aa3e6b

CUDA v4.1.2

Diff since v4.1.1

Closed issues:

Flux's gradient differentiating rfft leads to non-bit error (#1835)

Merged pull requests:

switch to using defined globals (#1832) (@simonbyrne)
Update manifest (#1837) (@github-actions[bot])

Contributors

simonbyrne

26 Mar 04:08

github-actions

v4.1.1

d235d35

CUDA v4.1.1

Diff since v4.1.0

Merged pull requests:

Fix export of CUDABackend (#1834) (@vchuravy)

Contributors

vchuravy

24 Mar 02:30

github-actions

v4.1.0

cf4598f

CUDA v4.1.0

Diff since v4.0.1

Closed issues:

ERROR: LoadError: bin\cublas64_11.dll when installing CUDA (#1750)
System-wide CUDA in LD_LIBRARY_PATH breaks CUBLAS (#1755)
CuDeviceTexture getindex breaks when executed on the CPU (#1757)
cuDNN.version can cause Julia to crash, missing cudnn_ops_infer64_8.dll (#1777)
cuDNN compile error "ERROR: LoadError: ArgumentError: invalid version string: local" (#1783)
"Error: No CUDA Runtime library found" for ≥v4.0.0 (#1808)
sqrt broken in kernels 'Format of __nvvm__reflect function not recognized' (#1817)

Merged pull requests:

Add support for CUDA 12.0. (#1742) (@maleadt)
Add more fixes and tests for CUDA toolkit 12.0 (#1756) (@amontoison)
Update manifest (#1758) (@github-actions[bot])
Fix test/cusparse/interfaces.jl (#1762) (@amontoison)
Simplify the function sig. (#1763) (@N5N3)
Update manifest (#1770) (@github-actions[bot])
Make versioninfo() resilient against NVML EPERM. (#1771) (@maleadt)
Move CUDAKernels to CUDA.jl (#1772) (@vchuravy)
[CUSPARSE] Improve conversion and tests between sparse matrices (#1774) (@amontoison)
Use geam for + and - operations with CuMatrix{<:CublasFloat} (#1775) (@amontoison)
Update manifest (#1776) (@github-actions[bot])
Update manifest (#1781) (@github-actions[bot])
Update manifest (#1784) (@github-actions[bot])
[CUSPARSE] Update preconditioners.jl (#1785) (@amontoison)
[CUSOLVER] Avoid the conversion to CSR format for reordering routines (#1786) (@amontoison)
Bump GPUCompiler. (#1787) (@maleadt)
Remove unneeded variable. (#1788) (@maleadt)
[CUSPARSE] Update conversions.jl (#1791) (@amontoison)
Update to CUDNN 8.8.1 for CUDA 12 compatibility. (#1792) (@maleadt)
Add support for CUDA 12.1 (#1793) (@maleadt)
[CUSPARSE] Interface color reordering (#1794) (@amontoison)
[CUSPARSE] Interface gtsv2 (#1795) (@amontoison)
Update manifest (#1796) (@github-actions[bot])
Adapt to GPUCompiler 0.18 (#1799) (@maleadt)
Follow Array's behavior when initializing (#1800) (@lcw)
[CUSOLVER] Support A \ b for rectangular matrices (#1802) (@amontoison)
Use symbols instead of values when emitting code, when possible. (#1804) (@maleadt)
Refactor CI pipeline a little. (#1805) (@maleadt)
[CUSOLVER] Improve the dispatch for LAPACK routines (#1806) (@amontoison)
Diagonal for lower triangular of LU decomposition set incorrectly (#1813) (@tgymnich)
CompatHelper: add new compat entry for "KernelAbstractions" at version "0.9" (#1824) (@github-actions[bot])
Rebuild CUPTI API with support for STRUCT_SIZE (#1827) (@vchuravy)
Release CUDA 4.1 (#1828) (@vchuravy)

Contributors

lcw, vchuravy, and 4 other contributors
