Releases: JuliaGPU/CUDA.jl
v4.4.0
CUDA v4.4.0
Closed issues:
- Unreachable control flow leads to illegal divergent barriers (#1746)
- CUBLAS fails on new CUDA.jl v4 (#1852)
- Sort fails on Lovelace (sm8.9) GPUs (#1874)
- gesvd! crashes on Pascal and v12.0 (#1932)
- No effect for calling "nsys launch" (#1938)
- Basic math operations with nested adjoint and transpose (#1940)
- CPU and GPU implementations return results at dissimilar scales, even in double precision arithmetics (#1950)
- Failed CUDA.jl initialization breaks Flux? (#1952)
- Recent mul! changes break multiplication with matrices that have StaticArray elements (#1953)
- Test infrastructure: define test groups (#1961)
- Strange rand errors when sampling large matrices (#1963)
- Add aqua tests (#1964)
- Support of Orin GPU from Nvidia? (#1966)
- Crash in LLVM (#1971)
- Warning cuDNN Convolution (#1972)
- Strange behaviour when installed at system level (#1973)
Merged pull requests:
- Update benchmarks for 1.8 and 1.9 (#1933) (@maleadt)
- CUSOLVER: Explicitly pass NULL when not requesting svd outputs. (#1934) (@maleadt)
- Detect and complain about loading system libraries. (#1935) (@maleadt)
- Update manifest (#1936) (@github-actions[bot])
- Avoid stack overflow with early OOM reporting. (#1937) (@maleadt)
- [CUSPARSE] Improved support for UniformScaling and Diagonal (#1941) (@albertomercurio)
- Update manifest (#1949) (@github-actions[bot])
- Update GPUCompiler to fix unreachable control flow. (#1951) (@maleadt)
- Allow StaticArray eltype in matmat{vec,mul} (#1954) (@lcw)
- Bump CUDNN to v8.9. (#1959) (@maleadt)
- Bump CUTENSOR to v1.7. (#1960) (@maleadt)
- Add and fix some aqua tests (#1965) (@charleskawczynski)
- Fix compatibility of CUDA 11.4 to support Orin. (#1967) (@maleadt)
- Don't use Int32 indices in rand kernels. (#1969) (@maleadt)
- CI simplifications (#1970) (@maleadt)
- Use Base.pkgversion on 1.9. (#1974) (@maleadt)
- Update to LLVM.jl 6. (#1976) (@maleadt)
- fix launch config bug in bitonic sort (#1979) (@xaellison)
- Update manifest (#1980) (@github-actions[bot])
v4.3.2
v4.3.1
CUDA v4.3.1
Closed issues:
- Array testsuite compiles kernel with large types (#1902)
- CUDA.jl v4 installs CUDA runtime despite version=local (#1922)
- Occasional "CUSOLVERError: an internal operation failed (code 7, CUSOLVER_STATUS_INTERNAL_ERROR)" (#1924)
- Does cuDNN@v1.0.4 need CUDA@v4.3? (#1929)
Merged pull requests:
v4.3.0
CUDA v4.3.0
Closed issues:
- Multidimensional reverse (#1126)
- Test errors on master (#1866)
- Integer overflow error with svd for large matrix (#1880)
- Erratic behaviour of CUDA.jl if used in the REPL of VSCode. (#1892)
- QR decomposition requires scalar indexing (#1893)
- BSOD during package tests (#1898)
- Insufficient coverage of CuArrays in the documentation (#1901)
- Failed to compile with Julia v1.9 on PowerPC (#1911)
- CUDA test failed in wmma.jl (#1914)
- Fix deprecation warnings (#1920)
Merged pull requests:
- CUSOLVER: Fix workspace size passing. (#1890) (@maleadt)
- Lovelace fixes (#1894) (@maleadt)
- Update manifest (#1897) (@github-actions[bot])
- Reverse with multiple dimensions (#1899) (@RainerHeintzmann)
- Restrict number of test jobs based on available memory. (#1900) (@maleadt)
- Avoid unneeded macros to cut down on generated code (#1905) (@maleadt)
- Avoid unneeded macros to cut down on generated code (#1906) (@maleadt)
- Update manifest (#1907) (@github-actions[bot])
- Bump GPUCompiler. (#1908) (@maleadt)
- Don't use Float64 atomics on unsupported platforms. (#1912) (@maleadt)
- Report package versions as part of versioninfo(). (#1913) (@maleadt)
- Align variables in constant memory by 256 bit (#1915) (@Zentrik)
- Add norm functions for 3 floats (#1916) (@Zentrik)
- cuDNN: only choose conv algorithms if they match descriptor mathType (#1917) (@ToucheSir)
- Update manifest (#1918) (@github-actions[bot])
- Skip Integer WMMA tests on older devices. (#1919) (@maleadt)
v4.2.0
CUDA v4.2.0
Closed issues:
- NVTX: consider using Start/End for ranges (#1485)
- Limitations of CuIterator (#1768)
- Testing fails on unsupported devices. (#1815)
- Local runtime discovery does not work for external libraries (CUDNN, CUTENSOR) (#1850)
- Passing tests using Github CI workflow errors with libcuda not defined (#1867)
- Cannot precompile GPU code with SnoopPrecompile (#1870)
- Incorrect kernel execution with bounds checking using Julia 1.9.0-rc2 (#1875)
- Fake CUDA library (#1879)
- Error thrown when launching Julia with Nsight systems or compute. (#1886)
- Cannot construct CuDeviceArray (#1887)
- Incorrect colVal array when using CuSparseMatrixCSR command on sparse matrix (#1888)
Merged pull requests:
- Use adapt symmetrically in CuIterator (#1769) (@mcabbott)
- Allow but warn when testing on not fully-supported devices. (#1818) (@maleadt)
- Support runtime discovery for non-toolkit libraries (CUTENSOR, CUDNN, CUQUANTUM) (#1858) (@mloubout)
- Add KernelAbstractions.jl unsafe_free! (#1863) (@pxl-th)
- Allow precompiling CUDA code. (#1865) (@maleadt)
- Assert CUDA.jl is functional when creating the TLS. (#1868) (@maleadt)
- Update manifest (#1871) (@github-actions[bot])
- Don't collect AbstractQ objects in tests (#1872) (@dkarrasch)
- Add compatibility entry for Lovelace (#1873) (@xaellison)
- remove some type-piracy from cusparse (#1876) (@vtjnash)
- Remove more unneeded ndims methods. (#1878) (@maleadt)
- Guard the initialization-time CUDA driver check in a try/catch. (#1881) (@maleadt)
- Update manifest (#1882) (@github-actions[bot])
- Update CUDA 12.1 to 12.1.1. (#1883) (@maleadt)
- Use atomics for allocation statistics. (#1884) (@maleadt)
- Fix atomic increment of alloc stats. (#1885) (@maleadt)
- Update manifest (#1889) (@github-actions[bot])
v4.1.4
CUDA v4.1.4
Closed issues:
- Buggy precompilation of init-defined symbols can break CUDA_Driver_jll initialization (#1798)
- Calling CUDA.set_runtime_version!() with float parameter makes CUDA.jl unusable. (#1831)
- Unexpected memory allocation when using randn! (#1856)
- The memory copy speed seems to exceed the hardware limit (#1860)
- PCG produces different output on GPU (via Krylov.jl) (#1864)
Merged pull requests:
v4.1.3
v4.1.2
CUDA v4.1.2
Closed issues:
- Flux's gradient differentiating rfft leads to non-bit error (#1835)
Merged pull requests:
- switch to using defined globals (#1832) (@simonbyrne)
- Update manifest (#1837) (@github-actions[bot])
v4.1.1
v4.1.0
CUDA v4.1.0
Closed issues:
- ERROR: LoadError: bin\cublas64_11.dll when installing CUDA (#1750)
- System-wide CUDA in LD_LIBRARY_PATH breaks CUBLAS (#1755)
- CuDeviceTexture getindex breaks when executed on the CPU (#1757)
- cuDNN.version can cause Julia to crash, missing cudnn_ops_infer64_8.dll (#1777)
- cuDNN compile error "ERROR: LoadError: ArgumentError: invalid version string: local" (#1783)
- "Error: No CUDA Runtime library found" for ≥v4.0.0 (#1808)
- sqrt broken in kernels 'Format of __nvvm__reflect function not recognized' (#1817)
Merged pull requests:
- Add support for CUDA 12.0. (#1742) (@maleadt)
- Add more fixes and tests for CUDA toolkit 12.0 (#1756) (@amontoison)
- Update manifest (#1758) (@github-actions[bot])
- Fix test/cusparse/interfaces.jl (#1762) (@amontoison)
- Simplify the function sig. (#1763) (@N5N3)
- Update manifest (#1770) (@github-actions[bot])
- Make versioninfo() resilient against NVML EPERM. (#1771) (@maleadt)
- Move CUDAKernels to CUDA.jl (#1772) (@vchuravy)
- [CUSPARSE] Improve conversion and tests between sparse matrices (#1774) (@amontoison)
- Use geam for + and - operations with CuMatrix{<:CublasFloat} (#1775) (@amontoison)
- Update manifest (#1776) (@github-actions[bot])
- Update manifest (#1781) (@github-actions[bot])
- Update manifest (#1784) (@github-actions[bot])
- [CUSPARSE] Update preconditioners.jl (#1785) (@amontoison)
- [CUSOLVER] Avoid the conversion to CSR format for reordering routines (#1786) (@amontoison)
- Bump GPUCompiler. (#1787) (@maleadt)
- Remove unneeded variable. (#1788) (@maleadt)
- [CUSPARSE] Update conversions.jl (#1791) (@amontoison)
- Update to CUDNN 8.8.1 for CUDA 12 compatibility. (#1792) (@maleadt)
- Add support for CUDA 12.1 (#1793) (@maleadt)
- [CUSPARSE] Interface color reordering (#1794) (@amontoison)
- [CUSPARSE] Interface gtsv2 (#1795) (@amontoison)
- Update manifest (#1796) (@github-actions[bot])
- Adapt to GPUCompiler 0.18 (#1799) (@maleadt)
- Follow Array's behavior when initializing (#1800) (@lcw)
- [CUSOLVER] Support A \ b for rectangular matrices (#1802) (@amontoison)
- Use symbols instead of values when emitting code, when possible. (#1804) (@maleadt)
- Refactor CI pipeline a little. (#1805) (@maleadt)
- [CUSOLVER] Improve the dispatch for LAPACK routines (#1806) (@amontoison)
- Diagonal for lower triangular of LU decomposition set incorrectly (#1813) (@tgymnich)
- CompatHelper: add new compat entry for "KernelAbstractions" at version "0.9" (#1824) (@github-actions[bot])
- Rebuild CUPTI API with support for STRUCT_SIZE (#1827) (@vchuravy)
- Release CUDA 4.1 (#1828) (@vchuravy)