CUDA v5.1.0
CUDA.jl 5.1 greatly improves support for two important parts of the CUDA toolkit: unified memory, for accessing GPU memory on the CPU and vice versa, and cooperative groups, which offer a more modular approach to kernel programming. For more details, see the blog post.
Merged pull requests:
- [CUSOLVER] Add generic routines (#2074) (@amontoison)
- Rework and extend the cooperative groups API. (#2081) (@maleadt)
- [CUSOLVER] Add a method for geqrf! (#2085) (@amontoison)
- Fix some typos in performance tips (#2086) (@Zentrik)
- Improve PTX ISA selection (#2088) (@maleadt)
- Update manifest (#2090) (@github-actions[bot])
- support ChainRulesCore inplaceability (#2091) (@piever)
- Add a method inv(CuMatrix) (#2095) (@amontoison)
- Add mul!(A, B, C) where B or C is a diagonal matrix (#2096) (@amontoison)
- Add CUDA_Runtime_Discovery dependency to sublibraries. (#2097) (@maleadt)
- Handle and test zero-size inputs to RNGs. (#2098) (@maleadt)
- Add a with_workspaces function (#2099) (@amontoison)
- [CUSOLVER] Add a method for getrf! (#2100) (@amontoison)
- [CUSOLVER] Fix a typo with jobu / jobvt in gesvd (#2101) (@amontoison)
- Call exit when handling exceptions. (#2103) (@maleadt)
- Bump packages. (#2104) (@maleadt)
- Bump actions/checkout from 3 to 4 (#2106) (@dependabot[bot])
- Update manifest (#2107) (@github-actions[bot])
- Make Ref mutable on the GPU. (#2109) (@maleadt)
- CompatHelper: bump compat for CEnum to 0.5, (keep existing compat) (#2110) (@github-actions[bot])
- Small profiler improvements (#2113) (@maleadt)
- Update manifest (#2114) (@github-actions[bot])
- [CUSPARSE] Wrap new functions added with CUDA 12.2 (#2116) (@amontoison)
- [CUSOLVER] Add new methods for \ and inv (#2117) (@amontoison)
- Fix incorrect timing results for `CUDA.@elapsed` (#2118) (@thomasfaingnaert)
- [CUSOLVER] Interface sparse Cholesky and QR factorizations (#2121) (@amontoison)
- Update manifest (#2123) (@github-actions[bot])
- Profiler: Show used local memory. (#2124) (@maleadt)
- Support for CUDA 12.3 (#2125) (@maleadt)
- [CUSOLVER] Add Xsyevdx! and Xgesvdr! (#2127) (@amontoison)
- [CUSOLVER] Add Xgesvdp (#2128) (@amontoison)
- Profiler: don't crop when rendering to a file. (#2131) (@maleadt)
- Regenerate headers for CUDA 12.3. (#2132) (@maleadt)
- [CUSPARSE] Fix a bug with triangular solves (#2134) (@amontoison)
- CompatHelper: add new compat entry for Statistics at version 1, (keep existing compat) (#2135) (@github-actions[bot])
- CompatHelper: add new compat entry for LazyArtifacts at version 1, (keep existing compat) (#2136) (@github-actions[bot])
- Profiler: Parse and visualize NVTX marker data. (#2137) (@maleadt)
- Better support for unified and host memory (#2138) (@maleadt)
- Profiler: Improve compatibility with Pluto.jl and friends. (#2139) (@maleadt)
- Avoid allocations during derived array construction. (#2142) (@maleadt)
- More performance tweaks for memory copying (#2143) (@maleadt)
- Don't use libdevice's fmin/fmax. (#2144) (@maleadt)
- Update documentation (#2146) (@maleadt)
- Fixes for sm_61 (#2151) (@maleadt)
- Update sparse factorizations (#2152) (@amontoison)
- Don't call into LLVM's fmin/fmax on <sm_80. (#2154) (@maleadt)
- Only prefetch unified memory if concurrent access is possible. (#2155) (@maleadt)
- Support wrapping an Array with a CuArray without HMM. (#2156) (@maleadt)
Closed issues:
- Element-wise conversion to Duals (#127)
- IDEA: CuHostArray (#28)
- Make Ref pass by-reference (#267)
- view(data, idx) boundschecking is disproportionately expensive (#1678)
- [CUSOLVER] Add a with_workspaces function to allocate two buffers (Device / Host) (#1767)
- dlopen("libcudart") results in duplicate libraries (#1814)
- Support for JLD2 (#1833)
- Windows Defender mis-labels artifacts as threat (#1836)
- Support Cholesky factorization of CuSparseMatrixCSR (#1855)
- Runtime not re-selected after driver upgrade (#1877)
- Failure to initialize with CUDA_VISIBLE_DEVICES='' (#1945)
- Cannot precompile GPU code with PrecompileTools (#2006)
- CUDA_SDK_jll: cuda.h in different locations depending on the platform (#2066)
- PTX ISA 8.1 support (#2080)
- Segmentation fault when importing CUDA (#2083)
- "No system CUDA driver found" on NixOS (#2089)
- `CUDA.rand(Int64, m, n)` cannot be used when `m` or `n` is zero (#2093)
- Missing CUDA_Runtime_Discovery as a dependency in cuDNN (#2094)
- Binaries for Jetson (#2105)
- Minimum/maximum of array of NaNs is infinity (#2111)
- Performance regression for multiple `@sync copyto!` on CUDA v5 (#2112)
- [CUBLAS] Regenerate the wrappers with updated argument types (#2115)
- Unable to allocate unified memory buffers (#2120)
- CUDA 12.3 has been released (#2122)
- atomic min, max for Float32 and Float64 (#2129)
- Native profiler output is limited to around 100 columns when printing to a file (#2130)
- LLVM generates max.NaN which only works on sm_80 (#2148)
- Unified memory-related error on Tegra T194 (#2149)
- Errors on sm_61 (#2150)