Release v5.1.0 · JuliaGPU/CUDA.jl

CUDA v5.1.0

CUDA.jl 5.1 greatly improves support for two important parts of the CUDA toolkit: unified memory, for accessing GPU memory on the CPU and vice versa, and cooperative groups, which offer a more modular approach to kernel programming. For more details, see the blog post.

Diff since v5.0.0

Merged pull requests:

[CUSOLVER] Add generic routines (#2074) (@amontoison)
Rework and extend the cooperative groups API. (#2081) (@maleadt)
[CUSOLVER] Add a method for geqrf! (#2085) (@amontoison)
Fix some typos in performance tips (#2086) (@Zentrik)
Improve PTX ISA selection (#2088) (@maleadt)
Update manifest (#2090) (@github-actions[bot])
support ChainRulesCore inplaceability (#2091) (@piever)
Add a method inv(CuMatrix) (#2095) (@amontoison)
Add mul!(A, B, C) where B or C is a diagonal matrix (#2096) (@amontoison)
Add CUDA_Runtime_Discovery dependency to sublibraries. (#2097) (@maleadt)
Handle and test zero-size inputs to RNGs. (#2098) (@maleadt)
Add a with_workspaces function (#2099) (@amontoison)
[CUSOLVER] Add a method for getrf! (#2100) (@amontoison)
[CUSOLVER] Fix a typo with jobu / jobvt in gesvd (#2101) (@amontoison)
Call exit when handling exceptions. (#2103) (@maleadt)
Bump packages. (#2104) (@maleadt)
Bump actions/checkout from 3 to 4 (#2106) (@dependabot[bot])
Update manifest (#2107) (@github-actions[bot])
Make Ref mutable on the GPU. (#2109) (@maleadt)
CompatHelper: bump compat for CEnum to 0.5, (keep existing compat) (#2110) (@github-actions[bot])
Small profiler improvements (#2113) (@maleadt)
Update manifest (#2114) (@github-actions[bot])
[CUSPARSE] Wrap new functions added with CUDA 12.2 (#2116) (@amontoison)
[CUSOLVER] Add new methods for \ and inv (#2117) (@amontoison)
Fix incorrect timing results for CUDA.@elapsed (#2118) (@thomasfaingnaert)
[CUSOLVER] Interface sparse Cholesky and QR factorizations (#2121) (@amontoison)
Update manifest (#2123) (@github-actions[bot])
Profiler: Show used local memory. (#2124) (@maleadt)
Support for CUDA 12.3 (#2125) (@maleadt)
[CUSOLVER] Add Xsyevdx! and Xgesvdr! (#2127) (@amontoison)
[CUSOLVER] Add Xgesvdp (#2128) (@amontoison)
Profiler: don't crop when rendering to a file. (#2131) (@maleadt)
Regenerate headers for CUDA 12.3. (#2132) (@maleadt)
[CUSPARSE] Fix a bug with triangular solves (#2134) (@amontoison)
CompatHelper: add new compat entry for Statistics at version 1, (keep existing compat) (#2135) (@github-actions[bot])
CompatHelper: add new compat entry for LazyArtifacts at version 1, (keep existing compat) (#2136) (@github-actions[bot])
Profiler: Parse and visualize NVTX marker data. (#2137) (@maleadt)
Better support for unified and host memory (#2138) (@maleadt)
Profiler: Improve compatibility with Pluto.jl and friends. (#2139) (@maleadt)
Avoid allocations during derived array construction. (#2142) (@maleadt)
More performance tweaks for memory copying (#2143) (@maleadt)
Don't use libdevice's fmin/fmax. (#2144) (@maleadt)
Update documentation (#2146) (@maleadt)
Fixes for sm_61 (#2151) (@maleadt)
Update sparse factorizations (#2152) (@amontoison)
Don't call into LLVM's fmin/fmax on <sm_80. (#2154) (@maleadt)
Only prefetch unified memory if concurrent access is possible. (#2155) (@maleadt)
Support wrapping an Array with a CuArray without HMM. (#2156) (@maleadt)

Closed issues:

Element-wise conversion to Duals (#127)
IDEA: CuHostArray (#28)
Make Ref pass by-reference (#267)
view(data, idx) boundschecking is disproportionately expensive (#1678)
[CUSOLVER] Add a with_workspaces function to allocate two buffers (Device / Host) (#1767)
dlopen("libcudart") results in duplicate libraries (#1814)
Support for JLD2 (#1833)
Windows Defender mis-labels artifacts as threat (#1836)
Support Cholesky factorization of CuSparseMatrixCSR (#1855)
Runtime not re-selected after driver upgrade (#1877)
Failure to initialize with CUDA_VISIBLE_DEVICES='' (#1945)
Cannot precompile GPU code with PrecompileTools (#2006)
CUDA_SDK_jll: cuda.h in different locations depending on the platform (#2066)
PTX ISA 8.1 support (#2080)
Segmentation fault when importing CUDA (#2083)
"No system CUDA driver found" on NixOS (#2089)
CUDA.rand(Int64, m, n) can not be used when m or n is zero (#2093)
Missing CUDA_Runtime_Discovery as a dependency in cuDNN (#2094)
Binaries for Jetson (#2105)
Minimum/maximum of array of NaNs is infinity (#2111)
Performance regression for multiple @sync copyto! on CUDA v5 (#2112)
[CUBLAS] Regenerate the wrappers with updated argument types (#2115)
Unable to allocate unified memory buffers (#2120)
CUDA 12.3 has been released (#2122)
atomic min, max for Float32 and Float64 (#2129)
Native profiler output is limited to around 100 columns when printing to a file (#2130)
LLVM generates max.NaN which only works on sm_80 (#2148)
Unified memory-related error on Tegra T194 (#2149)
Errors on sm_61 (#2150)
