v2.4.0
CUDA v2.4.0
Closed issues:
- cublasXtStrmm test failures on Windows 10 Julia 1.1 (#124)
- CUSPARSE tests broken (#259)
- Make @cuda return a kernel object (#341)
- Depend on CompilerSupportLibraries (#359)
- CUBLAS and exceptions test failures on Windows (#536)
- argmax(::CuArray) returns nothing with NaN-values (#553)
- Multiple @cuDynamicSharedMem in kernel causes unexpected behavior (#555)
- Illegal memory access with atomic shared memory (#558)
- CUDA.sqrt cannot find symbol "__nv_sqrt" (#559)
- Exception with CUDA.exp (#561)
- Use LazyArtifacts instead of Pkg (#570)
- Test runner: early bail out (#578)
- memory reporting issue (#579)
- c[3:4]=0 leads to exception (#580)
- Add math ops (including broadcast) for half types (#581)
- Dot product of Array and CuArray fails with CPU address error. (#586)
- Support for CUDA-capable GPU with compute capability 4.0 like GTX 1080 (#587)
- mapreducedim! not threadsafe (#588)
- Allow separate directories for cuda and cudnn (#590)
- Difficulties installing CUDA on Julia 1.6.0. (#591)
- Bug in Initialisation Error (#603)
- CUDA.jl initialisation fails after suspending Ubuntu 20.04 with CUDA 11.2 (#605)
- CUDA 11.2 CUBLASError and "CUDA.jl does not yet support CUDA with nvdisasm 11.2.67" (#607)
- This intrinsic must be compiled to be called (#611)
- OpenGL interop (#612)
- Add support for CuFFT callback functions (#614)
- I can’t multiply a CSR sparse matrix anymore (#615)
- Julia version requirement (#619)
Merged pull requests:
- Support all combinations of datatypes and transposes/adjoints in LinearAlgebra (#535) (@cqql)
- Use structs for texture intrinsic return types. (#554) (@maleadt)
- Backport some 1.6 fixes (#557) (@maleadt)
- Update manifest (#560) (@github-actions[bot])
- Correct dims error (#562) (@DhairyaLGandhi)
- Lock _shmem_cb (#564) (@vchuravy)
- Move to Julia 1.6 (#566) (@maleadt)
- Adapt to JuliaLang/julia#38487. (#568) (@maleadt)
- Support for 'delayed kernels' (#569) (@maleadt)
- Run cuda-memcheck as part of CI (#571) (@maleadt)
- Use at-sync instead of calls to synchronize in tests. (#572) (@maleadt)
- Update artifacts to include cuda-memcheck (#573) (@maleadt)
- Use LazyArtifacts instead of Pkg. (#574) (@maleadt)
- Improve LinearAlgebra impl methods for triangular types (#575) (@maleadt)
- New findmin/max implementation using single-pass reduction (#576) (@maleadt)
- Fix synchronization before testing cublasXt calls. (#577) (@maleadt)
- Fix used memory reporting. (#582) (@maleadt)
- Implement Statistics.varm/stdm instead of Statistics._var (#583) (@sdewaele)
- Test for #558. (#584) (@maleadt)
- Add a quick failure option to the test runner. (#585) (@maleadt)
- Add lock around cfunction lookup (#589) (@vchuravy)
- Catch all initialization errors. (#593) (@maleadt)
- Update dependencies. (#596) (@maleadt)
- Fix wrong initialisation error message (#604) (@qin-yu)
- Fixes wrong spacing in docstring admonition (#608) (@navidcy)
- Fix broadcasting with Base.angle (#618) (@marius311)
- Test with the 1.6 nightly, not 1.7. (#620) (@maleadt)
- Wrap cudaGL.h (#621) (@maleadt)
- Initial compatibility with CUDA 11.2. (#622) (@maleadt)
- 1.5 compatibility release (#623) (@maleadt)
- Add CUDA 11.2 artifacts. (#624) (@maleadt)