v2.0.0
CUDA v2.0.0
Closed issues:
- Test failure during threading tests (#15)
- Bad allocations in memory pool after device_reset! (#16)
- CuArrays can lose Blas on reshaped views (#78)
- allowscalar performance (#87)
- Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
- 5-arg mul! for CUSPARSE (#98)
- copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
- Array primitives broken for CUSPARSE arrays (#113)
- SplittingPool: CPU allocations (#117)
- error while concatenating to an empty CuArray (#139)
- Showing sparse arrays goes wrong (#146)
- Improve test coverage (#147)
- CuArrays allocates a lot of memory on the default GPU (#153)
- [Feature Request] Indexing CuArray with CuArray (#155)
- Reshaping CuArray throws error during backpropagation (#162)
- Match syntax and APIs against Julia 1.0 standard libraries (#163)
- CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
- RFC: converts `SparseMatrixCSC` to `CuSparseMatrixCSR` via `cu` by default (#216)
- Add a CuSparseMatrixCOO type (#220)
- Test runner stumbles over path separators (#236)
- Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
- Atomic operations only work on global memory (#311)
- Performance: cudnn algorithm selection (#318)
- CUSPARSE is broken in CUDA.jl 1.2 (#322)
- Device-side broadcast regression on 1.5 (#350)
- API for fast math-like mode (#354)
- CUDA 11.0 Update 1: cublasSetWorkspace (#365)
- Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
- CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
- CUDA throws OOM error when initializing API on multiple devices (#398)
- Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
- Reverse performance regression (#410)
- Tag for LLVM 3? (#412)
- CUDA not working (#415)
- `StatsBase.transform` fails on `CuArray` (#426)
- Further unification of `CUBLAS.axpy!` and `LinearAlgebra.BLAS.axpy!` (#432)
- size(range), length(range) and range[end] fail inside CUDA kernels (#434)
- InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
- Missing dispatch for matrix multiplication with views? (#448)
- New version not available yet? (#452)
- using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
- Unable to upgrade to the latest version (#459)
Merged pull requests:
- Performance improvements by calling cuDNN API (#321) (@gartangh)
- Use ccall wrapper for correct pointer type conversions (#392) (@maleadt)
- Simplify Statistics.var and fix dims=tuple. (#393) (@maleadt)
- Adapt to GPUArrays test change. (#394) (@maleadt)
- Default to per-thread stream semantics (#395) (@maleadt)
- Add a missing context argument for stateless codegen. (#399) (@maleadt)
- Keep track of package latency timings. (#400) (@maleadt)
- Update manifest (#402) (@github-actions[bot])
- Latency improvements (#403) (@maleadt)
- Fix bounds checking with GPU views. (#404) (@maleadt)
- Force specialization for dynamic_cudacall to support more arguments. (#407) (@maleadt)
- Fix some wrong pointer types in the CUDNN headers. (#408) (@maleadt)
- Refactor CUSPARSE (#409) (@maleadt)
- Fix typo (#411) (@yixingfu)
- Update manifest (#413) (@github-actions[bot])
- Simplify library wrappers by introducing a CUDA Ref (#414) (@maleadt)
- Simplify and update wrappers (#416) (@maleadt)
- GEMM improvements (#417) (@maleadt)
- CompatHelper: add new compat entry for "BFloat16s" at version "0.1" (#418) (@github-actions[bot])
- add CuSparseMatrixCOO (#421) (@marius311)
- Update manifest (#423) (@github-actions[bot])
- Global math mode for easy use of lower-precision functionality (#424) (@maleadt)
- Improve init error message (#425) (@maleadt)
- CUBLAS: wrap rot! to implement rotate! and reflect! (#427) (@maleadt)
- CUFFT-related optimizations (#428) (@maleadt)
- Fix reverse/view regression (#429) (@maleadt)
- Update packages (#433) (@maleadt)
- Introduce StridedCuArray (#435) (@maleadt)
- Retry curandGenerateSeeds when OOM. (#436) (@maleadt)
- Introduce DenseCuArray union (#437) (@maleadt)
- Array simplifications (#438) (@maleadt)
- Fix and test reverse on wrapped array. (#439) (@maleadt)
- Fixes after recent array wrapper changes (#441) (@maleadt)
- Adapt to GPUArrays changes. (#442) (@maleadt)
- Provide CUBLAS with a pool-backed workspace. (#443) (@maleadt)
- Fix finalization of copied arrays. (#444) (@maleadt)
- Support for/Add CUDA 11.1 (#445) (@maleadt)
- Update manifest (#449) (@github-actions[bot])
- Allow use of strided vectors with mul! (gemv! and gemm!) (#450) (@maleadt)
- Have convert call CuSparseArray's constructors. (#451) (@maleadt)