Releases: CNugteren/CLBlast
Preview version 0.7.0
Version 0.7.0
- Added exports to be able to create a DLL on Windows (thanks to Marco Hutter)
- Made the library thread-safe
- Performance and correctness tests can now be run against CPU BLAS libraries, in addition to clBLAS
- Fixed the use of events within the library
- Changed the enum parameters to match the raw values of the CBLAS standard
- Fixed the cache of previously compiled binaries and added a function to fill or clear it
- Various minor fixes and enhancements
- Added a preliminary version of the API documentation
- Added more sample programs
- Added tuned parameters for various devices (see README)
- Added level-1 routines:
  - SNRM2/DNRM2/ScNRM2/DzNRM2
  - SASUM/DASUM/ScASUM/DzASUM
  - SSUM/DSUM/ScSUM/DzSUM (non-absolute version of the above xASUM BLAS routines)
  - iSAMAX/iDAMAX/iCAMAX/iZAMAX
  - iSMAX/iDMAX/iCMAX/iZMAX (non-absolute version of the above ixAMAX BLAS routines)
  - iSMIN/iDMIN/iCMIN/iZMIN (non-absolute minimum version of the above ixAMAX BLAS routines)
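The xSUM, ixMAX, and ixMIN routines are described as non-absolute counterparts of the standard xASUM and ixAMAX routines. The plain C sketch below illustrates that distinction for the real single-precision case only. It is a hypothetical reference model, not CLBlast's API or its OpenCL kernels: the `ref_*` function names are invented for illustration, and the 0-based indexing is an assumption (reference Fortran BLAS ISAMAX returns a 1-based index).

```c
#include <stddef.h>

/* Hypothetical reference model of the reductions listed above,
   real single precision only. Not CLBlast's API; indices are
   0-based (an assumption for this sketch). */

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* xASUM: sum of absolute values. */
float ref_sasum(size_t n, const float *x) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += abs_f(x[i]);
    return s;
}

/* xSUM: the non-absolute variant, a plain signed sum. */
float ref_ssum(size_t n, const float *x) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

/* ixAMAX: index of the first element with the largest absolute value.
   Returns 0 when n == 0 (assumption for this sketch). */
size_t ref_isamax(size_t n, const float *x) {
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (abs_f(x[i]) > abs_f(x[best])) best = i;
    return best;
}

/* ixMAX: index of the first element with the largest signed value. */
size_t ref_ismax(size_t n, const float *x) {
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (x[i] > x[best]) best = i;
    return best;
}

/* ixMIN: index of the first element with the smallest signed value. */
size_t ref_ismin(size_t n, const float *x) {
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (x[i] < x[best]) best = i;
    return best;
}
```

For example, with `x = {3, -7, 2, -1}`, `ref_sasum` gives 13 and `ref_ssum` gives -3, while `ref_isamax` returns 1 (the -7), `ref_ismax` returns 0 (the 3), and `ref_ismin` returns 1 (the -7).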
Note:
Binary releases are experimental; build from source code if possible.
Preview version 0.6.0
Version 0.6.0
- Added support for MSVC (Visual Studio) 2015
- Added tuned parameters for various devices (see README)
- Now automatically generates C++ code from JSON tuning results
- Added level-2 routines:
  - SGER/DGER
  - CGERU/ZGERU
  - CGERC/ZGERC
  - CHER/ZHER
  - CHPR/ZHPR
  - CHER2/ZHER2
  - CHPR2/ZHPR2
  - CSYR/ZSYR
  - CSPR/ZSPR
  - CSYR2/ZSYR2
  - CSPR2/ZSPR2
Preview version 0.5.0
Version 0.5.0
- Improved structure and performance of level-2 routines (xSYMV/xHEMV)
- Reduced compilation time of level-3 OpenCL kernels
- Added level-1 routines:
  - SSWAP/DSWAP/CSWAP/ZSWAP
  - SSCAL/DSCAL/CSCAL/ZSCAL
  - SCOPY/DCOPY/CCOPY/ZCOPY
  - SDOT/DDOT
  - CDOTU/ZDOTU
  - CDOTC/ZDOTC
- Added level-2 routines:
  - SGBMV/DGBMV/CGBMV/ZGBMV
  - CHBMV/ZHBMV
  - CHPMV/ZHPMV
  - SSBMV/DSBMV
  - SSPMV/DSPMV
  - STRMV/DTRMV/CTRMV/ZTRMV
  - STBMV/DTBMV/CTBMV/ZTBMV
  - STPMV/DTPMV/CTPMV/ZTPMV
Preview version 0.4.0
Version 0.4.0
- Now using the Claduc C++11 interface to OpenCL
- Added plain C API for increased compatibility (clblast_c.h)
- Re-organized tuner infrastructure and added JSON output
- Removed the clBLAS sources; clBLAS must now be installed separately for testing
- Added Travis continuous integration
- Added level-2 routines:
  - CHEMV/ZHEMV
  - SSYMV/DSYMV
Preview version 0.3.0
Version 0.3.0
- Re-organized test/client infrastructure to avoid code duplication
- Added an optional bypass for pre/post-processing kernels in level-3 routines
- Significantly improved performance of level-3 routines on AMD GPUs
- Added level-3 routines:
  - CHEMM/ZHEMM
  - SSYRK/DSYRK/CSYRK/ZSYRK
  - CHERK/ZHERK
  - SSYR2K/DSYR2K/CSYR2K/ZSYR2K
  - CHER2K/ZHER2K
  - STRMM/DTRMM/CTRMM/ZTRMM
Preview version 0.2.0
Version 0.2.0
- Added support for complex conjugate transpose
- Several host-code performance improvements
- Improved testing infrastructure and coverage
- Added level-2 routines:
  - SGEMV/DGEMV/CGEMV/ZGEMV
- Added level-3 routines:
  - CGEMM/ZGEMM
  - CSYMM/ZSYMM
Preview version 0.1.0
Version 0.1.0
- Initial preview release on GitHub
- Supported level-1 routines:
  - SAXPY/DAXPY/CAXPY/ZAXPY
- Supported level-3 routines:
  - SGEMM/DGEMM
  - SSYMM/DSYMM