Merge pull request #29 from CNugteren/development
Update to version 0.5.0
CNugteren committed Oct 17, 2015
2 parents a41744d + 9240403 commit 4678fd3
Showing 194 changed files with 12,709 additions and 1,513 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
 build
 stash
-.*
+.*
+*.pyc
4 changes: 4 additions & 0 deletions .travis.yml
@@ -21,5 +21,9 @@ before_script:
 script:
   - make
   - make install
+branches:
+  only:
+    - master
+    - development
 notifications:
   email: false
20 changes: 20 additions & 0 deletions CHANGELOG
@@ -1,4 +1,24 @@

+Version 0.5.0
+- Improved structure and performance of level-2 routines (xSYMV/xHEMV)
+- Reduced compilation time of level-3 OpenCL kernels
+- Added level-1 routines:
+  * SSWAP/DSWAP/CSWAP/ZSWAP
+  * SSCAL/DSCAL/CSCAL/ZSCAL
+  * SCOPY/DCOPY/CCOPY/ZCOPY
+  * SDOT/DDOT
+  * CDOTU/ZDOTU
+  * CDOTC/ZDOTC
+- Added level-2 routines:
+  * SGBMV/DGBMV/CGBMV/ZGBMV
+  * CHBMV/ZHBMV
+  * CHPMV/ZHPMV
+  * SSBMV/DSBMV
+  * SSPMV/DSPMV
+  * STRMV/DTRMV/CTRMV/ZTRMV
+  * STBMV/DTBMV/CTBMV/ZTBMV
+  * STPMV/DTPMV/CTPMV/ZTPMV
+
 Version 0.4.0
 - Now using the Claduc C++11 interface to OpenCL
 - Added plain C API for increased compatibility (clblast_c.h)
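
The 0.5.0 entry lists three dot-product families (xDOT, xDOTU, xDOTC). As a reminder of how they differ in standard BLAS, here is a minimal CPU reference sketch. It is not CLBlast's API or its OpenCL kernels; the function names `ref_dotu` and `ref_dotc` are hypothetical and exist only for this illustration. xDOT is the real-valued case, xDOTU multiplies complex elements as they are, and xDOTC conjugates the first vector.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Reference semantics only (hypothetical helpers, not part of CLBlast).

// xDOTU: sum over i of x[i] * y[i], without conjugation.
template <typename T>
std::complex<T> ref_dotu(const std::vector<std::complex<T>>& x,
                         const std::vector<std::complex<T>>& y) {
  std::complex<T> acc(0, 0);
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    acc += x[i] * y[i];
  }
  return acc;
}

// xDOTC: sum over i of conj(x[i]) * y[i], conjugating the first vector.
template <typename T>
std::complex<T> ref_dotc(const std::vector<std::complex<T>>& x,
                         const std::vector<std::complex<T>>& y) {
  std::complex<T> acc(0, 0);
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    acc += std::conj(x[i]) * y[i];
  }
  return acc;
}
```

For real inputs both variants reduce to the same sum, which is why the S and D precisions only need xDOT.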
8 changes: 4 additions & 4 deletions CMakeLists.txt
@@ -13,7 +13,7 @@
 cmake_minimum_required(VERSION 2.8.10)
 project("clblast" C CXX)
 set(clblast_VERSION_MAJOR 0)
-set(clblast_VERSION_MINOR 4)
+set(clblast_VERSION_MINOR 5)
 set(clblast_VERSION_PATCH 0)

 # Options and their default values
@@ -102,11 +102,11 @@ include_directories(${clblast_SOURCE_DIR}/include ${OPENCL_INCLUDE_DIRS})
 # ==================================================================================================

 # Sets the supported routines and the used kernels. New routines and kernels should be added here.
-set(KERNELS copy pad transpose padtranspose xaxpy xgemv xgemm)
+set(KERNELS copy pad transpose padtranspose xaxpy xdot xgemv xgemm)
 set(SAMPLE_PROGRAMS_CPP sgemm)
 set(SAMPLE_PROGRAMS_C sgemm)
-set(LEVEL1_ROUTINES xaxpy)
-set(LEVEL2_ROUTINES xgemv xhemv xsymv)
+set(LEVEL1_ROUTINES xswap xscal xcopy xaxpy xdot xdotu xdotc)
+set(LEVEL2_ROUTINES xgemv xgbmv xhemv xhbmv xhpmv xsymv xsbmv xspmv xtrmv xtbmv xtpmv)
 set(LEVEL3_ROUTINES xgemm xsymm xhemm xsyrk xherk xsyr2k xher2k xtrmm)
 set(ROUTINES ${LEVEL1_ROUTINES} ${LEVEL2_ROUTINES} ${LEVEL3_ROUTINES})
 set(PRECISIONS 32 3232 64 6464)
33 changes: 16 additions & 17 deletions README.md
@@ -6,7 +6,7 @@ CLBlast: The tuned OpenCL BLAS library

 CLBlast is a modern, lightweight, performant and tunable OpenCL BLAS library written in C++11. It is designed to leverage the full performance potential of a wide variety of OpenCL devices from different vendors, including desktop and laptop GPUs, embedded GPUs, and other accelerators. CLBlast implements BLAS routines: basic linear algebra subprograms operating on vectors and matrices.

-__Note that the CLBlast library is actively being developed, and is not mature enough for production environments__. This preview-version doesn't support all routines yet: others will be added in due time. It also lacks extensive tuning on some common OpenCL platforms: __out-of-the-box performance on some devices might be poor__. See below for more details.
+__Note that the CLBlast library is actively being developed, and is not mature enough for production environments__. This preview-version doesn't support the less commonly used routines yet: they will be added in due time. It also lacks extensive tuning on some common OpenCL platforms: __out-of-the-box performance on some devices might be poor__. See below for more details.


 Why CLBlast and not clBLAS or cuBLAS?
@@ -130,22 +130,21 @@ These graphs can be generated automatically on your own device. First, compile C
 Supported routines
 -------------

-CLBlast is in active development and currently does not support the full set of BLAS routines. The currently supported routines are marked with '✔' in the following tables:
+CLBlast is in active development but already supports the majority of BLAS routines. The currently supported routines are marked with '✔' in the following tables:

 | Level-1  | S | D | C | Z | Notes   |
 | ---------|---|---|---|---|---------|
 | xROTG    |   |   | - | - |         |
 | xROTMG   |   |   | - | - |         |
 | xROT     |   |   | - | - |         |
 | xROTM    |   |   | - | - |         |
-| xSWAP    |   |   |   |   |         |
-| xSCAL    |   |   |   |   | +CS +ZD |
-| xCOPY    |   |   |   |   |         |
+| xSWAP    | ✔ | ✔ | ✔ | ✔ |         |
+| xSCAL    | ✔ | ✔ | ✔ | ✔ | +CS +ZD |
+| xCOPY    | ✔ | ✔ | ✔ | ✔ |         |
 | xAXPY    | ✔ | ✔ | ✔ | ✔ |         |
-| xDOT     |   |   | - | - | +DS     |
-| xDOTU    | - | - |   |   |         |
-| xDOTC    | - | - |   |   |         |
-| xxxDOT   | - | - | - | - | +SDS    |
+| xDOT     | ✔ | ✔ | - | - |         |
+| xDOTU    | - | - | ✔ | ✔ |         |
+| xDOTC    | - | - | ✔ | ✔ |         |
 | xNRM2    |   |   | - | - | +SC +DZ |
 | xASUM    |   |   | - | - | +SC +DZ |
 | IxAMAX   |   |   |   |   |         |
@@ -154,16 +153,16 @@ CLBlast is in active development and currently does not support the full set of
 | Level-2  | S | D | C | Z | Notes   |
 | ---------|---|---|---|---|---------|
 | xGEMV    | ✔ | ✔ | ✔ | ✔ |         |
-| xGBMV    |   |   |   |   |         |
+| xGBMV    | ✔ | ✔ | ✔ | ✔ |         |
 | xHEMV    | - | - | ✔ | ✔ |         |
-| xHBMV    | - | - |   |   |         |
-| xHPMV    | - | - |   |   |         |
+| xHBMV    | - | - | ✔ | ✔ |         |
+| xHPMV    | - | - | ✔ | ✔ |         |
 | xSYMV    | ✔ | ✔ | - | - |         |
-| xSBMV    |   |   | - | - |         |
-| xSPMV    |   |   | - | - |         |
-| xTRMV    |   |   |   |   |         |
-| xTBMV    |   |   |   |   |         |
-| xTPMV    |   |   |   |   |         |
+| xSBMV    | ✔ | ✔ | - | - |         |
+| xSPMV    | ✔ | ✔ | - | - |         |
+| xTRMV    | ✔ | ✔ | ✔ | ✔ |         |
+| xTBMV    | ✔ | ✔ | ✔ | ✔ |         |
+| xTPMV    | ✔ | ✔ | ✔ | ✔ |         |
 | xTRSV    |   |   |   |   |         |
 | xTBSV    |   |   |   |   |         |
 | xTPSV    |   |   |   |   |         |
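
Several of the level-2 routines in this table (xGBMV, xHBMV, xSBMV, xTBMV) operate on banded matrices. The sketch below shows, under stated assumptions, what a banded matrix-vector product computes using the standard BLAS column-major band layout, where element A(i, j) is stored at `ab[(ku + i - j) + j * lda]`. It is a CPU reference for the non-transposed case only, not CLBlast's interface or kernel code; `ref_gbmv` and its parameter list are hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reference y = alpha * A * x + beta * y for an m-by-n band matrix A with
// kl sub-diagonals and ku super-diagonals (hypothetical helper, not CLBlast).
// Assumes column-major band storage with leading dimension lda >= kl + ku + 1.
std::vector<double> ref_gbmv(std::size_t m, std::size_t n,
                             std::size_t kl, std::size_t ku,
                             double alpha, const std::vector<double>& ab,
                             std::size_t lda, const std::vector<double>& x,
                             double beta, std::vector<double> y) {
  for (double& value : y) value *= beta;
  for (std::size_t j = 0; j < n; ++j) {
    // Only rows inside the band of column j hold stored elements.
    const std::size_t i_begin = (j > ku) ? j - ku : 0;
    const std::size_t i_end = std::min(m, j + kl + 1);
    for (std::size_t i = i_begin; i < i_end; ++i) {
      y[i] += alpha * ab[(ku + i - j) + j * lda] * x[j];
    }
  }
  return y;
}
```

For a 3-by-3 tridiagonal matrix (kl = ku = 1, lda = 3), each column of `ab` holds the super-diagonal, diagonal, and sub-diagonal entry of that matrix column, with unused corner slots left as padding.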