Client
The clFFT client program ships with the clFFT library package. It is more than a sample application demonstrating the use of the FFT library; for simple example code, visit the clFFT home page. The client supports a range of capabilities, including performance measurement. In general, it invokes a user-specified type of FFT and performs an FFT impulse test, so it serves as a simple verification of that particular kind of transform. The client program supports the following features:
- Ability to specify precision of transform
- Ability to specify lengths and dimensions
- Ability to select forward or backward transform
- Ability to choose buffer layouts
- Ability to input strides and distances
- Ability to specify number of transforms
- Ability to dump underlying OpenCL kernels
- Ability to measure performance for a specified transform
The block below shows the help message printed by the client program, which lists all of the command-line options. Use these options to set the transform parameters and control the type of FFT.
C:\clFFT\bin\staging\Debug>Client.exe -h
clFFT client command line options:
-h [ --help ] produces this help message
-v [ --version ] Print queryable version information from the clFFT library
-i [ --clInfo ] Print queryable information of the OpenCL runtime
-g [ --gpu ] Force instantiation of an OpenCL GPU device
-c [ --cpu ] Force instantiation of an OpenCL CPU device
-a [ --all ] Force instantiation of all OpenCL devices
-o [ --outPlace ] Out of place FFT transform (default: in place)
--double Double precision transform (default: single)
--inv Backward transform (default: forward)
-d [ --dumpKernels ] FFT engine will dump generated OpenCL FFT kernels to disk (default: dump off)
-x [ --lenX ] arg (=1024) Specify the length of the 1st dimension of a test array
-y [ --lenY ] arg (=1) Specify the length of the 2nd dimension of a test array
-z [ --lenZ ] arg (=1) Specify the length of the 3rd dimension of a test array
--isX arg (=1) Specify the input stride of the 1st dimension of a test array
--isY arg (=0) Specify the input stride of the 2nd dimension of a test array
--isZ arg (=0) Specify the input stride of the 3rd dimension of a test array
--iD arg (=0) input distance between subsequent sets of data when batch size > 1
--osX arg (=1) Specify the output stride of the 1st dimension of a test array
--osY arg (=0) Specify the output stride of the 2nd dimension of a test array
--osZ arg (=0) Specify the output stride of the 3rd dimension of a test array
--oD arg (=0) output distance between subsequent sets of data when batch size > 1
-b [ --batchSize ] arg (=1) If this value is greater than one, arrays will be used
-p [ --profile ] arg (=1) Time and report the kernel speed of the FFT (default: profiling off)
--inLayout arg (=1) Layout of input data:
1) interleaved
2) planar
3) hermitian interleaved
4) hermitian planar
5) real
--outLayout arg (=1) Layout of input data:
1) interleaved
2) planar
3) hermitian interleaved
4) hermitian planar
5) real
--xFactor arg (=0) set the size of X dimension if a large 1D dataset needs to be broken down (default: library automatically chooses factorization)
--ldsComplex LDS is complex (default: false)
--ldsPadding Data is padding in LDS (default: false)
--ldsFraction arg (=0) specify the LDS fraction (default: library automatically chooses LDS fraction)
--cacheSize arg (=0) specify the cahce size (default: library automatically chooses cache size)
Some examples are shown below. The first example invokes a transform of length 16. All other values are left at their defaults.
C:\clFFT\bin\staging\Debug>Client.exe -x 16
Client Test *****PASS*****
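The PASS verdict presumably comes from the impulse test mentioned at the top of this page. As a rough illustration of what such a check verifies, here is a minimal Python sketch using a naive DFT. It is not the client's code: the function names are hypothetical, and whether the client feeds in an impulse or a constant signal is not stated here, so both directions of the identity are shown.

```python
import cmath

def dft(x):
    """Naive O(n^2) forward DFT, for illustration only."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def impulse_test(n, tol=1e-9):
    """True if the DFT of a unit impulse is 1 in every output bin."""
    impulse = [1.0 + 0j] + [0j] * (n - 1)
    return all(abs(v - 1.0) <= tol for v in dft(impulse))

def constant_test(n, tol=1e-9):
    """True if the DFT of an all-ones signal is a spike of height n at bin 0."""
    spectrum = dft([1.0 + 0j] * n)
    return abs(spectrum[0] - n) <= tol and all(abs(v) <= tol for v in spectrum[1:])

# Same length as the client example above (-x 16).
print("PASS" if impulse_test(16) and constant_test(16) else "FAIL")
```

Because the expected output is known in closed form, a check like this needs no reference FFT implementation, which is what makes it suitable as a quick sanity test.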
The next example shows a 2D double-precision transform of size 50x100.
C:\clFFT\bin\staging\Debug>Client.exe -x 50 -y 100 --double
Client Test *****PASS*****
The next example shows a 1D transform of length 1024 whose input and output both use the planar buffer layout (layout 2). The stride is 2 for the input and 3 for the output.
C:\clFFT\bin\staging\Debug>Client.exe -x 1024 --inLayout 2 --outLayout 2 --isX 2 --osX 3
Client Test *****PASS*****
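To make the stride and distance options more concrete, the following Python sketch shows how a logical index maps to a buffer offset in a typical strided layout. It assumes strides and distances are counted in elements and that, for planar data, the real and imaginary buffers each follow the same mapping; both points are assumptions to check against the clFFT library documentation. The helper names are hypothetical.

```python
def element_offset(index, strides, batch=0, distance=0):
    """Offset, in elements, of a multi-dimensional index within a strided buffer.

    index    -- tuple such as (x,) or (x, y, z)
    strides  -- one stride per dimension, in elements (assumed unit)
    batch    -- which transform in the batch
    distance -- elements between the starts of consecutive transforms
    """
    return batch * distance + sum(i * s for i, s in zip(index, strides))

def scatter_1d(values, stride):
    """Place values into a buffer with the given stride; gaps stay None."""
    buf = [None] * ((len(values) - 1) * stride + 1)
    for i, v in enumerate(values):
        buf[element_offset((i,), (stride,))] = v
    return buf

print(scatter_1d(["a", "b", "c"], 2))  # ['a', None, 'b', None, 'c']
print(scatter_1d(["a", "b", "c"], 3))  # ['a', None, None, 'b', None, None, 'c']
```

With an input stride of 2 and an output stride of 3, as in the command above, the transform would read every second element and write every third one under this interpretation.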
The next example shows a 2D real transform with Hermitian interleaved output. The size is 192x108.
C:\clFFT\bin\staging\Debug>Client.exe -x 192 -y 108 --inLayout 5 --outLayout 3
Client Test *****PASS*****
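The Hermitian layouts exist because the spectrum of real input is conjugate-symmetric, X[k] = conj(X[n-k]), so only n/2 + 1 complex values per row carry independent information. The Python sketch below checks that property with a naive DFT on a small, arbitrary signal. The helper names are hypothetical, and the assumption that the halved dimension is X (giving 97x108 complex values for the 192x108 case) should be confirmed against the clFFT library documentation.

```python
import cmath

def dft(x):
    """Naive O(n^2) forward DFT, for illustration only."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def hermitian_length(n):
    """Number of independent complex outputs for a real input of length n."""
    return n // 2 + 1

def is_conjugate_symmetric(spectrum, tol=1e-9):
    """True if spectrum[k] equals conj(spectrum[n-k]) for every k."""
    n = len(spectrum)
    return all(abs(spectrum[k] - spectrum[(n - k) % n].conjugate()) <= tol
               for k in range(n))

signal = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -2.0, 4.0]  # arbitrary real data
print(is_conjugate_symmetric(dft(signal)))  # True
print(hermitian_length(192))                # 97
```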
The next example shows how to measure performance for a 1D transform of length 512 with the batch size set to 100. The profile parameter specifies the number of iterations to run; outlying timing samples are pruned before the result is reported. Because a GPU device becomes more efficient as the data size grows, set the batch and transform sizes as high as the device memory limits allow to see the maximum attainable performance.
In this example, the reported performance is about 88 Gflops. It is calculated from the elapsed time and the standard FFT performance formula, 5 n log2(n) / t, where n is the transform length and t is the elapsed time. For this batched run the operation count is also multiplied by the batch size: 5 x 512 x 9 x 100 = 2,304,000 operations in 26,036 ns gives about 88.49 Gflops. The time in nanoseconds is also reported.
C:\clFFT\bin\staging\Debug>Client.exe -x 512 -b 100 -p 50
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 50
===========================clFFT============================
Handle: 1
Kernel: 0000000003CA0710
OutEvents: 0000000003C86E70
Length: (512)
Batch: 100
Input Stride: (1)
Output Stride: (1)
Global Work: (6400)
Gflops: 88.492
Time (ns): 26,036
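The reported figure can be reproduced from the formula with a few lines of Python. This is a sketch based only on the numbers shown above, not the client's actual code; `fft_gflops` is a hypothetical helper, and multiplying by the batch size is inferred from the fact that it matches the reported value.

```python
import math

def fft_gflops(lengths, batch, time_ns):
    """Estimate Gflops with the conventional 5 * n * log2(n) operation count.

    lengths -- tuple of transform lengths, e.g. (512,) or (512, 512)
    batch   -- number of transforms timed together
    time_ns -- elapsed time in nanoseconds
    """
    n = math.prod(lengths)
    flops = 5.0 * n * math.log2(n) * batch
    return flops / time_ns  # operations per nanosecond equals Gflops

# Values from the run above: -x 512 -b 100, Time (ns): 26,036
print(round(fft_gflops((512,), 100, 26036), 2))  # 88.49
```

Note that 5 n log2(n) is a conventional operation count used to compare FFT implementations, not a count of the instructions the kernels actually execute.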
The next example shows how to measure performance for a double-precision 2D 512x512 transform. Here, performance is reported for 5 plan handles; the last one listed (handle 1) gives the overall performance of the transform. Because the 2D transform involves 4 operations (a row transform followed by a transpose, then a column transform followed by a transpose), each individual operation is timed and reported.
C:\clFFT\bin\staging\Debug>clAmdFft.Client.exe -x 512 -y 512 --double -p 50
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 0 samples out of 50
clFFT[ 1 ]: Pruning 0 samples out of 50
clFFT[ 2 ]: Pruning 1 samples out of 50
clFFT[ 3 ]: Pruning 0 samples out of 50
clFFT[ 4 ]: Pruning 0 samples out of 50
===========================clFFT============================
Handle: 2
Kernel: 0000000003BBC730
OutEvents: 0000000004A37640
Length: (512,512)
Input Stride: (1,512)
Output Stride: (1,512)
Global Work: (32768)
Gflops: 125.589
Time (ns): 93,928
Handle: 3
Kernel: 0000000003BBC810
OutEvents: 000000000640E410
Length: (512,512)
Input Stride: (1,512)
Output Stride: (1,512)
Global Work: (8704)
Gflops: 132.17
Time (ns): 178,504
Handle: 4
Kernel: 0000000003BBCA40
OutEvents: 0000000003C7DE90
Length: (512,512)
Input Stride: (1,512)
Output Stride: (1,512)
Global Work: (32768)
Gflops: 127.419
Time (ns): 92,580
Handle: 5
Kernel: 0000000003BBC810
OutEvents: 00000000049FFCB0
Length: (512,512)
Input Stride: (1,512)
Output Stride: (1,512)
Global Work: (8704)
Gflops: 132.637
Time (ns): 177,875
Handle: 1
Child Handles: (2,3,4,5)
Length: (512,512)
Input Stride: (1,512)
Output Stride: (1,512)
Gflops: 43.4581
Time (ns): 542,889
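As a cross-check on the listing above, the short Python sketch below adds up the four child-handle times and recomputes the overall figure from 5 n log2(n) / t with n = 512 x 512. It uses only the numbers printed above; the function names are hypothetical, and the reading that the parent time is essentially the sum of its children is an observation from this one run, not a documented guarantee.

```python
import math

# Time (ns) per child handle, copied from the output above.
CHILD_TIMES_NS = {2: 93928, 3: 178504, 4: 92580, 5: 177875}
OVERALL_TIME_NS = 542889  # handle 1

def total_child_time(times):
    """Sum of the individual operation times in nanoseconds."""
    return sum(times.values())

def overall_gflops(len_x, len_y, time_ns):
    """5 * n * log2(n) / t for a 2D transform of len_x * len_y points."""
    n = len_x * len_y
    return 5.0 * n * math.log2(n) / time_ns

def time_shares(times):
    """Fraction of the summed time spent in each child handle."""
    total = total_child_time(times)
    return {handle: t / total for handle, t in times.items()}

print(total_child_time(CHILD_TIMES_NS))                    # 542887
print(round(overall_gflops(512, 512, OVERALL_TIME_NS), 2)) # 43.46
for handle, share in time_shares(CHILD_TIMES_NS).items():
    print(f"handle {handle}: {share:.1%}")
```

In this run the child times add up to within 2 ns of the overall time, and handles 3 and 5 together account for roughly two thirds of it.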