Perf

clFFT Python performance scripts

Timing a particular function with a given set of parameters is convenient, but a plot of performance over a range of parameters is more informative. clFFT can generate such performance plots with the help of Python scripts. The scripts are located in ./src/scripts/perf; when the INSTALL target is built from the build environment, they are also copied into the ./bin/clFFT/develop/vs10x64/package directory along with the rest of the built binaries.

There are two primary Python scripts that users interact with directly.

measurePerformance.py

This script measures and gathers performance data and records it in a log file. It calls the clFFT client program in a loop, varying the program parameters in an organized fashion, and scrapes stdout for performance information. Its command-line interface simplifies specifying test ranges and strides, and the --help (or -h) parameter prints extensive help information:

C:\clFFT\src\scripts\perf>measurePerformance.py -h
usage: measurePerformance.py [-h] [--device DEVICE] [-b BATCHSIZE]
                                      [-a CONSTPROBSIZE] [-x LENGTHX]
                                      [-y LENGTHY] [-z LENGTHZ]
                                      [--problemsize PROBLEMSIZE]
                                      [-i INPUTLAYOUT] [-o OUTPUTLAYOUT]
                                      [-p PLACENESS] [-r PRECISION]
                                      [--ldscomplex]
                                      [--ldsfraction LDSFRACTION]
                                      [--cachesize CACHESIZE]
                                      [--xfactor XFACTOR]
                                      [--library {clfft}] [--label LABEL]
                                      [--createini CREATEINIFILENAME]
                                      [--ini INIFILENAME]
                                      [--tablefile TABLEOUTPUTFILENAME]

Measure performance of the clFFT library

optional arguments:
  -h, --help            show this help message and exit
  --device DEVICE       device(s) to run on; may be a comma-delimited list.
                        choices are ['gpu', 'cpu']. (default gpu)
  -b BATCHSIZE, --batchsize BATCHSIZE
                        number of FFTs to perform with one invocation of the
                        client. the special value 'max' may be used to adjust
                        the batch size on a per-transform basis to the maximum
                        problem size possible on the device. may be a range or
                        a comma-delimited list. if a range is entered, you may
                        follow it with ':X', where X is the stepping of the
                        range (if omitted, it defaults to a stepping of 1).
                        e.g., 1-15 or 12,18 or 7,10-30:10,1050-1054. the
                        special value 'pow10' expands to '1-9,10-90:10,100-900
                        :100,1000-9000:1000,10000-90000:10000,100000-900000:10
                        0000,1000000-9000000:1000000'. Note that 'max' and
                        'pow10' may not be used in a list; they must be used
                        by themselves; max may only be used with --library
                        clfft. (default 1)
  -a CONSTPROBSIZE, --adaptivemax CONSTPROBSIZE
                        Max problem size that you want to maintain across the
                        invocations of client with different lengths. This is
                        adaptive and adjusts itself automtically.
  -x LENGTHX, --lengthx LENGTHX
                        length(s) of x to test; must be factors of 1, 2, 3, or
                        5 with clFft; may be a range or a comma-delimited
                        list. e.g., 16-128 or 1200 or 16,2048-32768 (default
                        1)
  -y LENGTHY, --lengthy LENGTHY
                        length(s) of y to test; must be factors of 1, 2, 3, or
                        5 with clFft; may be a range or a comma-delimited
                        list. e.g., 16-128 or 1200 or 16,32768 (default 1)
  -z LENGTHZ, --lengthz LENGTHZ
                        length(s) of z to test; must be factors of 1, 2, 3, or
                        5 with clFft; may be a range or a comma-delimited
                        list. e.g., 16-128 or 1200 or 16,32768 (default 1)
  --problemsize PROBLEMSIZE
                        additional problems of a set size. may be used in
                        addition to lengthx/y/z. each indicated problem size
                        will be added to the list of FFTs to perform. should
                        be entered in AxBxC:D format. A, B, and C indicate the
                        sizes of the X, Y, and Z dimensions (respectively). D
                        is the batch size. All values except the length of X
                        are optional. may enter multiple in a comma-delimited
                        list. e.g., 2x2x2:32768 or 256x256:100,512x512:256
  -i INPUTLAYOUT, --inputlayout INPUTLAYOUT
                        may enter multiple in a comma-delimited list. choices
                        are ['cp', 'ci']. ci = complex interleaved, cp =
                        complex planar (default ci)
  -o OUTPUTLAYOUT, --outputlayout OUTPUTLAYOUT
                        may enter multiple in a comma-delimited list. choices
                        are ['cp', 'ci']. ci = complex interleaved, cp =
                        complex planar (default ci)
  -p PLACENESS, --placeness PLACENESS
                        may enter multiple in a comma-delimited list. choices
                        are ['in', 'out']. in = in place, out = out of place
                        (default in)
  -r PRECISION, --precision PRECISION
                        may enter multiple in a comma-delimited list. choices
                        are ['single', 'double']. (default single)
  --ldscomplex          turn on complex LDS (default off)
  --ldsfraction LDSFRACTION
                        fraction of the LDS to use; should be 0 or an integer
                        2-8. library automatically chooses the value on 0. may
                        be a range or a comma-delimited list. (default 0)
  --cachesize CACHESIZE
                        size of the cache; should be 0 or a positive integer
                        between one and two times the problem size. library
                        automatically chooses the value on a 0. may be a range
                        or a comma-delimited list. (default 0)
  --xfactor XFACTOR     size of the X dimension to use when dividing up large
                        problems; should be 0 or a power of 2. library
                        automatically chooses the value on a 0. may be a range
                        or a comma-delimited list. (default 0)
  --library {clfft}  indicates the library to use for testing on this run
  --label LABEL         a label to be associated with all transforms performed
                        in this run. if LABEL includes any spaces, it must be
                        in "double quotes". note that the label is not saved
                        to an .ini file. e.g., --label cayman may indicate
                        that a test was performed on a cayman card or --label
                        "Windows 32" may indicate that the test was performed
                        on Windows 32
  --createini CREATEINIFILENAME
                        create an .ini file with the given name that saves the
                        other parameters given at the command line, then quit.
                        e.g., 'performance.py -x 2048 --createini
                        my_favorite_setup.ini' will create an .ini file that
                        will save the configuration for a 2048-datapoint 1D
                        FFT.
  --ini INIFILENAME     use the parameters in the named .ini file instead of
                        the command line parameters.
  --tablefile TABLEOUTPUTFILENAME
                        save the results to a plaintext table with the file
                        name indicated. this can be used with
                        plotPerformance.py to generate graphs of the
                        data (default: table prints to screen)
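
The range and list syntax described in the help text above can be illustrated with a short sketch. It is not taken from the measurePerformance.py source: the names expand_spec and POW10 are hypothetical, and the sketch assumes that both ends of a range are included, which is what the documented 'pow10' expansion suggests. The special value 'max' is not handled here.

```python
# Sketch of the range/list syntax from the help text.
# expand_spec and POW10 are illustrative names, not part of clFFT.

POW10 = ("1-9,10-90:10,100-900:100,1000-9000:1000,10000-90000:10000,"
         "100000-900000:100000,1000000-9000000:1000000")


def expand_spec(spec):
    """Expand e.g. '7,10-30:10,1050-1054' into a flat list of integers."""
    if spec == "pow10":  # special value; the help says it must be used by itself
        spec = POW10
    values = []
    for item in spec.split(","):
        # An optional ':X' suffix is the stepping of a range (default 1).
        body, _, step_text = item.strip().partition(":")
        step = int(step_text) if step_text else 1
        if "-" in body:
            first, last = (int(part) for part in body.split("-"))
            # Assumption: both ends of the range are included.
            values.extend(range(first, last + 1, step))
        else:
            values.append(int(body))
    return values


print(expand_spec("12,18"))
print(expand_spec("7,10-30:10,1050-1054"))
print(len(expand_spec("pow10")))
```

With this reading, '7,10-30:10,1050-1054' stands for the values 7, 10, 20, 30, and 1050 through 1054.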

The example below uses this script to gather performance numbers for a few sizes: 4, 16, 64, 256, and 1024.

C:\clFFT\src\scripts\perf>measurePerformance.py -x 4,16,64,256,1024 -b max
A subdirectory or file perfLog already exists.
=========================MEASURE PERFORMANCE START===========================
Process id of Measure Performance:14592
Executing measure performance for label: None
Executing for label: None
table header---->lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
Total combinations =  5

preparing command: 1
Executing Command: ['Client.exe', '--gpu', '-x', '4', '-y', '1', '-z', '1', '--batchSize', '1048576', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:

========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 0 samples out of 10

===========================clFFT============================
        Handle:                   1
        Kernel:    0000000003DD08C0
     OutEvents:    000000000480F390
        Length:                 (4)
         Batch:             1048576
  Input Stride:                 (1)
 Output Stride:                 (1)
   Global Work:           (2097152)
        Gflops:                       83.3251
     Time (ns):                                      503,366


stderr:

Execution Successfull---------------


preparing command: 2
Executing Command: ['Client.exe', '--gpu', '-x', '16', '-y', '1', '-z', '1', '--batchSize', '262144', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:

========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10

===========================clFFT============================
        Handle:                   1
        Kernel:    0000000003DD0940
     OutEvents:    000000000627B6B0
        Length:                (16)
         Batch:              262144
  Input Stride:                 (1)
 Output Stride:                 (1)
   Global Work:           (1048576)
        Gflops:                       174.583
     Time (ns):                                      480,493


stderr:

Execution Successfull---------------


preparing command: 3
Executing Command: ['Client.exe', '--gpu', '-x', '64', '-y', '1', '-z', '1', '--batchSize', '65536', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:

========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10

===========================clFFT============================
        Handle:                   1
        Kernel:    0000000003DDCA00
     OutEvents:    0000000004DBFE50
        Length:                (64)
         Batch:               65536
  Input Stride:                 (1)
 Output Stride:                 (1)
   Global Work:           (1048576)
        Gflops:                       235.951
     Time (ns):                                      533,284


stderr:

Execution Successfull---------------


preparing command: 4
Executing Command: ['Client.exe', '--gpu', '-x', '256', '-y', '1', '-z', '1', '--batchSize', '16384', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:

========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10

===========================clFFT============================
        Handle:                   1
        Kernel:    0000000003EDC8D0
     OutEvents:    0000000004C18E30
        Length:               (256)
         Batch:               16384
  Input Stride:                 (1)
 Output Stride:                 (1)
   Global Work:           (1048576)
        Gflops:                       343.413
     Time (ns):                                      488,543


stderr:

Execution Successfull---------------


preparing command: 5
Executing Command: ['Client.exe', '--gpu', '-x', '1024', '-y', '1', '-z', '1', '--batchSize', '4096', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:

========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 0 samples out of 10

===========================clFFT============================
        Handle:                   1
        Kernel:    0000000003C508C0
     OutEvents:    000000000621C200
        Length:              (1024)
         Batch:                4096
  Input Stride:                 (1)
 Output Stride:                 (1)
   Global Work:            (524288)
        Gflops:                       420.946
     Time (ns):                                      498,200


stderr:

Execution Successfull---------------

=========================MEASURE PERFORMANCE ENDS===========================
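
To make the scraping step concrete, here is a minimal sketch that pulls the length, batch size, Gflops, and time out of one client report like those printed above. The regular expressions and the names scrape and estimated_gflops are hypothetical and are not taken from the script. The cross-check uses the conventional 5*N*log2(N) operation count for a 1D complex FFT; the figures in the transcript are consistent with that estimate, but this is an inference from the printed output, not a statement about how the client computes its number.

```python
import math
import re

# One report copied from the transcript above.
SAMPLE_STDOUT = """\
===========================clFFT============================
        Handle:                   1
        Kernel:    0000000003DD08C0
     OutEvents:    000000000480F390
        Length:                 (4)
         Batch:             1048576
  Input Stride:                 (1)
 Output Stride:                 (1)
   Global Work:           (2097152)
        Gflops:                       83.3251
     Time (ns):                                      503,366
"""


def scrape(stdout_text):
    """Extract the fields of interest from a single 1D client report."""
    length = re.search(r"Length:\s*\((\d+)\)", stdout_text).group(1)
    batch = re.search(r"Batch:\s*(\d+)", stdout_text).group(1)
    gflops = re.search(r"Gflops:\s*([\d.]+)", stdout_text).group(1)
    time_ns = re.search(r"Time \(ns\):\s*([\d,]+)", stdout_text).group(1)
    return {
        "length": int(length),
        "batch": int(batch),
        "gflops": float(gflops),
        "time_ns": int(time_ns.replace(",", "")),  # drop thousands separators
    }


def estimated_gflops(length, batch, time_ns):
    """Conventional estimate: 5*N*log2(N) flops per transform.

    Flops per nanosecond is numerically equal to Gflops.
    """
    return 5 * length * math.log2(length) * batch / time_ns


report = scrape(SAMPLE_STDOUT)
estimate = estimated_gflops(report["length"], report["batch"], report["time_ns"])
print(report)
print(round(estimate, 3))
```

The same check can be repeated for the other four reports by pasting their text into SAMPLE_STDOUT.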

The run generates a log file in the current directory that records the parameters tested together with the measured performance numbers:

C:\clFFT\src\scripts\perf>type results2013-07-23T16.01.52.791000.txt
lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
4,1,1,1048576,gpu,ci,ci,in,single,None,83.3251
16,1,1,262144,gpu,ci,ci,in,single,None,174.583
64,1,1,65536,gpu,ci,ci,in,single,None,235.951
256,1,1,16384,gpu,ci,ci,in,single,None,343.413
1024,1,1,4096,gpu,ci,ci,in,single,None,420.946

This log file is then fed into the plotPerformance.py script, which consumes the records and plots the results in a graph.
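
Because the log is a plain comma-separated table with a header row, it can also be read with a few lines of Python for ad hoc analysis. The sketch below uses only the standard csv module and the log contents shown above; load_records is a hypothetical helper name, and the added problemsize field follows the definition given in the plotPerformance.py help text (x*y*z*batchsize).

```python
import csv
import io

# Contents of the log file shown above.
LOG_TEXT = """\
lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
4,1,1,1048576,gpu,ci,ci,in,single,None,83.3251
16,1,1,262144,gpu,ci,ci,in,single,None,174.583
64,1,1,65536,gpu,ci,ci,in,single,None,235.951
256,1,1,16384,gpu,ci,ci,in,single,None,343.413
1024,1,1,4096,gpu,ci,ci,in,single,None,420.946
"""


def load_records(handle):
    """Read a measurePerformance.py table into a list of dictionaries."""
    records = []
    for row in csv.DictReader(handle):
        for key in ("lengthx", "lengthy", "lengthz", "batch"):
            row[key] = int(row[key])
        row["GFLOPS"] = float(row["GFLOPS"])
        # problemsize = x * y * z * batchsize
        row["problemsize"] = (row["lengthx"] * row["lengthy"]
                              * row["lengthz"] * row["batch"])
        records.append(row)
    return records


# For a real log, pass open("<log file name>", newline="") instead.
records = load_records(io.StringIO(LOG_TEXT))
for rec in records:
    print(rec["lengthx"], rec["batch"], rec["problemsize"], rec["GFLOPS"])
```

In this particular run every row has the same problem size, 4,194,304 points, which shows how the batch sizes selected by -b max shrank as the transform length grew.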

plotPerformance.py

While the log file generated by measurePerformance.py is sufficient for gathering performance data, plots make it much easier to compare and contrast different sets of data. This is the purpose of plotPerformance.py: the script uses the freely available Python library matplotlib either to open an interactive graph window or to write an image file straight to disk. The --help (or -h) parameter prints extensive help information:

C:\clFFT\src\scripts\perf>plotPerformance.py -h
usage: plotPerformance.py [-h] -d DATAFILE -x
                                   {x,y,z,batchsize,problemsize} [-y {gflops}]
                                   [--plot {device,precision,label}]
                                   [--title GRAPHTITLE]
                                   [--x_axis_label XAXISLABEL]
                                   [--x_axis_scale {linear,log2,log10}]
                                   [--y_axis_label YAXISLABEL]
                                   [--outputfile OUTPUTFILENAME]

Plot performance of the clFFT library. plotPerformance.py reads in
data tables from  measurePerformance.py and plots their values

optional arguments:
  -h, --help            show this help message and exit
  -d DATAFILE, --datafile DATAFILE
                        indicate a file to use as input. must be in the format
                        output by measurePerformance.py. may be used
                        multiple times to indicate multiple input files. e.g.,
                        -d cypressOutput.txt -d caymanOutput.txt
  -x {x,y,z,batchsize,problemsize}, --x_axis {x,y,z,batchsize,problemsize}
                        indicate which value will be represented on the x
                        axis. problemsize is defined as x*y*z*batchsize
  -y {gflops}, --y_axis {gflops}
                        indicate which value will be represented on the y axis
  --plot {device,precision,label}
                        indicate which of ['device', 'precision', 'label']
                        should be used to differentiate multiple plots. this
                        will be chosen automatically if not specified
  --title GRAPHTITLE    the desired title for the graph generated by this
                        execution. if GRAPHTITLE contains any spaces, it must
                        be entered in "double quotes". if this option is not
                        specified, the title will be autogenerated
  --x_axis_label XAXISLABEL
                        the desired label for the graph's x-axis. if
                        XAXISLABEL contains any spaces, it must be entered in
                        "double quotes". if this option is not specified, the
                        x-axis label will be autogenerated
  --x_axis_scale {linear,log2,log10}
                        the desired scale for the graph's x-axis. if nothing
                        is specified, it will be selected automatically
  --y_axis_label YAXISLABEL
                        the desired label for the graph's y-axis. if
                        YAXISLABEL contains any spaces, it must be entered in
                        "double quotes". if this option is not specified, the
                        y-axis label will be autogenerated
  --outputfile OUTPUTFILENAME
                        name of the file to output graphs. Supported formats:
                        emf, eps, pdf, png, ps, raw, rgba, svg, svgz.

Once the performance of a particular run has been saved to a log file, you can instruct plotPerformance.py to parse the log file and create a line graph from that data. The graph below shows the performance over the data points measured.

C:\clFFT\src\scripts\perf>plotPerformance.py  -x x -d results2013-07-23T16.01.52.791000.txt

FFT sample performance plot
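
As a rough illustration of what happens between the log file and the graph, the sketch below groups log records into one series per value of the chosen key and picks the x value for each point. It is not taken from the plotPerformance.py source: build_series, x_value, and draw are hypothetical names, the records are the rows from the log shown earlier, and draw assumes matplotlib 3.3 or later is installed (it is defined but not called here).

```python
from collections import defaultdict

# (lengthx, lengthy, lengthz, batch, label, GFLOPS) taken from the log above.
ROWS = [
    (4, 1, 1, 1048576, "None", 83.3251),
    (16, 1, 1, 262144, "None", 174.583),
    (64, 1, 1, 65536, "None", 235.951),
    (256, 1, 1, 16384, "None", 343.413),
    (1024, 1, 1, 4096, "None", 420.946),
]


def x_value(row, x_axis):
    """Map an --x_axis choice to a number for one row."""
    x, y, z, batch = row[:4]
    choices = {"x": x, "y": y, "z": z, "batchsize": batch,
               "problemsize": x * y * z * batch}
    return choices[x_axis]


def build_series(rows, x_axis="x"):
    """Return {label: [(x, gflops), ...]} with points sorted by x."""
    series = defaultdict(list)
    for row in rows:
        label, gflops = row[4], row[5]
        series[label].append((x_value(row, x_axis), gflops))
    return {label: sorted(points) for label, points in series.items()}


def draw(series, outputfile):
    """Write a line graph to disk (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render without opening a window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, points in series.items():
        xs, ys = zip(*points)
        ax.plot(xs, ys, marker="o", label=str(label))
    ax.set_xscale("log", base=2)
    ax.set_ylabel("GFLOPS")
    ax.legend()
    fig.savefig(outputfile)


series = build_series(ROWS, x_axis="x")
print(series)
```

Calling draw(series, "fft_perf.png") would then save a single line with five points, one for each transform length measured.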
