This repository provides a header-only library to compute Fourier transforms in 1D, 2D, and 3D. Its goal is to provide a fast and easy-to-use fast Fourier transform (FFT) implementation.
Clone the repository and all its submodules using the following command:
git clone --recursive git@github.com:jdupuy/dj_fft.git
If you accidentally omitted the --recursive flag when cloning the repository, you can retrieve the submodules like so:
git submodule update --init --recursive
The 1D, 2D, and 3D FFT routines return a std::vector<std::complex<T>>, given another std::vector<std::complex<T>> as input, which specifies the data to transform, and a value of the enum class dj::fft_dir, which specifies the direction in which the FFT is computed (dj::fft_dir::DIR_FWD for the forward direction and dj::fft_dir::DIR_BWD for the backward direction).
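To make the notion of a transform direction concrete, here is a minimal reference sketch of a direct O(N²) discrete Fourier transform. It is not part of dj_fft.h: the name naive_dft and its sign parameter are hypothetical. This README does not state which sign or normalization convention the library uses, so the sketch assumes one common choice (negative exponent for forward, positive for backward, both scaled by 1/sqrt(N)). Treat it as an illustration, not as a description of the library's internals.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference transform, NOT part of dj_fft.h.
// Assumed conventions: sign = -1 is "forward", sign = +1 is "backward",
// and both directions are scaled by 1/sqrt(N).
template <typename T>
std::vector<std::complex<T>> naive_dft(const std::vector<std::complex<T>> &in, int sign)
{
    assert(sign == 1 || sign == -1);
    const std::size_t N = in.size();
    const T pi = std::acos(T(-1));
    const T scale = T(1) / std::sqrt(T(N));
    std::vector<std::complex<T>> out(N);

    for (std::size_t k = 0; k < N; ++k) {
        std::complex<T> sum(0, 0);
        for (std::size_t n = 0; n < N; ++n) {
            // reduce k * n modulo N so the angle stays small and accurate
            const T angle = T(sign) * T(2) * pi * T((k * n) % N) / T(N);
            sum += in[n] * std::polar(T(1), angle);
        }
        out[k] = sum * scale;
    }
    return out;
}
```

Under these assumptions, applying the forward transform and then the backward transform returns the original data up to rounding error. Because the cost grows quadratically with N, a routine like this is only practical for checking small inputs.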
Note that the input vector is expected to be of size N for a 1D FFT, NxN for a 2D FFT, and NxNxN for a 3D FFT, where N must be a power of two. The 2D and 3D vectors are expected to be arranged in a flat row-major fashion, i.e., the 2D and 3D elements (i, j) and (i, j, k) are respectively located at index i + N * j and i + N * (j + N * k) in memory.
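The size and layout rules above can be captured in a few small helpers. This is a sketch only: is_power_of_two, index2d, and index3d are hypothetical names and are not provided by dj_fft.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, NOT part of dj_fft.h.

// Returns true when n is a power of two (1, 2, 4, 8, ...).
inline bool is_power_of_two(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Flat index of element (i, j) in an N x N input.
inline std::size_t index2d(std::size_t i, std::size_t j, std::size_t N)
{
    assert(i < N && j < N);
    return i + N * j;
}

// Flat index of element (i, j, k) in an N x N x N input.
inline std::size_t index3d(std::size_t i, std::size_t j, std::size_t k, std::size_t N)
{
    assert(i < N && j < N && k < N);
    return i + N * (j + N * k);
}
```

For example, with N = 4, element (1, 2) of a 2D input sits at index 9, and element (1, 2, 3) of a 3D input sits at index 57. Checking is_power_of_two(N) before calling an FFT routine is a cheap way to catch an unsupported input size early.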
Below is a C++ example that computes a 2D FFT in the forward direction:
#define DJ_FFT_IMPLEMENTATION // define this in exactly *one* .cpp file
#include "dj_fft.h"
#include <complex>
#include <cstdio>
#include <vector>
void some_function()
{
    const int N = 8; // input size (replace with your own; must be a power of two)
    auto myData = std::vector<std::complex<float>>(N * N); // input data
    // prepare data (placeholder values; replace with your own)
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < N; ++i) {
            myData[i + N * j] = std::complex<float>(float(i), float(j)); // set element (i, j)
        }
    }
    // compute forward 2D FFT
    auto fftData = dj::fft2d(myData, dj::fft_dir::DIR_FWD);
    // print the data
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < N; ++i) {
            std::printf("{%f, %f} ", fftData[i + N * j].real(), fftData[i + N * j].imag());
        }
        std::printf("\n");
    }
}
For complete examples that compile, see the examples/ directory.
Additionally, the library provides GPU-accelerated 1D, 2D, and 3D FFTs for std::vector<std::complex<float>> inputs. GPU acceleration is especially relevant for large 2D and 3D datasets. For instance:
- for an input of size 4096x4096, a regular 2D FFT completes in roughly 18 seconds on an Intel i7-8086k, and 0.9 seconds on an NVIDIA RTX 2080
- for an input of size 512x512x512, a regular 3D FFT completes in roughly 141 seconds on an Intel i7-8086k, and 8.1 seconds on an NVIDIA RTX 2080
The following table provides a more comprehensive set of measurements for 2D FFTs:
2D FFT Resolution | 256² | 512² | 1024² | 2048² | 4096² | 8192² |
---|---|---|---|---|---|---|
CPU (i7-8086k) | 0.05s | 0.22s | 0.99s | 4.32s | 18.85s | 81.96s |
GPU (RTX 2080) | 0.01s | 0.02s | 0.07s | 0.24s | 0.94s | 3.68s |
GPU speed-up | x5 | x11 | x14 | x18 | x20 | x22 |
The following table provides a more comprehensive set of measurements for 3D FFTs:
3D FFT Resolution | 64³ | 128³ | 256³ | 512³ |
---|---|---|---|---|
CPU (i7-8086k) | 0.19s | 1.72s | 15.70s | 141.18s |
GPU (RTX 2080) | 0.04s | 0.15s | 1.03s | 8.10s |
GPU speed-up | x5 | x11 | x15 | x17 |
Below is a C++ example that computes a 1D FFT in the backward direction on the GPU:
#define DJ_FFT_IMPLEMENTATION // define this in exactly *one* .cpp file
#include "dj_fft.h"
#include <complex>
#include <cstdio>
#include <vector>
void some_function()
{
    const int N = 8; // input size (replace with your own; must be a power of two)
    auto myData = std::vector<std::complex<float>>(N); // input data
    // prepare data (placeholder values; replace with your own)
    for (int i = 0; i < N; ++i) {
        myData[i] = std::complex<float>(float(i), 0.0f); // set element (i)
    }
    // compute backward 1D FFT
    auto fftData = dj::fft1d_gpu(myData, dj::fft_dir::DIR_BWD);
    // print the data
    for (int i = 0; i < N; ++i) {
        std::printf("{%f, %f}\n", fftData[i].real(), fftData[i].imag());
    }
}
Note that the values returned by a GPU FFT may differ slightly from those of a regular (CPU) FFT, because floating-point arithmetic is not implemented identically on the two.
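Because of these small differences, comparing GPU and CPU results with exact equality is likely to fail. A tolerance-based comparison is one way to handle this. The sketch below is an assumption-laden illustration: max_abs_diff and nearly_equal are hypothetical helpers that are not part of dj_fft.h, and the default tolerance of 1e-4 is an arbitrary starting point that should be tuned to the input size and value range.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helpers, NOT part of dj_fft.h.

// Largest absolute difference between corresponding elements of a and b.
inline float max_abs_diff(const std::vector<std::complex<float>> &a,
                          const std::vector<std::complex<float>> &b)
{
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::abs(a[i] - b[i]));
    }
    return worst;
}

// True when every element of a is within tol of the matching element of b.
// The default tolerance is an arbitrary assumption, not a library guarantee.
inline bool nearly_equal(const std::vector<std::complex<float>> &a,
                         const std::vector<std::complex<float>> &b,
                         float tol = 1e-4f)
{
    return max_abs_diff(a, b) <= tol;
}
```

In practice, one would pass the outputs of a CPU routine and its GPU counterpart computed from the same input, and inspect max_abs_diff to decide what tolerance is reasonable for the data at hand.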
For a complete example that compiles, see the examples/ directory.
By default, the GPU-accelerated routines run on the primary GPU. Users who want to run the FFT on a secondary GPU will have to create an OpenGL context themselves and use the fftNd_gpu_glready functions. You can create a custom OpenGL context with a cross-platform windowing library such as GLFW (https://www.glfw.org/), and an OpenGL function loader such as glad (https://glad.dav1d.de/). I'll probably add a sample at some point.
This library is in the public domain. You can do anything you want with it. You have no legal obligation to do anything else, although I appreciate attribution.
It is also licensed under the MIT open source license, if you have lawyers who are unhappy with public domain. The dj_fft.h source file includes an explicit dual-license for you to choose from.