Commit

Speed up time stretching ~3x on Linux with FFTW3. (#349)
* Add FFTW3 for faster time stretching.

* Fix build on Ubuntu.

* Lint.

* Add FMA4 support.

* Target Broadwell CPUs instead of just fma4.

* Add FMA4 support again.

* Enable optional AVX512 instructions.

* Add missing swizzle enum definitions.

* Don't build KCVI extensions with GCC or Clang.

* Add missing declares.

* Remove duplicate assert.c.

* Properly ignore assert.c on ARM.

* Disable AVX512.

* Don't compile any avx512 files.

* No more of these FMA4 instructions, please.

* Nope, no AVX_128_FMA either.

* No -mavx maybe?

* -march=native

* Tell RubberBand that we're using threads.

* Tell RubberBand that we're already configured.

* Silly; -DNO_THREADING=0 doesn't work, you need to not define NO_THREADING.

* Use Pthreads.

* Use PThreads, but correctly this time.

* Wrap RubberBandStretcher constructor with a mutex.

* It's... memalign?

* Also disable generic SIMD.

* Fix Windows build.

* Closer...

* So close, come on MSVC, you can do it!

* Use built-in FFT on Windows.

* Disable pthreads on Windows.

* Just use AVX; it's all we need!
psobot authored Jul 2, 2024
1 parent f29daa9 commit 60404dd
Showing 3,507 changed files with 491,202 additions and 23 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 1 addition & 0 deletions .github/workflows/all.yml
@@ -47,6 +47,7 @@ jobs:
uses: jidicula/clang-format-action@v4.13.0
with:
clang-format-version: 14
+ exclude-regex: 'vendors/'
fallback-style: LLVM

# Build the native module with ccache enabled so we can share object files between builds:
2 changes: 1 addition & 1 deletion README.md
@@ -311,7 +311,7 @@ To cite via BibTeX:

- The core audio processing code is pulled from [JUCE 6](https://juce.com/), which is [dual-licensed under a commercial license and the GPLv3](https://juce.com/juce-6-licence).
- The [VST3 SDK](https://github.com/steinbergmedia/vst3sdk), bundled with JUCE, is owned by [Steinberg® Media Technologies GmbH](https://www.steinberg.net/en/home.html) and licensed under the GPLv3.
- - The `PitchShift` plugin uses [the Rubber Band Library](https://github.com/breakfastquay/rubberband), which is [dual-licensed under a commercial license](https://breakfastquay.com/technology/license.html) and the GPLv2 (or newer).
+ - The `PitchShift` plugin and `time_stretch` functions use [the Rubber Band Library](https://github.com/breakfastquay/rubberband), which is [dual-licensed under a commercial license](https://breakfastquay.com/technology/license.html) and the GPLv2 (or newer). [FFTW](https://www.fftw.org/) is also included to speed up Rubber Band, and [is licensed under the GPLv2 (or newer)](https://www.fftw.org/doc/License-and-Copyright.html).
- The `MP3Compressor` plugin uses [libmp3lame from the LAME project](https://lame.sourceforge.io/), which is [licensed under the LGPLv2](https://github.com/lameproject/lame/blob/master/README) and [upgraded to the GPLv3 for inclusion in this project (as permitted by the LGPLv2)](https://www.gnu.org/licenses/gpl-faq.html#AllCompatibility).
- The `GSMFullRateCompressor` plugin uses [libgsm](http://quut.com/gsm/), which is [licensed under the ISC license](https://github.com/timothytylee/libgsm/blob/master/COPYRIGHT) and [compatible with the GPLv3](https://www.gnu.org/licenses/license-list.en.html#ISC).

9 changes: 4 additions & 5 deletions pedalboard/TimeStretch.h
@@ -226,12 +226,14 @@ timeStretch(const juce::AudioBuffer<float> input, double sampleRate,
sampleRate, input.getNumChannels(), options, 1.0 / initialStretchFactor,
pow(2.0, (initialPitchShiftInSemitones / 12.0)));

- rubberBandStretcher.setExpectedInputDuration(input.getNumSamples());
-
const float **inputChannelPointers =
(const float **)alloca(sizeof(float *) * input.getNumChannels());

+ size_t maximumBlockSize = rubberBandStretcher.getProcessSizeLimit();
if (!(options & RubberBandStretcher::OptionProcessRealTime)) {
+ rubberBandStretcher.setExpectedInputDuration(input.getNumSamples());
+ rubberBandStretcher.setMaxProcessSize(maximumBlockSize);
+
for (size_t i = 0; i < input.getNumSamples();
i += STUDY_BLOCK_SAMPLE_SIZE) {
size_t numSamples =
@@ -242,7 +244,6 @@ timeStretch(const juce::AudioBuffer<float> input, double sampleRate,
bool isLast = i + numSamples >= input.getNumSamples();
rubberBandStretcher.study(inputChannelPointers, numSamples, isLast);
}
- rubberBandStretcher.setMaxProcessSize(input.getNumSamples());
}

juce::AudioBuffer<float> output(input.getNumChannels(),
@@ -254,8 +255,6 @@ timeStretch(const juce::AudioBuffer<float> input, double sampleRate,
/* keepExistingContent */ false, /* clearExtraSpace */ false,
/* avoidReallocating */ true);

- size_t maximumBlockSize = rubberBandStretcher.getProcessSizeLimit();
-
float **outputChannelPointers =
(float **)alloca(sizeof(float *) * output.getNumChannels());

2 changes: 1 addition & 1 deletion pedalboard/_pedalboard.py
@@ -97,7 +97,7 @@ def __repr__(self) -> str:
def strip_common_float_suffixes(
s: Union[float, str, bool], strip_si_prefixes: bool = True
) -> Union[float, str, bool]:
- if not isinstance(s, str) or (hasattr(s, "type") and s.type != str): # type: ignore
+ if not isinstance(s, str) or (hasattr(s, "type") and s.type is not str): # type: ignore
return s

s = s.strip()
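
The change from `s.type != str` to `s.type is not str` (repeated as `==` to `is` throughout the tests in this commit) swaps equality for identity when comparing against a type object. The sketch below shows where the two can differ. Both `Parameter` and `Lookalike` are hypothetical stand-ins written for this example; they are not pedalboard classes.

```python
class AlwaysEqualMeta(type):
    """Hypothetical metaclass whose classes compare equal to anything."""

    def __eq__(cls, other):
        return True

    # Defining __eq__ clears __hash__, so restore the default.
    __hash__ = type.__hash__


class Lookalike(metaclass=AlwaysEqualMeta):
    """Hypothetical type that `==` cannot tell apart from `float`."""


class Parameter:
    """Hypothetical stand-in for an object with a `.type` attribute."""

    def __init__(self, type_):
        self.type = type_


params = {
    "gain": Parameter(float),
    "bypass": Parameter(bool),
    "odd": Parameter(Lookalike),
}

# Equality is delegated to __eq__, so the lookalike slips through:
by_equality = [k for k, v in params.items() if v.type == float]
# Identity only matches the exact `float` type object:
by_identity = [k for k, v in params.items() if v.type is float]
print(by_equality, by_identity)
```

For plain built-in types the two checks agree, so the edit reads as a lint-driven cleanup rather than a behavior change.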
123 changes: 122 additions & 1 deletion setup.py
@@ -100,11 +100,130 @@
ALL_CPPFLAGS.extend(
[
"-DUSE_BQRESAMPLER=1",
- "-DNO_THREADING=1",
"-D_HAS_STD_BYTE=0",
"-DNOMINMAX",
"-DALREADY_CONFIGURED",
]
)


def ignore_files_matching(files, *matches):
matches = set(matches)
for match in matches:
new_files = []
for file in files:
if match in str(file):
# print(f"Skipping compilation of: {file}")
pass
else:
new_files.append(file)
files = new_files
return files


# Platform-specific FFT speedup flags:
if platform.system() == "Windows":
ALL_CPPFLAGS.append("-DUSE_BUILTIN_FFT")
ALL_CPPFLAGS.append("-DNO_THREADING")
elif platform.system() == "Darwin":
# No need for any threading code on MacOS;
# vDSP does all of this for us and these code paths are redundant.
ALL_CPPFLAGS.append("-DNO_THREADING")
elif platform.system() == "Linux":
# Use FFTW3 for FFTs on Linux, which should speed up Rubberband by 3-4x:
ALL_CPPFLAGS.extend(
[
"-DHAVE_FFTW3=1",
"-DLACK_SINCOS=1",
"-DFFTW_DOUBLE_ONLY=1",
"-DUSE_PTHREADS",
]
)
ALL_INCLUDES += ["vendors/fftw3/api/", "vendors/fftw3/"]
fftw_paths = list(Path("vendors/fftw3/").glob("**/*.c"))
fftw_paths = ignore_files_matching(
fftw_paths,
# Don't bother compiling in Altivec or VSX (PowerPC) support;
# it's 2024, not 2004 (although RIP my G5 cheese grater)
"altivec",
"vsx",
# We're not using FFTW in multi-threaded mode:
"mpi",
"threads",
# No need for tests, tools, or support code:
"tests",
"tools",
"/support",
"common/",
"libbench",
# Ignore SSE, AVX2, AVX128, and AVX512 SIMD code;
# For Rubber Band's usage, just AVX gives us the
# largest speedup without bloating the binary
"sse2",
"avx2",
"avx512",
"kcvi",
"avx-128-fma",
"generic-simd",
)

# On ARM, ignore the X86-specific SIMD code:
if "arm" in platform.processor() or "aarch64" in platform.processor():
fftw_paths = ignore_files_matching(fftw_paths, "avx", "/sse")
ALL_CFLAGS.append("-DHAVE_NEON=1")
else:
# And on x86, ignore the ARM-specific SIMD code (and KCVI; not GCC or Clang compatible).
fftw_paths = ignore_files_matching(fftw_paths, "neon")
ALL_CFLAGS.append("-march=native")
# Enable SIMD instructions:
ALL_CFLAGS.extend(
[
# "-DHAVE_SSE2",
"-DHAVE_AVX", # Testing shows this is all we need!
# "-DHAVE_AVX_128_FMA", # AMD only
# "-DHAVE_AVX2",
# "-DHAVE_AVX512", # No measurable speed difference
# "-DHAVE_GENERIC_SIMD128", # Crashes!
# "-DHAVE_GENERIC_SIMD256", # Also crashes!
]
)

ALL_SOURCE_PATHS += fftw_paths

ALL_CFLAGS.extend(
[
"-DHAVE_UINTPTR_T",
'-DPACKAGE="FFTW"',
'-DVERSION="0"',
'-DPACKAGE_VERSION="00000"',
'-DFFTW_CC="clang"',
"-includestring.h",
"-includestdint.h",
"-includevendors/fftw3/dft/codelet-dft.h",
"-includevendors/fftw3/rdft/codelet-rdft.h",
"-DHAVE_INTTYPES_H",
"-DHAVE_STDINT_H",
"-DHAVE_STDLIB_H",
"-DHAVE_STRING_H",
"-DHAVE_TIME_H",
"-DHAVE_UNISTD_H",
"-DHAVE_DECL_DRAND48",
"-DHAVE_DECL_SRAND48",
"-DHAVE_DECL_COSL",
"-DHAVE_DECL_SINL",
"-DHAVE_DECL_POSIX_MEMALIGN",
"-DHAVE_DRAND48",
"-DHAVE_SRAND48",
"-DHAVE_POSIX_MEMALIGN",
"-DHAVE_ISNAN",
"-DHAVE_SNPRINTF",
"-DHAVE_STRCHR",
"-DHAVE_SYSCTL",
]
)
if platform.system() == "Linux":
ALL_CFLAGS.append("-DHAVE_GETTIMEOFDAY")

ALL_SOURCE_PATHS += list(Path("vendors/rubberband/single").glob("*.cpp"))

ALL_SOURCE_PATHS += list(Path("vendors").glob("*.c"))
@@ -142,13 +261,15 @@
ALL_LINK_ARGS.append("-flto=thin")
ALL_LINK_ARGS.append("-fvisibility=hidden")
ALL_CPPFLAGS.append("-DJUCE_MODULE_AVAILABLE_juce_audio_devices=1")
ALL_CFLAGS += ["-Wno-comment"]
elif platform.system() == "Linux":
ALL_CPPFLAGS.append("-DLINUX=1")
# We use GCC on Linux, which doesn't take a value for the -flto flag:
if not DEBUG and not os.getenv("DISABLE_LTO"):
ALL_CPPFLAGS.append("-flto")
ALL_LINK_ARGS.append("-flto")
ALL_LINK_ARGS.append("-fvisibility=hidden")
ALL_CFLAGS += ["-Wno-comment"]
elif platform.system() == "Windows":
ALL_CPPFLAGS.append("-DWINDOWS=1")
ALL_CPPFLAGS.append("-DJUCE_MODULE_AVAILABLE_juce_audio_devices=1")
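
The `ignore_files_matching` helper added to `setup.py` above filters the vendored FFTW sources by plain substring match on each path. A condensed sketch of the same idea follows, written as a single comprehension rather than the commit's nested loops. The file names are invented for illustration and may not exist in FFTW's source tree, and `PurePosixPath` is used here only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def ignore_files_matching(files, *matches):
    """Drop every path whose string form contains any of the substrings."""
    return [f for f in files if not any(m in str(f) for m in matches)]


# Hypothetical paths, for illustration only.
paths = [
    PurePosixPath("vendors/fftw3/dft/simd/avx/example_codelet.c"),
    PurePosixPath("vendors/fftw3/dft/simd/avx2/example_codelet.c"),
    PurePosixPath("vendors/fftw3/kernel/example_kernel.c"),
    PurePosixPath("vendors/fftw3/threads/example_threads.c"),
]

kept = ignore_files_matching(paths, "avx2", "threads")
print([str(p) for p in kept])

# Substring matching is broad: "avx" also removes the "avx2" directory.
kept_without_avx = ignore_files_matching(paths, "avx")
print([str(p) for p in kept_without_avx])
```

Because the match is a substring test, a pattern such as `"avx"` removes both the `avx` and `avx2` directories, which is why the commit lists `"avx2"` and `"avx512"` explicitly when it wants to keep plain AVX.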
24 changes: 12 additions & 12 deletions tests/test_external_plugins.py
@@ -288,7 +288,7 @@ def test_preset_parameters(plugin_filename: str, plugin_preset: str):
# plugin with default params.
plugin = load_test_plugin(plugin_filename)

- default_params = {k: v.raw_value for k, v in plugin.parameters.items() if v.type == float}
+ default_params = {k: v.raw_value for k, v in plugin.parameters.items() if v.type is float}

# load preset file
plugin.load_preset(plugin_preset)
@@ -309,7 +309,7 @@ def test_initial_parameters(plugin_filename: str):
# or "gain" to 0, which slows down the re-initialization of a plugin.
k: (v.max_value if k == "gain" else v.min_value)
for k, v in get_parameters(plugin_filename).items()
- if v.type == float
+ if v.type is float
}

# Reload the plugin, but set the initial parameters in the load call.
@@ -330,7 +330,7 @@ def test_initial_parameters(plugin_filename: str):
[
(path, parameter)
for path in AVAILABLE_PLUGINS_IN_TEST_ENVIRONMENT
- for parameter in [k for k, v in get_parameters(path).items() if v.type == float]
+ for parameter in [k for k, v in get_parameters(path).items() if v.type is float]
],
5,
),
@@ -557,7 +557,7 @@ def test_attributes_proxy(plugin_filename: str):
[
(path, parameter)
for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
- for parameter in [k for k, v in get_parameters(path).items() if v.type == bool]
+ for parameter in [k for k, v in get_parameters(path).items() if v.type is bool]
],
5,
),
@@ -585,7 +585,7 @@ def test_bool_parameters(plugin_filename: str, parameter_name: str):
[
(path, parameter)
for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
- for parameter in [k for k, v in get_parameters(path).items() if v.type == bool]
+ for parameter in [k for k, v in get_parameters(path).items() if v.type is bool]
],
5,
),
@@ -602,7 +602,7 @@ def test_bool_parameter_valdation(plugin_filename: str, parameter_name: str):
[
(path, parameter)
for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
- for parameter in [k for k, v in get_parameters(path).items() if v.type == float]
+ for parameter in [k for k, v in get_parameters(path).items() if v.type is float]
],
5,
),
@@ -644,7 +644,7 @@ def test_float_parameters(plugin_filename: str, parameter_name: str):
[
(path, parameter)
for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
- for parameter in [k for k, v in get_parameters(path).items() if v.type == float]
+ for parameter in [k for k, v in get_parameters(path).items() if v.type is float]
],
5,
),
@@ -681,7 +681,7 @@ def test_float_parameter_valdation(plugin_filename: str, parameter_name: str):
[
(path, parameter)
for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
- for parameter in [k for k, v in get_parameters(path).items() if v.type == str]
+ for parameter in [k for k, v in get_parameters(path).items() if v.type is str]
],
5,
),
@@ -709,7 +709,7 @@ def test_str_parameters(plugin_filename: str, parameter_name: str):
[
(path, parameter)
for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
- for parameter in [k for k, v in get_parameters(path).items() if v.type == str]
+ for parameter in [k for k, v in get_parameters(path).items() if v.type is str]
],
5,
),
@@ -735,7 +735,7 @@ def test_plugin_parameters_persist_between_calls(plugin_filename: str):
for name, parameter in plugin.parameters.items():
if name == "program":
continue
- if parameter.type == float:
+ if parameter.type is float:
low, high, step = parameter.range
if not step:
step = 0.1
@@ -747,9 +747,9 @@ def test_plugin_parameters_persist_between_calls(plugin_filename: str):
x * step for x in list(range(int(low / step), int(high / step), 1)) + [high / step]
]
random_value = random.choice(values)
- elif parameter.type == bool:
+ elif parameter.type is bool:
random_value = bool(random.random())
- elif parameter.type == str:
+ elif parameter.type is str:
if parameter.valid_values:
random_value = random.choice(parameter.valid_values)
else:
18 changes: 18 additions & 0 deletions vendors/fftw3/AUTHORS
@@ -0,0 +1,18 @@
Authors of FFTW (reachable at fftw@fftw.org):

Matteo Frigo <athena@fftw.org>
Steven G. Johnson <stevenj@alum.mit.edu>

Stefan Kral <skral@fftw.org> wrote genfft-k7/*.ml*, which was
added in fftw-3.0 and removed in fftw-3.2.

Romain Dolbeau contributed support for AVX512 and KCvi.

Erik Lindahl contributed support for AVX2 and Power8 VSX.

Support for the Cell Broadband Engine was graciously donated by the
IBM Austin Research Lab, which was added in fftw-3.2 and removed in
fftw-3.3.

Support for MIPS64 paired-single SIMD instructions was graciously
donated by CodeSourcery, Inc.