LACE / NoLACE and DRED on Fixed Point implementations? #318
Comments
Correct. All new DNN-based features are floating-point only. The reasoning is that most of the chips that are powerful enough to run that DNN code will also have an FPU. So at least for now (things can change) there's no plan to implement those in fixed-point.
Here's my vote for fixed-point support of all features. In my testing (mainly speech), the fixed-point build of 1.5 uses only about 2/3 of the CPU time of the floating-point build when encoding complexity is above 5. My application is a high-density server, so in moving to 1.5 I have to choose between a decrease in density to get PLC and LACE/NoLACE, or a significant increase in density if I use the 1.5 fixed-point build and lose the new features.
On most modern chips floating-point should actually be faster than fixed-point. Maybe there's some optimization that isn't getting enabled.
Try enabling fast-math and float-approx. If you run a server with known hardware from this decade, you can also presume AVX2 and SSE 4.2 in your Opus build.
Where do I find the "fast-math" option? float-approx is enabled. I have MAY_HAVE_SSE4_1 and MAY_HAVE_AVX2 enabled, but only presume up to SSE2. Could the run-time dispatching account for such a big difference? I can set those to PRESUME and give it a try. Testing on a Core i9-13900, btw.
What build system are you using? Autotools, CMake or Meson?
Using our own CMake-based system. I started off with the Linux build and generated a Makefile with configure. I used that to build our CMakeLists.txt with just the options we need. The only difference in options between the Windows and Linux builds was that Linux had VAR_ARRAY enabled and Windows has ALLOCA enabled instead. I just finished rebuilding with PRESUME for SSE4.1 and AVX2 and re-ran the benchmarks, and now, to my surprise, the 1.5-fixed and 1.5-float results are much closer. Either my initial test run was flawed, or PRESUME makes a pretty large difference. I will try going back to MAY_HAVE for SSE4.1 and AVX2 and let you know if that was really the difference.
@bateyejoe if you have a custom build system then you are on your own :) You can look at the Opus CMake files and see how they enable options such as OPUS_FLOAT_APPROX (enable floating-point approximations; ensure your platform supports IEEE 754 before enabling). It's some defines and some compiler flags.
It's possible you never actually enabled RTCD (run-time CPU detection), which would prevent the code from taking advantage of any of the MAY_HAVEs.
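To illustrate why a MAY_HAVE setting has no effect without RTCD, here is a minimal C sketch of the general dispatch pattern. It is not taken from the Opus source: `build_config`, `select_dot`, `dot_generic` and `dot_simd_standin` are hypothetical names, and the "SIMD" kernel is plain C standing in for real intrinsics so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every implementation of the kernel. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable fallback: always compiled in. */
float dot_generic(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for a SIMD kernel (hypothetical; a real one would use intrinsics).
   Two partial sums mimic the lane-wise accumulation of a vector unit. */
float dot_simd_standin(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)
        s0 += a[i] * b[i];
    return s0 + s1;
}

/* Build-time and run-time facts that decide which kernel is used.
   Field names are hypothetical and only mirror the terms used in the thread. */
typedef struct {
    int may_have_simd;  /* SIMD kernel compiled in, usable only after a CPU check */
    int presume_simd;   /* SIMD assumed present on every target machine */
    int rtcd_enabled;   /* run-time CPU detection compiled in */
    int cpu_has_simd;   /* what a CPU query would report on this machine */
} build_config;

/* Pick the kernel the way a MAY_HAVE / PRESUME / RTCD scheme would. */
dot_fn select_dot(const build_config *cfg)
{
    assert(cfg != NULL);
    if (cfg->presume_simd)
        return dot_simd_standin;   /* no check needed */
    if (cfg->may_have_simd && cfg->rtcd_enabled && cfg->cpu_has_simd)
        return dot_simd_standin;   /* check passed at run time */
    return dot_generic;            /* MAY_HAVE without RTCD lands here */
}
```

With `may_have_simd = 1` but `rtcd_enabled = 0`, `select_dot` returns `dot_generic` even on a CPU that supports the faster path, which is the situation suspected above.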
I think you're right. Switching back to MAY_HAVE-only still performs on par with the fixed version, so I obviously missed something in that first config. In any case, I don't think fixed-point support is completely worthless on modern processors. As I understand it, multithreaded cores can execute integer and float operations simultaneously, so workloads that mix integer and float math benefit. In our case, we already have quite a bit of float math going on, which is one of the reasons we chose the Opus fixed-point build in the past. Thanks for the assistance.
Hi,
The new ML algorithms in v1.5 are really impressive. It looks like they're only available in floating-point builds of Opus.
I'm compiling here for Xtensa LX6 (ESP32), which doesn't have a hard FPU, so I need the fixed-point implementation to get any real-time audio encoding / decoding.
I haven't really dug into the code, but my guess is the networks are represented in and presented with floating point values.
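For readers unfamiliar with what a fixed-point version of such code would involve, here is a generic C sketch of a Q15 dot product, the core operation of a dense network layer. It is an illustration only and does not describe how the Opus DNN code is written; `float_to_q15` and `dot_q15` are hypothetical helper names, and the Q15 format is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q15_ONE 32768  /* scale factor: 1.0 corresponds to 2^15 */

/* Convert a float in [-1, 1) to Q15, rounding to nearest and saturating. */
int16_t float_to_q15(float x)
{
    float scaled = x * (float)Q15_ONE;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    if (scaled > 32767.0f) return 32767;
    if (scaled < -32768.0f) return -32768;
    return (int16_t)scaled;
}

/* Dot product of two Q15 vectors using integer arithmetic only.
   Each product is Q30; a 64-bit accumulator avoids overflow on long vectors.
   The sum is rounded back to Q15 and saturated.
   Assumes >> on a negative value is an arithmetic shift, which holds on
   mainstream compilers but is implementation-defined in C. */
int16_t dot_q15(const int16_t *w, const int16_t *x, size_t n)
{
    assert(w != NULL && x != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)w[i] * (int32_t)x[i];
    acc = (acc + (1 << 14)) >> 15;   /* Q30 -> Q15 with rounding */
    if (acc > 32767) acc = 32767;
    if (acc < -32768) acc = -32768;
    return (int16_t)acc;
}
```

The extra work compared with a float version is visible here: every value needs a chosen scale, and every accumulation needs explicit rounding and saturation.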