Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoder overhaul (16 to f32 samples + other enhancements) #669

Merged
merged 18 commits into from
Jan 21, 2025

Conversation

roderickvd
Copy link
Collaborator

@roderickvd roderickvd commented Jan 3, 2025

This PR improves audio quality in two ways:

  1. Having decoders output f32 samples directly instead of converting to i16
  2. Properly handling high bit-depth FLAC and WAV files (20/24/32-bit) without bit depth reduction

Benefits include:

  • Eliminates unnecessary quantization error from i16 conversion
  • Better matches the Rodio pipeline which already uses f32 internally
  • Preserves greater-than-i16 precision from formats like WAV and FLAC
  • More natural for decoders like Vorbis/MP3 that operate in floating point

Breaking changes:

  • Decoder output type changes from i16 to f32

Changes:

  • Updated decoders to output f32 samples and preserve full bit depth
  • Updated tests and benchmarks to work with f32 samples
  • Simplified benchmark code by removing redundant .to_f32s() calls

Testing:

  • Verified tests, benchmarks and examples run successfully
  • Tested integration with pleezer

Fixes #364

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 3, 2025

@roderickvd first of all thank you for contributing to rodio 🎉. This is quite a big change before we go on we have to discuss if and how this best fits. This might take a while, that does not mean we do not appreciate all your work!

@iluvcapra @PetrGlad if you have time I would like to hear your opinions on this.

@roderickvd
Copy link
Collaborator Author

Sure thing, I'll be here to address any feedback.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 5, 2025

// Down-sampling usually refers to time of spatial sampling frequency not samples' precision.
So far one of the goals of rodio was to allow running it in low-power (low performance) devices. I admit that just defaulting to f32 would simplify the implementation, but f32 is not always supported. Maybe we can have a feature switch to choose between, say, u16 as default and f32. Ideally users should be able to choose any supported sample format but I do not see yet how exactly such API would look like. There is a discussion on how to avoid unnecessary resampling here #646. I believe sample format is relatively cheap, but maybe it can also be considered there.
So maybe 2 options higher-performance/higher-quality would be enough.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 5, 2025

I now see that wav decoder can be cleaned up a bit, The format switch can be in its new() and there are some unwraps that can return an error instead.

Copy link
Collaborator

@PetrGlad PetrGlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that sample format is now a type alias, see also my comment to the PR.
// s can be done separately or here, at your discretion...

src/decoder/wav.rs Outdated Show resolved Hide resolved
src/decoder/wav.rs Outdated Show resolved Hide resolved
src/decoder/wav.rs Outdated Show resolved Hide resolved
src/decoder/mod.rs Outdated Show resolved Hide resolved
@dvdsk
Copy link
Collaborator

dvdsk commented Jan 5, 2025

Having decoders output f32 samples directly instead of converting to i16

what is the upside to this? perf?

Properly handling high bit-depth FLAC and WAV files (20/24/32-bit) without downsampling

can you expand a little on the use case for higher bit depth audio? As far as I know CD quality is fully transparent and CD uses 16 bit precision. Is the additional precision needed for effects? I can think of is software volume control using the extra precision but that needs to be paired with a higher bit output. Are those exposed by cpal?

Better matches the Rodio pipeline which already uses f32 internally

Parts of rodio use f32 we go through great pain (all those generics 😢) to allow for any sample type in most of the pipeline. Though I have never tested what the difference in performance is, I expect an integer based pipeline to be faster then a floating point one. We should test that.

If there is no noticeable decrease in performance this change would be very nice, it does away a ton of complex code. Writing and maintaining that code was painful however so I want us to be completely sure we do not need it before doing away with it.

@roderickvd
Copy link
Collaborator Author

Thanks for your swift feedback.

@PetrGlad:

// Down-sampling usually refers to time of spatial sampling frequency not samples' precision.

Yes, I've updated the PR and description. Will do the same when I push a new commit.

So far one of the goals of rodio was to allow running it in low-power (low performance) devices. I admit that just defaulting to f32 would simplify the implementation, but f32 is not always supported.

I understand that goal, and can only agree to support as much low-spec hardware as reasonable. Some thoughts:

Maybe we can have a feature switch to choose between, say, u16 as default and f32. Ideally users should be able to choose any supported sample format but I do not see yet how exactly such API would look like.

Here some ideas for probably a separate PR:

CamillaDSP uses a type alias for it. You have a choice between f32 and f64 by using a Cargo feature.

Or go the generic route with Sample everywhere.

There is a discussion on how to avoid unnecessary resampling here #646. I believe sample format is relatively cheap, but maybe it can also be considered there. So maybe 2 options higher-performance/higher-quality would be enough.

High-quality resampling will invariably require floating point processing. I see that you're toying with Rubato which indeed works with f32 or f64 samples.

I now see that wav decoder can be cleaned up a bit, The format switch can be in its new() and there are some unwraps that can return an error instead.

Yes. I can clean it up if you want. So far I left it untouched, in order to leave the PR scoped as small as possible.

@dvdsk:

what is the upside to this? perf?

I tried describing the benefits in the PR description. It's more about quality than about performance.

can you expand a little on the use case for higher bit depth audio? As far as I know CD quality is fully transparent and CD uses 16 bit precision. Is the additional precision needed for effects? I can think of is software volume control using the extra precision but that needs to be paired with a higher bit output. Are those exposed by cpal?

I have the desire nor anything new to add to discussions about the merits of high resolution audio. I'll just say that it exists 😄 and so, how to support it in the best possible way?

Indeed, what would benefit from this is:

  • digital signal processing; chaining DSPs in integer will increasingly introduce quantization errors
  • software volume control; attenuation in i16 will decrease dynamic range
  • bouncing; risk of clipping when done in integer

cpal indeed exposes higher bit output all the way up to f64.

With regards output to platforms, all of which supported by cpal:

  • Linux: ALSA supports up to FLOAT64 but realistically will be used up to S32 on USB.
  • macOS: CoreAudio always uses Float32.
  • Windows: WASAPI always converts to float internally.
  • JACK Audio always uses float.

If there is no noticeable decrease in performance this change would be very nice, it does away a ton of complex code. Writing and maintaining that code was painful however so I want us to be completely sure we do not need it before doing away with it.

If there is a performance difference, from experience with librespot, it is negligible on all but exotic and by now very old platforms (even the venerable Raspberry Pi Zero did fine).

Above, @PetrGlad hinted that there may be an interest to have a user-selectable audio pipeline. If so, you may want to keep the generics - they are not dynamically dispatched so are just as fast as settling on a single type of pipeline. Their very existence made this PR quite easy to make.

Personally, seeing from the overview audio host platforms, and DSP libraries like CamillaDSP and Rubato, I wonder if it's worth the trouble to indeed make it user-selectable between integer and floating point. When selecting i16 would you also change all the current f32 casts to integer arithmetic? You could, but it becomes a lot and for which platforms?

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 5, 2025

@roderickvd first of all thank you very much for that very clear and thorough response!

I got a pi zero and a pinephone I want to check performance on, those both have fpu's but those might be slower then integer operations (I doubt it matters here). I'll try if I can check that tomorrow, cant promise anything unfortunately.

@PetrGlad If we adopt this we should seriously consider making Source's item f32. That would dramatically simplify mix, uniformsource, the new in progress resampler trait and probably more. To me that is the biggest appeal of this change.

@est31 if you got time, whats your opinion on such a change?

@roderickvd
Copy link
Collaborator Author

a1cf45a addresses most of the review comments.

I see in that in all decoders, there's a lot of unwrapping going on, as well as is_wav, is_flac, etc. checks. Why are those necessary? The inner decoder should fail opening them otherwise anyway.

I'll have another stab at some refactoring of e.g. Sample::saturating_add that we can now depend on dasp_sample for, but wanted to share this progress with you regardless.

@roderickvd
Copy link
Collaborator Author

@roderickvd first of all thank you very much for that very clear and thorough response!

I got a pi zero and a pinephone I want to check performance on, those both have fpu's but those might be slower then integer operations (I doubt it matters here). I'll try if I can check that tomorrow, cant promise anything unfortunately.

Great, let me know!

@PetrGlad If we adopt this we should seriously consider making Source's item f32. That would dramatically simplify mix, uniformsource, the new in progress resampler trait and probably more. To me that is the biggest appeal of this change.

Isn't it already generic over u16, i16 and f32 by the Sample trait bound?

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 6, 2025

Isn't it already generic over u16, i16 and f32 by the Sample trait bound?

yes however writing interfaces that cover both generic code and non-generic is rather ugly. See (wip): https://github.com/RustAudio/rodio/blob/resamplers/src/conversions/sample_rate.rs & first bullet point here: #670 (comment). I needed to combine sample type conversion and sample_rate conversion since rubato only returns f32. It works perfectly however the generics are quite brittle and the compiler messages not that helpful. Brittle code makes refactoring hard which in turn leads to bad maintainability.

@roderickvd roderickvd force-pushed the feature/float-decoder branch from 820f6bb to 01dc939 Compare January 6, 2025 20:48
@roderickvd
Copy link
Collaborator Author

roderickvd commented Jan 6, 2025

01dc939 adds an integer-samples feature that switches the decoder output from f32 to i16.

On my Apple M3 - so: fast and with hard float - I timed the benchmark suite. The following values are averaged over three runs, after having performed two "warm-up" runs for cache locality whose timings were ignored.

Nothing scientific and WAV only:

# baseline: i16 default decoders
# time cargo bench --features integer-samples
28.52 real        28.25 user         0.24 sys

# i16 Symphonia
# time cargo bench --features "symphonia-all,integer-samples"
25.91 real        25.62 user         0.24 sys

# f32 default decoders
# time cargo bench
29.33 real        29.04 user         0.24 sys

# f32 Symphonia
# time cargo bench --features symphonia-all
25.83 real        25.56 user         0.25 sys

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 6, 2025

This is somewhat unrelated, but dasp_sample is more flexible than Rodio's since it implements Sample trait for plain scalar values. I wanted to use Duplex trait from it, but such a requirement should then be propagated to other sources/filters that are used in Source. We do not even have Debug for Samples, which is rather cumbersome while debugging.

@roderickvd
Copy link
Collaborator Author

This is somewhat unrelated, but dasp_sample is more flexible than Rodio's since it implements Sample trait for plain scalar values. I wanted to use Duplex trait from it, but such a requirement should then be propagated to other sources/filters that are used in Source. We do not even have Debug for Samples, which is rather cumbersome while debugging.

In 5b61b19 and other commits before it, I've cleaned up radio::Sample to reuse what we can from dasp_sample::Sample without breaking the Rodio API.

I'm already feeling I'm stepping out of bounds of this PR 😄 but I'll be happy to tackle what I can and you want.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 6, 2025

@roderickvd No, there is no need to do extra work here, I just try to keep the ideas somewhere.

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 7, 2025

keep calm :) Love the enthusiasm however should we not first decide if we are moving to ungeneric f32? I'd hate to have to remove hard work.

I wanted to get some benchmarks today but Pop-os multiarch repo's are broken and podman wont work on zfs it seems..... Tomorrow ill spin up a VM to crosscompile in, bit of a hassle 😓. Soon as I got that running I'll have some benchmark results.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 7, 2025

@roderickvd Microbenchmarks are always noisy, anyway I would not expect much difference in case f32 is supported by CPU. Although the samples are larger so it should take some toll on cache. I'd be curious to see the differences when the system only has software emulation for floats.

@roderickvd
Copy link
Collaborator Author

I have none of those systems. Even the original Raspberry Pi 1 Model A from 2012 has hardware floating point support.

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 7, 2025

There is a rodio fork that specializes for embedded systems (so not cpal). I can not seem to find it unfortunately. Rodio is currently not suitable for embedded due to us allocating memory here and there. Its not one of the goals for us, since those are the only targets without hardware float we can probably ignore that category.

I just want to see what weaker hardware float systems do with an only f32 pipeline. I can imagine it might mess up cache locality or the float ops take more power/cycles. Maybe tomorrow we will know.

@roderickvd
Copy link
Collaborator Author

You mean awedio?

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 7, 2025

You mean awedio?

Yes! 🎉 Its a great crate and has some features we are still implementing/discussing.

@roderickvd
Copy link
Collaborator Author

roderickvd commented Jan 7, 2025

I feel like I'm getting to a state where I'd be happy with. Things to do:

  • Cleaning up the individual decoders like pointed out in the earlier review comments - FLAC, WAV and (mini) MP3 done, the test to go.
  • Renaming the integer-samples feature to integer-decoder - after all there is f32 processing in the pipeline regardless
  • Some micro-optimizations to bypass to_sample depending on the feature gate

Just a hobby 😁 don't worry to instruct me to take things out.

@roderickvd
Copy link
Collaborator Author

OK - I think I've got it as far as I wanted to. Ready for further review.

@dvdsk dvdsk changed the title Switch decoder output from i16 to f32 samples Decoder overhaul (16 to f32 samples + other enhancements) Jan 10, 2025
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// If we accept that f32 is required, I am not sure we'll need rodio::Sample at all, often it only gets in the way.

true
let stream_pos = data.stream_position().unwrap_or_default();
let result = OggStreamReader::new(data.by_ref()).is_ok();
let _ = data.seek(SeekFrom::Start(stream_pos));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are such errors ignored now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to decrease the probability of panicking.

When the stream position could not be gotten or restored, then old behavior was to panic.

New behavior is to still probe whether a demuxer/decoder can be initialized (whether we "are Vorbis, MP3, ...") and continue. This may still trigger the old behavior, because if the stream cannot be rewound, then in the new method that calls this is_vorbis it is likely that re-initializing the decoder will fail on this advance position.

@PetrGlad
Copy link
Collaborator

@dvdsk I am inclined to accept this, users who need integer samples can still have them. Or did I miss something?

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 12, 2025

@dvdsk I am inclined to accept this, users who need integer samples can still have them. Or did I miss something?

I'm okay with most of this, I just want to have a perf check done but I have been in crosscompile &cpal bug hell 😭 . But I might be able to simplify is by using the new rodio without cpal feature you introduced. Results will be a bit less reliable as we will no longer measuring the full pipeline including playback but it should still tell us enough.

I expect nothing bad to come out of the check but I would like to have it done before merging this. I think I'll be able to get the perf check done tomorrow. That is if my test targets do not magically explode.

And if the check comes out alright we should probably change the whole pipeline to f32's, but that is a discussion for a later PR.

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 13, 2025

Benchmarks look great, about 2% slower on low end systems with fpu if the entire pipeline is using f32. See benchmarks + data here: https://github.com/dvdsk/rodio_bench.

Results in short (avgs of multiple runs)

data_pi3b_float.data
	seconds time elapsed: 0.912484
	seconds user: 0.912484
data_pi3b_integer.data
	seconds time elapsed: 0.925034
	seconds user: 0.925034
data_pinephone_float.data
	seconds time elapsed: 0.503542
	seconds user: 0.503542
data_pinephone_integer.data
	seconds time elapsed: 0.493424
	seconds user: 0.493424

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 13, 2025

im gonna review this pr as is now and then it can be merged

src/decoder/wav.rs Outdated Show resolved Hide resolved
@dvdsk
Copy link
Collaborator

dvdsk commented Jan 14, 2025

the tests are failing due to a cargo clippy warning in unrelated code. Lets ignore that, ill fix the issue on main.

(fix pushed to master)

This change improves audio quality in two ways:
1. Having decoders output `f32` samples directly instead of converting to `i16`
2. Properly handling high bit-depth FLAC and WAV files (20/24/32-bit) without downsampling

Benefits include:
- Eliminates unnecessary quantization error from `i16` conversion
- Better matches the rodio pipeline which already uses `f32` internally
- Preserves greater-than-i16 precision from formats like WAV and FLAC
- More natural for decoders like Vorbis/MP3 that operate in floating point

Breaking change - decoder output type changes from `i16` to `f32`.

Changes:
- Updated decoders to output `f32`
- Updated `wav_test.rs` and `flac_test.rs` assertions to check for non-zero `f32` samples
- Updated benchmarks to use `f32` samples and removed unnecessary `f32` conversions

Fixes RustAudio#364
Refactoring of sample type handling and conversion:
- Rename DecoderFormat to DecoderSample for clarity
- Use dasp_sample's Sample and ToSample traits consistently
- Implement default Sample trait methods using dasp_sample
- Standardize bit depth handling across decoders:
  - FLAC & WAV: handle unsupported samples gracefully
  - Unified approach for 8/12/16/20/24/32-bit across formats
- Clean up redundant Sample implementations using trait defaults

This change makes sample handling more consistent across different
audio formats while improving handling of edge cases in bit depths.
- Fix broken playback of float WAV samples
- Simplify bit depth handling by using hound's native conversions where possible
- Maintain fallback handling for non-standard bit depths
- Add integer-samples feature to optionally use i16 instead of f32 samples
- Fix unreliable zero detection in WAV and MP4 tests using Sample::is_zero()
- Improve test reliability by checking !all(is_zero) instead of any(!=0.0)

The new feature allows applications to choose between floating-point
and fixed-point sample representation based on their needs, while
maintaining f32 the default for best sound quality.
Re-introduce zero_value() as a deprecated method to provide migration
path towards using ZERO_VALUE constant. Add is_zero() helper method
for more ergonomic zero checks.

Also cleanup implementations:
- Remove redundant saturating_add() implementations (now using trait default)
- Add #[inline] attributes for performance
- Simplify TestSource initialization by reverting use of generics
- Shortcut direct f32 conversion method for Sample trait
- Shortcut TestSource<f32> implementation
- Fix potential overflow in Duration calculation by doing division before multiplication
- Calculate total_duration once during decoder initialization instead of each call
- Remove unwrap() calls in is_flac() for better error handling
- Refactor code for better maintainability and safety
- Fix potential overflow in total_duration calculation
- Remove unwrap() calls in is_wave() for better error handling
- Remove unwrap() calls in MP3 format detection
- Convert samples to `DecoderFormat` using to_sample()
Instead of unwrapping the track lookup result directly, add proper error
handling to return None when the requested track is not found in the format.
This prevents potential panic scenarios during audio decoding initialization.
Makes the Vorbis format detection more robust by:
- Using unwrap_or_default() for stream position failures
- Simplifying the is_vorbis() check logic
- Adding explanatory comment for unwrap in stream reader creation

This change reduces the number of unwrap calls and handles IO errors
more gracefully when checking for Vorbis format.
Add explanatory expect() messages when initializing decoders for FLAC, MP3,
and WAV formats. Also improves MP3 decoder's span length handling by removing
unwrap() in favor of the ? operator. These changes maintain the same behavior
while making debugging easier by providing context for potential panics.

Affected decoders:
- FLAC
- MP3
- WAV
Renamed the "integer-samples" feature flag to "integer-decoder" to better
reflect its purpose of controlling decoder output format rather than general
sample handling.

- Renamed feature flag in Cargo.toml
- Updated corresponding cfg attributes in source code
- Updated documentation to reflect the new feature name
- Maintains same functionality of switching between i16/f32 decoder output

This change makes the feature's purpose more explicit and matches the
component it affects.
Avoid unnecessary sample conversions in MP3 and WAV decoders by using
cfg attributes to conditionally perform format conversion only when needed.
Add #[inline] attributes to:
- into_inner() methods across decoders (WAV, Vorbis, Symphonia)
- read() and seek() implementations for ReadSeekSource

These simple pass-through methods are good candidates for inlining as they
just forward calls to their inner types without additional logic.
Instead of returning EQUILIBRIUM (0.0) when encountering WAV bit depths > 32,
return None to terminate the iterator. Also add error logging through tracing
or stdout to indicate the unsupported format.
@roderickvd roderickvd force-pushed the feature/float-decoder branch from 6498406 to 9861056 Compare January 20, 2025 19:19
@roderickvd roderickvd requested a review from dvdsk January 20, 2025 19:21
@roderickvd
Copy link
Collaborator Author

the tests are failing due to a cargo clippy warning in unrelated code. Lets ignore that, ill fix the issue on main.

(fix pushed to master)

Rebased on it.

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 21, 2025

lets get this merged 🎉, thank you for your valuable input, hard work and patience Roderick!

@dvdsk dvdsk merged commit d1a9b79 into RustAudio:master Jan 21, 2025
11 checks passed
@roderickvd roderickvd deleted the feature/float-decoder branch January 21, 2025 11:31
@roderickvd
Copy link
Collaborator Author

👍 on to the next!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why the Decoder yields i16?
3 participants