-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decoder overhaul (16 to f32 samples + other enhancements) #669
Conversation
@roderickvd first of all thank you for contributing to rodio 🎉. This is quite a big change before we go on we have to discuss if and how this best fits. This might take a while, that does not mean we do not appreciate all your work! @iluvcapra @PetrGlad if you have time I would like to hear your opinions on this. |
Sure thing, I'll be here to address any feedback. |
// Down-sampling usually refers to time of spatial sampling frequency not samples' precision. |
I now see that wav decoder can be cleaned up a bit, The format switch can be in its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that sample format is now a type alias, see also my comment to the PR.
//
s can be done separately or here, at your discretion...
what is the upside to this? perf?
can you expand a little on the use case for higher bit depth audio? As far as I know CD quality is fully transparent and CD uses 16 bit precision. Is the additional precision needed for effects? I can think of is software volume control using the extra precision but that needs to be paired with a higher bit output. Are those exposed by cpal?
Parts of rodio use f32 we go through great pain (all those generics 😢) to allow for any sample type in most of the pipeline. Though I have never tested what the difference in performance is, I expect an integer based pipeline to be faster then a floating point one. We should test that. If there is no noticeable decrease in performance this change would be very nice, it does away a ton of complex code. Writing and maintaining that code was painful however so I want us to be completely sure we do not need it before doing away with it. |
Thanks for your swift feedback.
Yes, I've updated the PR and description. Will do the same when I push a new commit.
I understand that goal, and can only agree to support as much low-spec hardware as reasonable. Some thoughts:
Here some ideas for probably a separate PR: CamillaDSP uses a type alias for it. You have a choice between Or go the generic route with
High-quality resampling will invariably require floating point processing. I see that you're toying with Rubato which indeed works with
Yes. I can clean it up if you want. So far I left it untouched, in order to leave the PR scoped as small as possible.
I tried describing the benefits in the PR description. It's more about quality than about performance.
I have the desire nor anything new to add to discussions about the merits of high resolution audio. I'll just say that it exists 😄 and so, how to support it in the best possible way? Indeed, what would benefit from this is:
With regards output to platforms, all of which supported by
If there is a performance difference, from experience with Above, @PetrGlad hinted that there may be an interest to have a user-selectable audio pipeline. If so, you may want to keep the generics - they are not dynamically dispatched so are just as fast as settling on a single type of pipeline. Their very existence made this PR quite easy to make. Personally, seeing from the overview audio host platforms, and DSP libraries like CamillaDSP and Rubato, I wonder if it's worth the trouble to indeed make it user-selectable between integer and floating point. When selecting |
@roderickvd first of all thank you very much for that very clear and thorough response! I got a pi zero and a pinephone I want to check performance on, those both have fpu's but those might be slower then integer operations (I doubt it matters here). I'll try if I can check that tomorrow, cant promise anything unfortunately. @PetrGlad If we adopt this we should seriously consider making @est31 if you got time, whats your opinion on such a change? |
a1cf45a addresses most of the review comments. I see in that in all decoders, there's a lot of unwrapping going on, as well as I'll have another stab at some refactoring of e.g. |
Great, let me know!
Isn't it already generic over |
yes however writing interfaces that cover both generic code and non-generic is rather ugly. See (wip): https://github.com/RustAudio/rodio/blob/resamplers/src/conversions/sample_rate.rs & first bullet point here: #670 (comment). I needed to combine sample type conversion and sample_rate conversion since rubato only returns f32. It works perfectly however the generics are quite brittle and the compiler messages not that helpful. Brittle code makes refactoring hard which in turn leads to bad maintainability. |
820f6bb
to
01dc939
Compare
01dc939 adds an On my Apple M3 - so: fast and with hard float - I timed the benchmark suite. The following values are averaged over three runs, after having performed two "warm-up" runs for cache locality whose timings were ignored. Nothing scientific and WAV only:
|
This is somewhat unrelated, but dasp_sample is more flexible than Rodio's since it implements Sample trait for plain scalar values. I wanted to use Duplex trait from it, but such a requirement should then be propagated to other sources/filters that are used in Source. We do not even have |
In 5b61b19 and other commits before it, I've cleaned up I'm already feeling I'm stepping out of bounds of this PR 😄 but I'll be happy to tackle what I can and you want. |
@roderickvd No, there is no need to do extra work here, I just try to keep the ideas somewhere. |
keep calm :) Love the enthusiasm however should we not first decide if we are moving to ungeneric f32? I'd hate to have to remove hard work. I wanted to get some benchmarks today but Pop-os multiarch repo's are broken and podman wont work on zfs it seems..... Tomorrow ill spin up a VM to crosscompile in, bit of a hassle 😓. Soon as I got that running I'll have some benchmark results. |
@roderickvd Microbenchmarks are always noisy, anyway I would not expect much difference in case f32 is supported by CPU. Although the samples are larger so it should take some toll on cache. I'd be curious to see the differences when the system only has software emulation for floats. |
I have none of those systems. Even the original Raspberry Pi 1 Model A from 2012 has hardware floating point support. |
There is a rodio fork that specializes for embedded systems (so not cpal). I can not seem to find it unfortunately. Rodio is currently not suitable for embedded due to us allocating memory here and there. Its not one of the goals for us, since those are the only targets without hardware float we can probably ignore that category. I just want to see what weaker hardware float systems do with an only f32 pipeline. I can imagine it might mess up cache locality or the float ops take more power/cycles. Maybe tomorrow we will know. |
You mean awedio? |
Yes! 🎉 Its a great crate and has some features we are still implementing/discussing. |
I feel like I'm getting to a state where I'd be happy with. Things to do:
Just a hobby 😁 don't worry to instruct me to take things out. |
OK - I think I've got it as far as I wanted to. Ready for further review. |
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If we accept that f32
is required, I am not sure we'll need rodio::Sample at all, often it only gets in the way.
true | ||
let stream_pos = data.stream_position().unwrap_or_default(); | ||
let result = OggStreamReader::new(data.by_ref()).is_ok(); | ||
let _ = data.seek(SeekFrom::Start(stream_pos)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are such errors ignored now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is to decrease the probability of panicking.
When the stream position could not be gotten or restored, then old behavior was to panic.
New behavior is to still probe whether a demuxer/decoder can be initialized (whether we "are Vorbis, MP3, ...") and continue. This may still trigger the old behavior, because if the stream cannot be rewound, then in the new
method that calls this is_vorbis
it is likely that re-initializing the decoder will fail on this advance position.
@dvdsk I am inclined to accept this, users who need integer samples can still have them. Or did I miss something? |
I'm okay with most of this, I just want to have a perf check done but I have been in crosscompile &cpal bug hell 😭 . But I might be able to simplify is by using the new rodio without cpal feature you introduced. Results will be a bit less reliable as we will no longer measuring the full pipeline including playback but it should still tell us enough. I expect nothing bad to come out of the check but I would like to have it done before merging this. I think I'll be able to get the perf check done tomorrow. That is if my test targets do not magically explode. And if the check comes out alright we should probably change the whole pipeline to f32's, but that is a discussion for a later PR. |
Benchmarks look great, about 2% slower on low end systems with fpu if the entire pipeline is using f32. See benchmarks + data here: https://github.com/dvdsk/rodio_bench. Results in short (avgs of multiple runs)
|
im gonna review this pr as is now and then it can be merged |
the tests are failing due to a cargo clippy warning in unrelated code. Lets ignore that, ill fix the issue on main. (fix pushed to master) |
This change improves audio quality in two ways: 1. Having decoders output `f32` samples directly instead of converting to `i16` 2. Properly handling high bit-depth FLAC and WAV files (20/24/32-bit) without downsampling Benefits include: - Eliminates unnecessary quantization error from `i16` conversion - Better matches the rodio pipeline which already uses `f32` internally - Preserves greater-than-i16 precision from formats like WAV and FLAC - More natural for decoders like Vorbis/MP3 that operate in floating point Breaking change - decoder output type changes from `i16` to `f32`. Changes: - Updated decoders to output `f32` - Updated `wav_test.rs` and `flac_test.rs` assertions to check for non-zero `f32` samples - Updated benchmarks to use `f32` samples and removed unnecessary `f32` conversions Fixes RustAudio#364
Refactoring of sample type handling and conversion: - Rename DecoderFormat to DecoderSample for clarity - Use dasp_sample's Sample and ToSample traits consistently - Implement default Sample trait methods using dasp_sample - Standardize bit depth handling across decoders: - FLAC & WAV: handle unsupported samples gracefully - Unified approach for 8/12/16/20/24/32-bit across formats - Clean up redundant Sample implementations using trait defaults This change makes sample handling more consistent across different audio formats while improving handling of edge cases in bit depths.
- Fix broken playback of float WAV samples - Simplify bit depth handling by using hound's native conversions where possible - Maintain fallback handling for non-standard bit depths
- Add integer-samples feature to optionally use i16 instead of f32 samples - Fix unreliable zero detection in WAV and MP4 tests using Sample::is_zero() - Improve test reliability by checking !all(is_zero) instead of any(!=0.0) The new feature allows applications to choose between floating-point and fixed-point sample representation based on their needs, while maintaining f32 the default for best sound quality.
Re-introduce zero_value() as a deprecated method to provide migration path towards using ZERO_VALUE constant. Add is_zero() helper method for more ergonomic zero checks. Also cleanup implementations: - Remove redundant saturating_add() implementations (now using trait default) - Add #[inline] attributes for performance
- Simplify TestSource initialization by reverting use of generics - Shortcut direct f32 conversion method for Sample trait - Shortcut TestSource<f32> implementation
- Fix potential overflow in Duration calculation by doing division before multiplication - Calculate total_duration once during decoder initialization instead of each call - Remove unwrap() calls in is_flac() for better error handling - Refactor code for better maintainability and safety
- Fix potential overflow in total_duration calculation - Remove unwrap() calls in is_wave() for better error handling
- Remove unwrap() calls in MP3 format detection - Convert samples to `DecoderFormat` using to_sample()
Instead of unwrapping the track lookup result directly, add proper error handling to return None when the requested track is not found in the format. This prevents potential panic scenarios during audio decoding initialization.
Makes the Vorbis format detection more robust by: - Using unwrap_or_default() for stream position failures - Simplifying the is_vorbis() check logic - Adding explanatory comment for unwrap in stream reader creation This change reduces the number of unwrap calls and handles IO errors more gracefully when checking for Vorbis format.
Add explanatory expect() messages when initializing decoders for FLAC, MP3, and WAV formats. Also improves MP3 decoder's span length handling by removing unwrap() in favor of the ? operator. These changes maintain the same behavior while making debugging easier by providing context for potential panics. Affected decoders: - FLAC - MP3 - WAV
Renamed the "integer-samples" feature flag to "integer-decoder" to better reflect its purpose of controlling decoder output format rather than general sample handling. - Renamed feature flag in Cargo.toml - Updated corresponding cfg attributes in source code - Updated documentation to reflect the new feature name - Maintains same functionality of switching between i16/f32 decoder output This change makes the feature's purpose more explicit and matches the component it affects.
Avoid unnecessary sample conversions in MP3 and WAV decoders by using cfg attributes to conditionally perform format conversion only when needed.
Add #[inline] attributes to: - into_inner() methods across decoders (WAV, Vorbis, Symphonia) - read() and seek() implementations for ReadSeekSource These simple pass-through methods are good candidates for inlining as they just forward calls to their inner types without additional logic.
Instead of returning EQUILIBRIUM (0.0) when encountering WAV bit depths > 32, return None to terminate the iterator. Also add error logging through tracing or stdout to indicate the unsupported format.
6498406
to
9861056
Compare
Rebased on it. |
lets get this merged 🎉, thank you for your valuable input, hard work and patience Roderick! |
👍 on to the next! |
This PR improves audio quality in two ways:
f32
samples directly instead of converting toi16
Benefits include:
i16
conversionf32
internallyi16
precision from formats like WAV and FLACBreaking changes:
i16
tof32
Changes:
f32
samples and preserve full bit depthf32
samples.to_f32s()
callsTesting:
Fixes #364