Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Minimize unnecessary resampling #646

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

PetrGlad
Copy link
Collaborator

@PetrGlad PetrGlad commented Dec 1, 2024

This should help to reduce number of sample rate conversions. Setting it to current de-facto default (44.1kHz).

There is still a question whether we should use cpal::SampleRate or plain integer. With either choice it is better to use the same type everywhere in the API. Currently some functions accept cpal::SampleRate while others use u32. cpal::SampleRate helps with type-checks but does not really provide useful services.

Also partially relieves #208. The rate converter has to be fixed still but with default sample rate less resampling is needed.

Also partially relieves RustAudio#208. The rate converter has to be fixed still
but with default sample rate less resampling is needed.
@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 1, 2024

A more advanced solution could allow processing steps to query current preferred sample rate and use that. With such API one could for example open output stream with 48kHz and all processing steps attached to it will adjust accordingly. Or make sample rate a parameter for every processing step that may be affected by it. Which cumbersome but could still help keeping 48kHz everywhere if one needs that. Alternatively it could be some global switch that can be read by all Rodio code, but I am not sure I like it since users may want more than one output stream with different sample rates.
But these are a lot bigger changes, and I do not have a clear idea how to implement those.

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 2, 2024

A more advanced solution could allow processing steps to query current preferred sample rate and use that.

that could look like:

impl trait Source for Something {
    fn negotiate_samplerate(&mut self, requested_by_parent: usize) {
        self.inner.negotiate_samplerate(requested_by_parent)
    }
}

Then the output calls negotiate_samplerate and the source(s) adjusts if they can.

We would need to add an api that allows specifying a different sample rate. That could nicely fit into the Source model too. It could be something like:

struct FixedSampleRate<S>{ 
   rate: usize,
   inner: S,
}

impl Source for FixedSampleRate {
    fn negotiate_samplerate(&mut self, requested_by_parent: usize) {
        debug!("FixedSampleRate set, ignoring samplerate requested by parent")
        self.inner.negotiate_samplerate(self.rate)
    }
}

trait Source {
    fn with_samplerate(self, rate: usize) -> Self {
        FixedSampleRate {
            rate,
            inner: self,
        }
    }
}

This might even allow us to improve mixer (which currently resamples both inputs) to use one sample-rate without resampling if possible. For example when mixing a Sine wave or Noise with a decoder (song playing from disk).

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 2, 2024

The different sample rates are a problem, especially since resampling incurs needless performance loses and introduces artifacts. I do not think that exporting a constant is the proper solution here though.

  • Its usage in examples just hides the fact that the default sample rate is number. The name default sample rate to me implies that there is such a thing as a default sample rate and this is it. As far as I know (which is not a lot) the audio industry has not standardized on a sample rate. A second problem is that it hides that the sample rate is just a number and easy to change.
  • It is useful in sine however that source will be deprecated soon. I would prefer to leave it as it is until its removed.
  • In empty and empty_callback it does nothing given these never produce a sample.

Concluding, I would say this is a problem worth fixing so lets fix it properly. Your proposal to 'query current preferred sample rate' is far superior to me. It does not only solve the named problems, it allows us to make the sample_rate choice optional! That in itself is worth the effort for it makes rodio far simpler to use.

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 4, 2024

related: #208

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 5, 2024

The intention for the constant was to document the state of affairs (see default stream initialization), It may not make sense for v1.0. I hoped that fully qualified name rodio::constants::DEFAULT... would help to emphasis that it is the rodio's default. Until we have dynamic way to negotiate sample rates, this constant may be helpful. I would not insist on this change, though.

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 5, 2024

Empty callback produces nothing but receiving side may not know this unless it looks at size_hint. Yes, in this case sample rate is not very useful since it can be optimized by using size_hint()._2

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 5, 2024

Until we have dynamic way to negotiate sample rates

I want to play with this soon given it might also be an answer to #493 (having empty get the channel count instructed from the parent/mixer its part of).

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 5, 2024

I'd prefer to have a constructor or builder to handle that since it allows users to have complete control over it if they want. Something like

// Or maybe can use rodio::stream::StreamConfig here instead.
struct StreamShape { 
  sample_rate: u32,
  channels: u8
}

impl ASource {
  pub fn new(params: &ASourceConf, conf: &StreamShape) -> ASource {
    todo!()
  }

  pub fn connect_new(params: &ASourceConf, sink: &mut Sink) {
        mixer.add(Self::new(mixer.input_shape()));
  }
}

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 5, 2024

I'd prefer to have a constructor or builder to handle that since it allows users to have complete control over it if they want. Something like

// Or maybe can use rodio::stream::StreamConfig here instead.
struct StreamShape { 
  sample_rate: u32,
  channels: u8
}

impl ASource {
  pub fn new(params: &ASourceConf, conf: &StreamShape) -> ASource {
    todo!()
  }

  pub fn connect_new(params: &ASourceConf, sink: &mut Sink) {
        mixer.add(Self::new(mixer.input_shape()));
  }
}

Complete control is still possible, in principle every source should try to adhere to whatever its parent/predecessor/sink/wrapping source (we really got to note down what we call this) decided. But a source can decide to pass on its own values. We can add a source that does this with a value given by the end user. That is what I meant here: #646 (comment)

You could use that like this:

let sine = Signalgenerator(Sine);
let source = Decoder::new("some path.mp3").pausable().amplify().mix(sine).with_preferred_samplerate(44_100).with_fixed_channels(2);
stream.mix(source);

The sine wave sample_rate will be 44_100 even though the stream might have tried to negotiate a 96_000 sample_rate. It is 44_100 because that is what with_preffered_samplerate(44_100) negotiated with mix, and mix then negotiated that with sine. The mp3 file might be some other sample rate however because mix got instructed to be sample rate 44_100 mix will resample the mp3.

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 5, 2024

a more advanced design could present multiple sample-rates during negotiation. In some order of preference. Maybe then the sources could negotiate with the playback hardware and find the optimal sample rate.

You recently reworked the api for that, do you think that is possible?

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 5, 2024

I'd prefer to have sample rate as a construction parameter. In case a source needs frames/chunks with output formats varying over time, that can be plugged into UniformSourceIterator or its equivalent. Do you have in mind any other cases when a source could not accept sample rate as a parameter?

Given your example a stream parameters can be passed as:

let output = OutputStreamBuilder::open_default_stream();

// Can be "shape" instead of "format", maybe. I like the word but frequency 
//     is not really a shape...
// Note that here is only one conversion step.
let source = Decoder::new("file.mp3")
     .format(output.format())     // Inserts UniformSourceIterator
     // All subsequent steps can rely that their input format will not change.
     .pausable().amplify(0.5).mix(Signalgenerator(Sine));
output.mix(source);

A bit inconvenient, maybe, but the parameter setting on each step is automatic. Each step's constructor knows format of its inputs and all the linking can be checked at construction time. So when input and output format do not match, an error can be returned immediately.

It can be done in reverse, maybe more convenient but more complex, especially for sinks that have dynamic set of inputs. output.mix(source) is called last so the outputs will have to try to call format setters on inputs recursively.

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 5, 2024

CPAL does give a set of possible audio formats so in theory one can calculate intersection between one's preferences and what is available, but I suspect it may be a bit complicated.

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 6, 2024

CPAL does give a set of possible audio formats so in theory one can calculate intersection between one's preferences and what is available, but I suspect it may be a bit complicated.

I like to try, it would be the ultimate solution right? We would minimize unneeded re-sampling and the user would not even notice (except less artifacts more perf).

One hard thing might be decoders, they might change samplerate mid song. The negotiation might need to happen multiple times to support that....

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 7, 2024

@dvdsk Yes, to get ultimate reduction in conversion one can do that from the both ends, enumerate all the inputs and see what output device can do and then pick some stream format that satisfies them all, but it will not always be possible. As I see it requiring inputs to convert to some common format and then just using that would be a meaningful improvement already. Alternatively if inputs are all of the same format then, use that and add a converter only at the end of the chain just before the output stream only if necessary.
What I do not like at the moment is that the code is unsure what comes as input and tries to adapt to that on every step all the time. This complicates code significantly since now every sink implementation has to handle format changes (current_frame_len()). And this likely has noticeable performance cost, if we want to keep an eye on that.

I believe all the sources that do change format are somehow external to rodio, are its inputs or close to inputs. That is why as I mentioned earlier I am for having also simpler Souce version that has predictable format. So anyone consuming it can rely on it, and simplify its processing accordingly.
Internal cases probably may change number of channels dynamically, so it should be dedicated processing step (like ChannelCountConverter)

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 7, 2024

CPAL format enumeration is cumbersome so some sort of query function could be welcome (when we can filter formats with some criteria or a predicate). And ideally it should be in CPAL not in rodio.

A crazy implementation would insert all formats into SQLite and query that....

@dvdsk dvdsk changed the title Introduce a public constant for Rodio's default sample rate [Discussion] Minimize unnecessary resampling Dec 8, 2024
@dvdsk
Copy link
Collaborator

dvdsk commented Dec 8, 2024

hope your okay with the rename, you and I have said some valuable stuff here and I want it easier to find. Thats why I renamed it

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 8, 2024

from the user stories thread

Also, I've used the crate Hodaun as an alternative to Rodio in the past for some of the features it has, mostly automation of parameters. Could be a good place to look for desirable features.

from hodaun readme, section differences to rodio:

In Hodaun, the sample rate is passed to Source, and it is up to the Source to decide what to do with it. In practice, this means that the sample rate of Sources adapts to the sample rate of the audio output device.

That sounds like my proposal here: #646 (comment).

and this:

Sources that can have various possible sample rates, like audio input devices or audio files, implement UnrolledSource instead of Source. This trait corresponds more closely to Rodio's Source trait. An UnrolledSource can be converted to a Source using UnrolledSource::resample.

sounds like what you proposed.

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 8, 2024

Yes, I'd also like to pass preferred stream parameters to a source during construction. I do not like the "negotiation" part (if I understand it correctly), that is potentially changing parameters at some point later after construction is complete. I am afraid that may complicate the logic a lot. Sources that do need to change stream parameters dynamically can be adapted with a dedicated format converter that then produces consistent output.

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 8, 2024

Regarding Hodaun, I would also like to process sample frames instead. At least it should be a strict requirement that source always produces whole frames (does not end after passing only some of the channels).

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 10, 2024

that is potentially changing parameters at some point later after construction is complete

It is complex but it would allow us to eliminate resampling when the hardware can take care of it for us. I think the complexity is not that high rodio's sources are a simple tree. The negotiation does not need to be perfect, if it can cover the most common case (song playing through decoder) I think it might be worth it already. Not having to resample will save quite some battery on low power devices.

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 10, 2024

I'll work on it once we have a better resampler in rodio, and we can always decide not te merge. But I think we need to see how it would work out before deciding if this is something we want in rodio. I only have a vague plan right now and most of the complexity will only become clear once I start implementing it.

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 12, 2024

@dvdsk The formats are "suggested", so to say, by both sources and the output device. A source's format can change but stream has to be re-initialized to change that. In my opinion the approach when explicit stream format is passed to the sources is the best (see #646 (comment)). Chasing perfect score does not seem worth it as dynamic changes will complicate the logic. It is not about topology but difficulties similar to handling .current_frame_len() (I still suspect that #493 is caused by this in either rodio's or the user's code) which otherwise can be completely avoided.
Also the questions remain for me. For example, what if we have several mp3 sources each having own format and a particular time how do we handle that? The sources and output stream should vote which format is the best? How do we change output format? By transparently re-opening output stream?

I may change my opinion if I see the prototype, maybe I misunderstand something.

@PetrGlad
Copy link
Collaborator Author

As a side note it is possible to have a generic adapter that helps for any source to handle .current_frame_len(). Namely, it would need to get the wrapped Source constructor function and create new source instance when the input format changes. So it would pass current_frame_len of samples to the wrapped source, and then re-create it with the new input stream format. Maybe it is cumbersome for rodio's embedded sources but may help users.

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 12, 2024

As a side note it is possible to have a generic adapter that helps for any source to handle .current_frame_len(). Namely, it would need to get the wrapped Source constructor function and create new source instance when the input format changes. So it would pass current_frame_len of samples to the wrapped source, and then re-create it with the new input stream format. Maybe it is cumbersome for rodio's embedded sources but may help users.

I do not really get how that would work, is it something you can easily demonstrate with example code?

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 12, 2024

but stream has to be re-initialized to change that.

I did not know that, it makes senses though.

Chasing perfect score does not seem worth it as dynamic changes will complicate the logic.

you are completely right here. Perfect is the enemy of good. I think if we do this, we should do it as a slight optimization on the current situation. Most sources are a constant sample rate we just do not know the rate at compile time. So if we do a single negotiation at creation we can determine a good sample_rate for most use-cases. We can add a method to override easily. We should be able to make that override do what you propose (inject a sample_rate from the output). better (less unneeded resampling) for the vast majority of users.

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 12, 2024

@dvdsk I just tried to write such an adapter for processing spans and it would take some time for me to see how to do this in a generic way. I'll try to have something that at least builds :) Although you can look at BltFilter::next() which uses similar pattern with its self.applier which is re-initialized every time a span starts.

@PetrGlad
Copy link
Collaborator Author

PetrGlad commented Dec 17, 2024

@dvdsk Yes, now I admit that in Rust such an adapter will be a lot more complicated then I thought or maybe I do not understand Rust well enough yet. In a garbage collected language this would be more straightforward. The filter that exists only for a span should be given temporary source instance since it is supposed to own it, and from other side of it there should be some source that can be dynamically attached to another source (when span changes). So I stopped half way :) Maybe will give it another try if I get a clearer idea.

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 18, 2024

in Rust such an adapter will be a lot more complicated then I thought

That is what prototyping is for, you can not figure out everything beforehand, hope you had some fun along the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants