Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JACK Host #389

Merged
merged 30 commits into from
Oct 1, 2020
Merged

JACK Host #389

merged 30 commits into from
Oct 1, 2020

Conversation

ErikNatanael
Copy link
Contributor

@ErikNatanael ErikNatanael commented Apr 13, 2020

Pull request to track the JACK Host implementation.

Major TODOs:

  • Store the data callback in the LocalProcessHandler for invocation from the audio thread
  • Process data from input ports
  • Accurately report if JACK is available
  • Use the error callback
  • Implement listing devices
  • Remove Arc Mutex wrapper around process callback when Sync requirement removal PR is merged in rust-jack (see Remove Sync requirement from ProcessHandler. (#121) rust-jack#127)

@ErikNatanael ErikNatanael mentioned this pull request Apr 13, 2020
@ErikNatanael
Copy link
Contributor Author

Stored the callback in a Option<Arc<Mutex<Box<dyn FnMut(&Data) + Send + 'static>>>>, I feel like there's got to be a better way, but this compiles.

@mitchmindtree
Copy link
Member

Nice one @ErikNatanael ! I'm busy today with some contract work but will keep an eye on this :)

@richard-uk1
Copy link
Contributor

richard-uk1 commented Apr 14, 2020

Heya I'm having a play, and I can't quite figure out the temp_output_buffer. It seems that the code indexes into it, but its length is never more than 0.

Context: I'm getting an index out of bounds error when I run the beep example on jack.

Update: maybe you are using Vec::with_capacity (which only reserves that much memory - the initial vec has no elements) when you mean vec![0; size] which zeroes the content of the Vec and marks it live.

Update2: When I zero the vector it works! :) (BTW one of the rules of rust is that references always have to reference valid data, so you have to zero the buffer. If you're interested MaybeUninit is a way to work with uninitialized memory.

// in_port_buffers: Vec<&[f32]>,
sample_rate: SampleRate,
input_data_callback: Option<Arc<Mutex<Box<dyn FnMut(&Data) + Send + 'static>>>>,
output_data_callback: Option<Arc<Mutex<Box<dyn FnMut(&mut Data) + Send + 'static>>>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why mutex why

(The new API is designed to avoid these, and mutexes are really bad for real-time processing anyway)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the strategy is definitely: first make it work, then make it right, and, finally, make it fast." - Stephen C. Johnson and Brian W. Kernighan.

So we should not use Mutexes when we merge, but we can replace them once we understand the overall structure - we are in the exploration phase at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a look! The Mutexes definitely need to go, I just don't know how to structure this properly without Rust complaining about the callback not being Sync yet. Let me know if you have any ideas!

@ErikNatanael
Copy link
Contributor Author

@derekdreery

Update: maybe you are using Vec::with_capacity (which only reserves that much memory - the initial vec has no elements) when you mean vec![0; size] which zeroes the content of the Vec and marks it live.

Update2: When I zero the vector it works! :) (BTW one of the rules of rust is that references always have to reference valid data, so you have to zero the buffer. If you're interested MaybeUninit is a way to work with uninitialized memory.

Ahh that's great, I only just got it to compile this morning before work and noticed it actually crashed right away. Thanks for looking into it, I'll commit those changes later on tonight!

@richard-uk1
Copy link
Contributor

With regards to the issue of needing a Mutex, it is because the cpal traits require that the callback is Send while the jack::ProcessHandler requires that the callback is Send and Sync. I'm not sure I even understand what Send and Sync mean for closures so I need to try to figure that out.

) -> Self {

// TODO: Is there a better way than to allocate the temporary buffer on the heap?
// Allocation happens before any audio callbacks so it should be fine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ErikNatanael I would say that allocating on the heap for any necessary temp buffers is fine as long as the allocation happens outside of the real-time processing. It's also necessary if the size of the buffer is unknown at compile time.

let num_out_channels = self.out_ports.len();

// Run the output callback on the temporary output buffer until we have filled the output ports
for i in 0..current_buffer_size {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be the hot loop I'm guessing. We should try to make it easy for the compiler/OS to use SIMD or memcpy/DMA if possible, but I'm not an expert on such matters. Just for my own understanding, is the issue here that we can't just memcpy into the jack buffer because the interleaving rules are different? I also don't even know if hardware-accelerated interleaving is possible, again I'm not an expert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, JACK provides one buffer for each port instead of a single buffer with interleaved channels so we have to bridge that. I agree, this will probably waste a lot of cycles unless properly optimised! Unfortunately I have no experience of that kind of optimisation.

A complicating factor is that JACK doesn't seem to guarantee that the number of samples requested is constant between calls to the callback (although there is a maximum buffer size and I'm guessing it sticks to that or close to that unless something special happens, but I'd need to do more research/testing to know for sure) which is why I'm checking if the temporary buffer has run out on every frame. Maybe some optimisation could come out of doing this is chunks instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a bit of an explore and asked on urlo and it seems that there is no better way than a for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I missed this one. That's good to know!

// Write the interleaved samples e.g. [l0, r0, l1, r1, ..] to each output buffer
for ch_ix in 0..num_out_channels {
// TODO: This could be very slow, it would be faster to store pointers to these slices, but I don't know how
// to avoid lifetime issues and allocation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear you are right about this: running

cargo asm --features jack --rust -- '<cpal::host::jack::stream::LocalProcessHandler as jack::client::callbacks::ProcessHandler>::process'

after installing cargo asm (cargo install cargo-asm) shows that getting the slice involves at least a call:

; let output_channel = &mut self.out_ports[ch_ix].as_mut_slice(process_scope);
 add     rdi, r12
 mov     rsi, r14
 call    qword, ptr, [rip, +, _ZN4jack4port5audio75_$LT$impl$u20$jack..port..port..Port$LT$jack..port..audio..AudioOut$GT$$GT$12as_mut_slice17h8bf1202b77c6d417E@GOTPCREL]

and if LLVM isn't inlining, I assume that the function is more than a single op. But super optimization probably isn't the priority for now. :)

Copy link
Contributor

@richard-uk1 richard-uk1 Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the as_mut_slice function:

function and asm
impl Port<AudioOut> {
    /// Get a slice to write audio data to.
    pub fn as_mut_slice<'a>(&'a mut self, ps: &'a ProcessScope) -> &'a mut [f32] {
        assert_eq!(self.client_ptr(), ps.client_ptr());
        unsafe {
            slice::from_raw_parts_mut(
                self.buffer(ps.n_frames()) as *mut f32,
                ps.n_frames() as usize,
            )
        }
    }
}

asm

jack::port::audio::<impl jack::port::port::Port<jack::port::audio::AudioOut>>::as_mut_slice:
 push    rbx
 sub     rsp, 112
 mov     rax, qword, ptr, [rdi]
 mov     qword, ptr, [rsp], rax
 mov     rcx, qword, ptr, [rsi]
 mov     qword, ptr, [rsp, +, 8], rcx
 cmp     rax, rcx
 jne     .LBB113_2
 mov     ebx, dword, ptr, [rsi, +, 8]
 mov     rdi, qword, ptr, [rdi, +, 8]
 mov     esi, ebx
 call    qword, ptr, [rip, +, jack_port_get_buffer@GOTPCREL]
 mov     rdx, rbx
 add     rsp, 112
 pop     rbx
 ret
which calls into `jack.h`. I think I should stop going down the rabbit hole for now - can worry about this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That cargo-asm thingy is so cool! Thanks for looking into that!

If only pointers were allowed to be null this would be sooo easy :D Maybe std::ptr::null can help us here: initialise enough pointers, point them to all the output buffers and then null them again before ending the callback. Feels a bit unrusty though.

Copy link
Contributor

@richard-uk1 richard-uk1 Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raw pointers (that is *const and *mut) are allowed to be null, and do not have to point to valid data. However the thing they point to must be valid when they are dereferenced (ptr::read and ptr::write).

Things get more complicated if you need to allocate the memory yourself (e.g. using Box::into_raw), but for getting data from existing pointers allocated elsewhere, *const/mut and std::ptr are all you need. Obviously you're on your own when it comes to checking that the thing they point to is valid at the dereference site, hence they require unsafe.

EDIT The jack port has the Port::buffer function for getting the raw data. I'll have a play with your PR to see if it's possible. It's also worth profiling, since if there is no difference we may as well not use unsafe.

EDIT2 pointers must also be aligned when dereferenced, but that is jack's responsibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a demonstration implementation with raw pointers. You can see the difference if you do

# from cpal folder
git remote add derekdreery https://github.com/derekdreery/cpal
git fetch derekdreery
git diff derekdreery/unsafe_jack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant! That works like a charm.
I don't know how to go about profiling this accurately though. Judging by the DSP load meter on qjackctl running either versions results in no noticeable change compared to idle use so maybe it's not such a big performance impact actually. Because this should only scale with number of channels and not amount of DSP or buffer size so the naive version might be fine if we want to avoid unsafe code.

@richard-uk1
Copy link
Contributor

this is what I did to test the jack impl
use cpal::{
    traits::{DeviceTrait, HostTrait, StreamTrait},
    HostId,
};

fn main() -> Result<(), anyhow::Error> {
    let host = cpal::host_from_id(cpal::available_hosts()
        .into_iter()
        .find(|id| *id == HostId::Jack)
        .expect(
            "make sure --features jack is specified. only works on OSes where jack is available",
        )).expect("jack host unavailable");
    let device = host
        .default_output_device()
        .expect("failed to find a default output device");
    let config = device.default_output_config()?;

    match config.sample_format() {
        cpal::SampleFormat::F32 => run::<f32>(&device, &config.into())?,
        _ => panic!("only F32 supported on jack"),
    }

    Ok(())
}

fn run<T>(device: &cpal::Device, config: &cpal::StreamConfig) -> Result<(), anyhow::Error>
where
    T: cpal::Sample,
{
    let sample_rate = config.sample_rate.0 as f32;
    let channels = config.channels as usize;

    // Produce a sinusoid of maximum amplitude.
    let mut sample_clock = 0f32;
    let mut next_value = move || {
        sample_clock = (sample_clock + 1.0) % sample_rate;
        (sample_clock * 440.0 * 2.0 * 3.141592 / sample_rate).sin()
    };

    let err_fn = |err| eprintln!("an error occurred on stream: {}", err);

    let stream = device.build_output_stream(
        config,
        move |data: &mut [T]| write_data(data, channels, &mut next_value),
        err_fn,
    )?;
    stream.play()?;

    std::thread::sleep(std::time::Duration::from_millis(1_000_000));

    Ok(())
}

fn write_data<T>(output: &mut [T], channels: usize, next_sample: &mut dyn FnMut() -> f32)
where
    T: cpal::Sample,
{
    println!();
    for frame in output.chunks_mut(channels) {
        let value: T = cpal::Sample::from::<f32>(&next_sample());
        for sample in frame.iter_mut() {
            *sample = value;
        }
    }
}

</details>

@ErikNatanael
Copy link
Contributor Author

The feedback example now works on my system. I can go down to ~10 ms of latency most runs if I accept a couple of overruns at the very start, but it's a bit different between each execution. This would seem to suggest that there is some randomness in relation to when JACK calls the different clients relative to each other.

JACK builds a node graph of its clients internally meaning that for two clients where one gets input from the other their order would be correct. I wonder if we can abuse this with dummy ports to force JACK to call the callbacks in the correct order until we have duplex streams. I.e. if cpal_client_in connects to cpal_client_out (with ports that are never used for any audio) JACK might fix the ordering for us. Ofc this would require us to keep track of if both an input and an output stream have been created and if so bridge them using two ports. Cons: this would pollute the JACK connections/patchbay and make it harder to see at a glance which ports are used.

@ErikNatanael
Copy link
Contributor Author

Do any of the other Hosts provide custom methods for Streams or is there a preferred way to do it? I want to try creating dummy ports to see if that improves latency, the compiler tells me that I have cpal::Streams instead of cpal::platform::JackStream
If I don't type annotate I get this error for the actual method I want to call

input_stream.link_to_output_stream(&mut output_stream);
   |                  ^^^^^^^^^^^^^^^^^^^^^ method not found in `cpal::Stream`

With type annotations for the streams:

let input_stream: cpal::platform::JackStream = input_device.build_input_stream(&config, input_data_fn, err_fn)?;
expected struct `cpal::platform::JackStream`, found struct `cpal::Stream`

@richard-uk1
Copy link
Contributor

Still psyched about this great work. :)

@ErikNatanael
Copy link
Contributor Author

It seems that the JACK example is mostly same as the beep example, so as a suggestion, how about merging the JACK example into beep? Then it could take a command line flag like --jack (in addition to the feature gate), so one can test the JACK backend without recompilation.

That's a good idea! Only downside is that host selection becomes quite verbose, mostly because HostId::Jack doesn't exist without the feature flag. I implemented it for both beep and feedback, let me know if you want me to revert.

@ErikNatanael
Copy link
Contributor Author

Yes! It works. Thank you very much for this PR :)

edit: For the CI not building, maybe it's enough to add a step inside .github/workflows/cpal.yml, something like:

    - name: Install libjack
      run: sudo apt-get install libjack

just below line 75 of the file.

I'm not sure about the package name on ubuntu though, it could be libjack-dev

Thanks! Hadn't used GitHub CI before, but that was super easy

@Psykopear
Copy link
Contributor

Thanks! Hadn't used GitHub CI before, but that was super easy

Me neither, I usually use travis, but those yaml files are all similar :)

@Psykopear
Copy link
Contributor

@ishitatsuyuki friendly ping. Is there anything else blocking this PR from being merged?

Copy link
Collaborator

@ishitatsuyuki ishitatsuyuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me, congratulations on finishing all the TODOs.

Can we have the warnings fixed though? I'll merge as soon as those are fixed.

@Psykopear
Copy link
Contributor

@ErikNatanael If you'd like I can open a PR with the warnings fixed into your repository.

@ErikNatanael
Copy link
Contributor Author

@ErikNatanael If you'd like I can open a PR with the warnings fixed into your repository.

That would be fantastic!

@Psykopear
Copy link
Contributor

PR to fix warnings, see the comment

@ishitatsuyuki ishitatsuyuki merged commit 2b31fe4 into RustAudio:master Oct 1, 2020
@ErikNatanael
Copy link
Contributor Author

Thank you Psykopear, ishitatsuyuki, derekdreery and mitchmindtree for your help and reviews!

@Be-ing
Copy link
Contributor

Be-ing commented Mar 12, 2021

Has anyone tested this with PipeWire's reimplementation of JACK?

@richard-uk1
Copy link
Contributor

I'd be interested in a pipewire backend in the future. I've just moved over from jack, but haven't tried using cpal yet. I've found the pipewire (& wayland which has a similar pattern) APIs to be pretty neat, and there is a pipewire-rs project in the pipewire repo.

@Be-ing
Copy link
Contributor

Be-ing commented Mar 12, 2021

The PipeWire maintainer doesn't recommend that applications use the PipeWire API unless they have a specific reason to do so. It should work fine with the JACK API already.

@richard-uk1
Copy link
Contributor

Oh OK cool didn't know that.

@Be-ing
Copy link
Contributor

Be-ing commented Mar 12, 2021

More context: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/130

I am looking into cpal as a potential replacement for PortAudio in Mixxx because the PortAudio maintainers have been inexplicably dismissive of fixing a critical bug in PortAudio that breaks it when working with PipeWire via the JACK API for many devices.

@richard-uk1
Copy link
Contributor

I can't answer your question, but I'd be very interested in how you got on with it.

I basically got scared of the Jack codebase when I went to look at their "thread safe" ring buffer, which actually isn't thread safe at all AFAICT. But maybe I'm being over pessimistic, also I guess problems in the codebase doesn't mean there is anything bad about the API, which of course is Pipewire in this case.

@richard-uk1
Copy link
Contributor

But having someone test out the Jack backend would be very useful for surfacing bugs so it would be appreciated by ppl here. :)

@Be-ing
Copy link
Contributor

Be-ing commented Mar 18, 2021

How do I run the examples with this?

cpal on  master is 📦 v0.13.2 via 🦀 v1.49.0 
❯ cargo run --release --example beep --features jack
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/examples/beep`
Output device: default
Default output config: SupportedStreamConfig { channels: 2, sample_rate: SampleRate(44100), buffer_size: Range { min: 170, max: 1466015503 }, sample_format: F32 }
memory allocation of 5404319552844632832 bytes failed
fish: “cargo run --release --example b…” terminated by signal SIGABRT (Abort)

@ishitatsuyuki
Copy link
Collaborator

Is that JACK or PipeWire? I don’t know anything about the codebase but I guess the server gave a really big buffer size.

@Be-ing
Copy link
Contributor

Be-ing commented Mar 18, 2021

It makes no difference if I use the pw-uninstalled.sh script to set LD_LIBRARY_PATH to point to PipeWire's JACK reimplementation or not.

@Be-ing
Copy link
Contributor

Be-ing commented Mar 18, 2021

I opened a new issue to continue this discussion to not drag this old PR too far off topic: #554.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants