JACK Host #389
Stored the callback in a |
Nice one @ErikNatanael ! I'm busy today with some contract work but will keep an eye on this :) |
Heya I'm having a play, and I can't quite figure out the Context: I'm getting an index out of bounds error when I run the beep example on jack. Update: maybe you are using Update2: When I zero the vector it works! :) (BTW one of the rules of rust is that references always have to reference valid data, so you have to zero the buffer. If you're interested |
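A minimal illustration of why zeroing the buffer matters. The thread does not show how the buffer was originally created, so the capacity-only constructor below is an assumption about one common way to hit an index out of bounds error, and both function names are hypothetical:

```rust
/// Reserves room for `len` samples but creates none, so `len()` stays 0.
/// Indexing the result at any position panics with "index out of bounds".
fn reserved_only(len: usize) -> Vec<f32> {
    Vec::with_capacity(len)
}

/// Creates `len` real samples, all initialised to silence, so every index
/// below `len` refers to valid data.
fn zeroed(len: usize) -> Vec<f32> {
    vec![0.0; len]
}
```

With these, `zeroed(512)[0]` reads a valid sample, while `reserved_only(512)[0]` would panic because the vector has capacity but no elements.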
src/host/jack/stream.rs
// in_port_buffers: Vec<&[f32]>,
sample_rate: SampleRate,
input_data_callback: Option<Arc<Mutex<Box<dyn FnMut(&Data) + Send + 'static>>>>,
output_data_callback: Option<Arc<Mutex<Box<dyn FnMut(&mut Data) + Send + 'static>>>>,
why mutex why
(The new API is designed to avoid these, and mutexes are really bad for real-time processing anyway)
"the strategy is definitely: first make it work, then make it right, and, finally, make it fast." - Stephen C. Johnson and Brian W. Kernighan.
So we should not use Mutexes when we merge, but we can replace them once we understand the overall structure - we are in the exploration phase at the moment.
Thanks for having a look! The Mutexes definitely need to go, I just don't know how to structure this properly without Rust complaining about the callback not being Sync yet. Let me know if you have any ideas!
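One possible way to drop the Mutex, sketched with the standard library only. It assumes the handler is moved by value to the audio thread and only needs to be Send (not Sync), so exclusive ownership replaces locking. `OwnedHandler` and `run_on_thread` are hypothetical names, and a plain thread stands in for JACK's audio thread:

```rust
use std::thread;

/// Hypothetical stand-in for the process handler: it owns the user callback
/// outright, so no `Arc<Mutex<..>>` is needed to call it.
struct OwnedHandler {
    callback: Box<dyn FnMut(&mut [f32]) + Send + 'static>,
    buffer: Vec<f32>,
}

impl OwnedHandler {
    fn new<F>(callback: F, frames: usize) -> Self
    where
        F: FnMut(&mut [f32]) + Send + 'static,
    {
        OwnedHandler {
            callback: Box::new(callback),
            buffer: vec![0.0; frames],
        }
    }

    /// Called once per cycle; `&mut self` already guarantees exclusive access.
    fn process(&mut self) -> &[f32] {
        (self.callback)(&mut self.buffer);
        &self.buffer
    }
}

/// Moves the handler to a worker thread (standing in for the audio thread),
/// runs `cycles` cycles and returns a copy of the last buffer.
fn run_on_thread(mut handler: OwnedHandler, cycles: usize) -> Vec<f32> {
    thread::spawn(move || {
        let mut last = Vec::new();
        for _ in 0..cycles {
            // The copy is only for inspection in this sketch; a real-time
            // callback would not allocate here.
            last = handler.process().to_vec();
        }
        last
    })
    .join()
    .expect("audio thread panicked")
}
```

The compiler accepts this because a boxed `FnMut + Send` can cross a thread boundary once; Sync would only be required if several threads shared a reference to it.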
@derekdreery
Ahh that's great, I only just got it to compile this morning before work and noticed it actually crashed right away. Thanks for looking into it, I'll commit those changes later on tonight! |
With regards to the issue of needing a |
src/host/jack/stream.rs
) -> Self {

// TODO: Is there a better way than to allocate the temporary buffer on the heap?
// Allocation happens before any audio callbacks so it should be fine.
@ErikNatanael I would say that allocating on the heap for any necessary temp buffers is fine as long as the allocation happens outside of the real-time processing. It's also necessary if the size of the buffer is unknown at compile time.
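A small sketch of that pattern, assuming the frame and channel counts are only known at runtime. `TempBuffer` is a hypothetical name, not a type from the PR:

```rust
/// Hypothetical scratch buffer: sized when the stream is created and reused
/// on every cycle afterwards.
struct TempBuffer {
    samples: Vec<f32>,
}

impl TempBuffer {
    /// Allocates once on the heap, before any audio callback runs.
    fn new(frames: usize, channels: usize) -> Self {
        TempBuffer {
            samples: vec![0.0; frames * channels],
        }
    }

    /// Real-time path: zeroes and hands out the existing storage.
    /// No allocation happens here because the length never changes.
    fn cleared(&mut self) -> &mut [f32] {
        self.samples.fill(0.0);
        &mut self.samples
    }
}
```

Because `cleared` never grows the vector, the backing memory stays at the same address for the lifetime of the stream.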
src/host/jack/stream.rs
let num_out_channels = self.out_ports.len();

// Run the output callback on the temporary output buffer until we have filled the output ports
for i in 0..current_buffer_size {
This is going to be the hot loop I'm guessing. We should try to make it easy for the compiler/OS to use SIMD or memcpy/DMA if possible, but I'm not an expert on such matters. Just for my own understanding, is the issue here that we can't just memcpy into the jack buffer because the interleaving rules are different? I also don't even know if hardware-accelerated interleaving is possible, again I'm not an expert.
Exactly, JACK provides one buffer for each port instead of a single buffer with interleaved channels so we have to bridge that. I agree, this will probably waste a lot of cycles unless properly optimised! Unfortunately I have no experience of that kind of optimisation.
A complicating factor is that JACK doesn't seem to guarantee that the number of samples requested is constant between calls to the callback (although there is a maximum buffer size and I'm guessing it sticks to that or close to that unless something special happens, but I'd need to do more research/testing to know for sure) which is why I'm checking if the temporary buffer has run out on every frame. Maybe some optimisation could come out of doing this in chunks instead?
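To make the chunk idea concrete, here is a sketch using plain slices instead of JACK ports. `Deinterleaver` and its fields are hypothetical; the assumption is that the user callback always fills a fixed-size interleaved buffer while the number of frames requested per cycle may vary:

```rust
/// Hypothetical bridge from an interleaved callback to per-port buffers.
struct Deinterleaver<F: FnMut(&mut [f32])> {
    callback: F,
    temp: Vec<f32>,    // interleaved samples, temp_frames * channels long
    channels: usize,
    read_frame: usize, // next unread frame in `temp`
}

impl<F: FnMut(&mut [f32])> Deinterleaver<F> {
    fn new(callback: F, temp_frames: usize, channels: usize) -> Self {
        assert!(temp_frames > 0 && channels > 0);
        Deinterleaver {
            callback,
            temp: vec![0.0; temp_frames * channels],
            channels,
            read_frame: temp_frames, // start empty so the first cycle refills
        }
    }

    /// Fills every port buffer, refilling `temp` whenever it runs out.
    fn process(&mut self, ports: &mut [&mut [f32]]) {
        assert_eq!(ports.len(), self.channels);
        let temp_frames = self.temp.len() / self.channels;
        let frames = ports.iter().map(|p| p.len()).min().unwrap_or(0);
        let mut written = 0;
        while written < frames {
            if self.read_frame == temp_frames {
                (self.callback)(&mut self.temp);
                self.read_frame = 0;
            }
            // Largest chunk available without another refill check.
            let n = (temp_frames - self.read_frame).min(frames - written);
            let start = self.read_frame * self.channels;
            let src = &self.temp[start..start + n * self.channels];
            for (ch, port) in ports.iter_mut().enumerate() {
                let dst = &mut port[written..written + n];
                for (d, frame) in dst.iter_mut().zip(src.chunks_exact(self.channels)) {
                    *d = frame[ch];
                }
            }
            self.read_frame += n;
            written += n;
        }
    }
}
```

The refill check now runs once per chunk rather than once per frame, and leftover frames carry over to the next cycle when the requested size changes. Whether this is measurably faster than the per-frame check would need profiling.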
I've had a bit of an explore and asked on urlo and it seems that there is no better way than a for loop.
Ahh, I missed this one. That's good to know!
// Write the interleaved samples e.g. [l0, r0, l1, r1, ..] to each output buffer
for ch_ix in 0..num_out_channels {
// TODO: This could be very slow, it would be faster to store pointers to these slices, but I don't know how
// to avoid lifetime issues and allocation
I fear you are right about this: running
cargo asm --features jack --rust -- '<cpal::host::jack::stream::LocalProcessHandler as jack::client::callbacks::ProcessHandler>::process'
after installing cargo asm (cargo install cargo-asm) shows that getting the slice involves at least a call:
; let output_channel = &mut self.out_ports[ch_ix].as_mut_slice(process_scope);
add rdi, r12
mov rsi, r14
call qword, ptr, [rip, +, _ZN4jack4port5audio75_$LT$impl$u20$jack..port..port..Port$LT$jack..port..audio..AudioOut$GT$$GT$12as_mut_slice17h8bf1202b77c6d417E@GOTPCREL]
and if LLVM isn't inlining, I assume that the function is more than a single op. But super optimization probably isn't the priority for now. :)
Here is the as_mut_slice function:
function and asm
impl Port<AudioOut> {
    /// Get a slice to write audio data to.
    pub fn as_mut_slice<'a>(&'a mut self, ps: &'a ProcessScope) -> &'a mut [f32] {
        assert_eq!(self.client_ptr(), ps.client_ptr());
        unsafe {
            slice::from_raw_parts_mut(
                self.buffer(ps.n_frames()) as *mut f32,
                ps.n_frames() as usize,
            )
        }
    }
}
asm
jack::port::audio::<impl jack::port::port::Port<jack::port::audio::AudioOut>>::as_mut_slice:
push rbx
sub rsp, 112
mov rax, qword, ptr, [rdi]
mov qword, ptr, [rsp], rax
mov rcx, qword, ptr, [rsi]
mov qword, ptr, [rsp, +, 8], rcx
cmp rax, rcx
jne .LBB113_2
mov ebx, dword, ptr, [rsi, +, 8]
mov rdi, qword, ptr, [rdi, +, 8]
mov esi, ebx
call qword, ptr, [rip, +, jack_port_get_buffer@GOTPCREL]
mov rdx, rbx
add rsp, 112
pop rbx
ret
That cargo-asm thingy is so cool! Thanks for looking into that!
If only pointers were allowed to be null this would be sooo easy :D Maybe std::ptr::null can help us here: initialise enough pointers, point them to all the output buffers and then null them again before ending the callback. Feels a bit unrusty though.
Raw pointers (that is *const and *mut) are allowed to be null, and do not have to point to valid data. However the thing they point to must be valid when they are dereferenced (ptr::read and ptr::write).
Things get more complicated if you need to allocate the memory yourself (e.g. using Box::into_raw), but for getting data from existing pointers allocated elsewhere, *const/mut and std::ptr are all you need. Obviously you're on your own when it comes to checking that the thing they point to is valid at the dereference site, hence they require unsafe.
EDIT The jack port has the Port::buffer function for getting the raw data. I'll have a play with your PR to see if it's possible. It's also worth profiling, since if there is no difference we may as well not use unsafe.
EDIT2 pointers must also be aligned when dereferenced, but that is jack's responsibility.
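A sketch of the pointer approach discussed above, using ordinary Vecs in place of JACK port buffers. This is not the code from the unsafe_jack branch; `write_via_raw_pointers` and its parameters are hypothetical, and the pointer scratch space is assumed to be allocated once outside the real-time path:

```rust
use std::ptr;

/// Writes interleaved samples to per-channel buffers through raw pointers
/// that are collected once per call and nulled again before returning.
fn write_via_raw_pointers(
    interleaved: &[f32],
    outputs: &mut [Vec<f32>],
    channel_ptrs: &mut [*mut f32],
) {
    let channels = outputs.len();
    assert!(channels > 0);
    assert_eq!(channel_ptrs.len(), channels);
    let frames = interleaved.len() / channels;

    // Fetch each channel's buffer pointer once instead of once per frame.
    for (slot, out) in channel_ptrs.iter_mut().zip(outputs.iter_mut()) {
        assert!(out.len() >= frames);
        *slot = out.as_mut_ptr();
    }

    for (i, frame) in interleaved.chunks_exact(channels).enumerate() {
        for (ch, &sample) in frame.iter().enumerate() {
            // SAFETY: pointer `ch` was taken from `outputs[ch]` above, which
            // holds at least `frames` elements, `i < frames`, and `outputs`
            // is exclusively borrowed and otherwise untouched until we return.
            unsafe { channel_ptrs[ch].add(i).write(sample) };
        }
    }

    // Null the pointers again so nothing dangling outlives the call.
    for slot in channel_ptrs.iter_mut() {
        *slot = ptr::null_mut();
    }
}
```

The unsafe block is small, but its correctness rests entirely on the length checks before it, which is the trade-off against the safe per-frame slice lookup.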
I've done a demonstration implementation with raw pointers. You can see the difference if you do
# from cpal folder
git remote add derekdreery https://github.com/derekdreery/cpal
git fetch derekdreery
git diff derekdreery/unsafe_jack
Brilliant! That works like a charm.
I don't know how to go about profiling this accurately though. Judging by the DSP load meter in qjackctl, running either version results in no noticeable change compared to idle, so maybe the performance impact isn't that big after all. This should only scale with the number of channels, not with the amount of DSP or the buffer size, so the naive version might be fine if we want to avoid unsafe code.
This is what I did to test the jack impl:
use cpal::{
    traits::{DeviceTrait, HostTrait, StreamTrait},
    HostId,
};

fn main() -> Result<(), anyhow::Error> {
    let host = cpal::host_from_id(
        cpal::available_hosts()
            .into_iter()
            .find(|id| *id == HostId::Jack)
            .expect(
                "make sure --features jack is specified. only works on OSes where jack is available",
            ),
    )
    .expect("jack host unavailable");
    let device = host
        .default_output_device()
        .expect("failed to find a default output device");
    let config = device.default_output_config()?;
    match config.sample_format() {
        cpal::SampleFormat::F32 => run::<f32>(&device, &config.into())?,
        _ => panic!("only F32 supported on jack"),
    }
    Ok(())
}

fn run<T>(device: &cpal::Device, config: &cpal::StreamConfig) -> Result<(), anyhow::Error>
where
    T: cpal::Sample,
{
    let sample_rate = config.sample_rate.0 as f32;
    let channels = config.channels as usize;
    // Produce a sinusoid of maximum amplitude.
    let mut sample_clock = 0f32;
    let mut next_value = move || {
        sample_clock = (sample_clock + 1.0) % sample_rate;
        (sample_clock * 440.0 * 2.0 * 3.141592 / sample_rate).sin()
    };
    let err_fn = |err| eprintln!("an error occurred on stream: {}", err);
    let stream = device.build_output_stream(
        config,
        move |data: &mut [T]| write_data(data, channels, &mut next_value),
        err_fn,
    )?;
    stream.play()?;
    std::thread::sleep(std::time::Duration::from_millis(1_000_000));
    Ok(())
}

fn write_data<T>(output: &mut [T], channels: usize, next_sample: &mut dyn FnMut() -> f32)
where
    T: cpal::Sample,
{
    println!();
    for frame in output.chunks_mut(channels) {
        let value: T = cpal::Sample::from::<f32>(&next_sample());
        for sample in frame.iter_mut() {
            *sample = value;
        }
    }
}
The feedback example now works on my system. I can go down to ~10 ms of latency most runs if I accept a couple of overruns at the very start, but it's a bit different between each execution. This would seem to suggest that there is some randomness in relation to when JACK calls the different clients relative to each other. JACK builds a node graph of its clients internally meaning that for two clients where one gets input from the other their order would be correct. I wonder if we can abuse this with dummy ports to force JACK to call the callbacks in the correct order until we have duplex streams. I.e. if cpal_client_in connects to cpal_client_out (with ports that are never used for any audio) JACK might fix the ordering for us. Ofc this would require us to keep track of if both an input and an output stream have been created and if so bridge them using two ports. Cons: this would pollute the JACK connections/patchbay and make it harder to see at a glance which ports are used. |
Do any of the other Hosts provide custom methods for Streams or is there a preferred way to do it? I want to try creating dummy ports to see if that improves latency, the compiler tells me that I have
With type annotations for the streams:
|
Still psyched about this great work. :) |
That's a good idea! Only downside is that host selection becomes quite verbose, mostly because |
Thanks! Hadn't used GitHub CI before, but that was super easy |
Me neither, I usually use travis, but those yaml files are all similar :) |
@ishitatsuyuki friendly ping. Is there anything else blocking this PR from being merged? |
Seems fine to me, congratulations on finishing all the TODOs.
Can we have the warnings fixed though? I'll merge as soon as those are fixed.
@ErikNatanael If you'd like I can open a PR with the warnings fixed into your repository.
That would be fantastic!
PR to fix warnings, see the comment
Thank you Psykopear, ishitatsuyuki, derekdreery and mitchmindtree for your help and reviews!
Has anyone tested this with PipeWire's reimplementation of JACK? |
I'd be interested in a pipewire backend in the future. I've just moved over from jack, but haven't tried using cpal yet. I've found the pipewire (& wayland which has a similar pattern) APIs to be pretty neat, and there is a pipewire-rs project in the pipewire repo. |
The PipeWire maintainer doesn't recommend that applications use the PipeWire API unless they have a specific reason to do so. It should work fine with the JACK API already. |
Oh OK cool didn't know that. |
More context: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/130 I am looking into cpal as a potential replacement for PortAudio in Mixxx because the PortAudio maintainers have been inexplicably dismissive of fixing a critical bug in PortAudio that breaks it when working with PipeWire via the JACK API for many devices. |
I can't answer your question, but I'd be very interested in how you got on with it. I basically got scared of the Jack codebase when I went to look at their "thread safe" ring buffer, which actually isn't thread safe at all AFAICT. But maybe I'm being overly pessimistic, and I guess problems in the codebase don't mean there is anything bad about the API, which of course is Pipewire in this case.
But having someone test out the Jack backend would be very useful for surfacing bugs so it would be appreciated by ppl here. :) |
How do I run the examples with this?
|
Is that JACK or PipeWire? I don’t know anything about the codebase but I guess the server gave a really big buffer size. |
It makes no difference if I use the pw-uninstalled.sh script to set LD_LIBRARY_PATH to point to PipeWire's JACK reimplementation or not. |
I opened a new issue to continue this discussion to not drag this old PR too far off topic: #554. |
Pull request to track the JACK Host implementation.
Major TODOs:
LocalProcessHandler
for invocation from the audio thread