Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the AudioRingBuffer list-based #1064

Merged
merged 7 commits into from
Dec 28, 2022
Merged

Make the AudioRingBuffer list-based #1064

merged 7 commits into from
Dec 28, 2022

Conversation

earlephilhower
Copy link
Owner

The ring buffer worked but had issues with IRQs and the available() procesing. Because it was a pain to debug, move to a linked list setup where there are filled and empty buffers to work from, simplifying the underlying logic.

Allow I2S::available() to return free writing space in OUTPUT mode to make it saner.

@earlephilhower
Copy link
Owner Author

@DatanoiseTV can you give this branch a spin? The whole buffer management was redone and the CB seems to be working as well as the ::available(forWrite) seems stable and should be at least close to clearing up #939 .

#include <I2S.h>

// Create the I2S port using a PIO state machine
I2S i2s(OUTPUT);

// GPIO pin numbers
#define pBCLK 16
#define pWS (pBCLK+1)
#define pDOUT 18

const int frequency = 440; // frequency of square wave in Hz
const int amplitude = 500; // amplitude of square wave
const int sampleRate = 16000; // minimum for UDA1334A

const int halfWavelength = (sampleRate / frequency); // half wavelength of square wave

int16_t sample = amplitude; // current sample value
int count = 0;

void cb() {
  while(i2s.availableForWrite()) {
    if (count % halfWavelength == 0) {
      // invert the sample every half wavelength count multiple to generate square wave
      sample = -1 * sample;
    }
  
    // write the same sample twice, once for left and once for the right channel
    i2s.write(sample);
    i2s.write(sample);
  
    // increment the counter for the next sample
    count++;
  }
}

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, 1);

  Serial.begin(115200);
  Serial.println("I2S simple tone");

  i2s.setBCLK(pBCLK);
  i2s.setDATA(pDOUT);
  i2s.setBitsPerSample(16);
  i2s.onTransmit(cb);
  // start I2S at the sample rate with 16-bits per sample
  if (!i2s.begin(sampleRate)) {
    Serial.println("Failed to initialize I2S!");
    while (1); // do nothing
  }

}

void loop() {
}

The ring buffer worked but had issues with IRQs and the available()
procesing.  Because it was a pain to debug, move to a linked list
setup where there are filled and empty buffers to work from,
simplifying the underlying logic.

Allow I2S::available() to return free writing space in OUTPUT mode
to make it saner.
Fix typo causing the wrong active buffer to be freed causing infinite looping
when underflow happened and potentially messing up normal operation if the
'freed' buffer was overwritten before being played.
@earlephilhower
Copy link
Owner Author

For some reason, @DatanoiseTV , your comment hit my email box but not the GH repo...

In any case, this current patch was tested using the logic analyzer for 10s using a stream of incrementing samples. Both with a manually forced underflow and w/o any underflow worked for me when I post-processed the CSVs. No missing samples detected and 0s sent on underflow.

I was unable to build your PicoVult sketch because it needs some add'l library or files (`#include "vultin.h") which isn't in the repo.

@DatanoiseTV
Copy link
Contributor

For some reason, @DatanoiseTV , your comment hit my email box but not the GH repo...

In any case, this current patch was tested using the logic analyzer for 10s using a stream of incrementing samples. Both with a manually forced underflow and w/o any underflow worked for me when I post-processed the CSVs. No missing samples detected and 0s sent on underflow.

I was unable to build your PicoVult sketch because it needs some add'l library or files (`#include "vultin.h") which isn't in the repo.

https://github.com/DatanoiseOrg/VultArduino

@earlephilhower
Copy link
Owner Author

I've run the code and it seems to be crashing in DSP code, not in I2S handling. With the optimizer it's hard to tell exactly what's going one but I was able to get a dump by connecting GDB after the hang:

Remote debugging using localhost:3333
__wrap___aeabi_lmul () at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/rp2_common/pico_int64_ops/pico_int64_ops_aeabi.S:26
26	    adds   r1, r3
(gdb) where
#0  __wrap___aeabi_lmul () at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/rp2_common/pico_int64_ops/pico_int64_ops_aeabi.S:26
#1  0x100035b4 in fix_mul (y=<optimized out>, x=<optimized out>) at /home/earle/Arduino/hardware/pico/rp2040/libraries/VultArduino/src/vultin.h:85
#2  Demo_env (_ctx=..., input=input@entry=0, decay=decay@entry=6553) at /home/earle/Arduino/PicoADK_I2S_Callback_Demo_Vult/demo.cpp:99
#3  0x10003a76 in Demo_drum (stretch=6553, time=52428, decay=6553, pitch=0, gate=0, _ctx=...) at /home/earle/Arduino/PicoADK_I2S_Callb
   ack_Demo_Vult/demo.h:218
#4  Demo_process (_ctx=..., input=input@entry=0) at /home/earle/Arduino/PicoADK_I2S_Callback_Demo_Vult/demo.cpp:321
#5  0x1000338a in cb () at /home/earle/Arduino/PicoADK_I2S_Callback_Demo_Vult/PicoADK_I2S_Callback_Demo_Vult.ino:25
#6  0x20000152 in AudioRingBuffer::_dmaIRQ (this=0x20017a58, channel=channel@entry=0) at /home/earle/Arduino/hardware/pico/rp2040/libraries/I2S/src/AudioRingBuffer.cpp:262
#7  0x200001d2 in AudioRingBuffer::_irq () at /home/earle/Arduino/hardware/pico/rp2040/libraries/I2S/src/AudioRingBuffer.cpp:269
#8  0x200005fc in irq_handler_chain_slots ()

The volatile was improperly assigned allowing GCC to cache the first
read of the list heads in ::read/::write causing, potentially, an
infinite loop.  Adjust so the data pointed at by the pointer will
be the volatile bit causes GCC to emit re-reads ever pass of the
bust-wait.
@DatanoiseTV
Copy link
Contributor

@earlephilhower Please try again with a fresh https://github.com/DatanoiseOrg/VultArduino
Recently hardware division was added.

@earlephilhower
Copy link
Owner Author

With the latest one what I see is that it is the callback is taking longer than a full DMA cycle and that it's breaking the DMA ping pong structure and no more DMA IRQs happen. Looking at the waveform I can see the Bclk/Wclk freeze which means the PIO isn't getting any FIFO data and stalling. I can repro it by taking the simple CB test and GDB breaking while running. The clocks continue until the final DMA finishes but at that point it's all busted inside.

@DatanoiseTV
Copy link
Contributor

With the latest one what I see is that it is the callback is taking longer than a full DMA cycle and that it's breaking the DMA ping pong structure and no more DMA IRQs happen. Looking at the waveform I can see the Bclk/Wclk freeze which means the PIO isn't getting any FIFO data and stalling. I can repro it by taking the simple CB test and GDB breaking while running. The clocks continue until the final DMA finishes but at that point it's all busted inside.

That is a bit weird, as I am successfully running that code and much much more complex patches in my firmware:https://github.com/DatanoiseTV/PicoADK-Firmware-Template

@DatanoiseTV
Copy link
Contributor

Maybe try setting different buffer sizes?

@earlephilhower
Copy link
Owner Author

Could be, as I am using pretty small default buffer sizes (which are configurable). If sample generation time is linear, though, it may not really change things.

That said, the core should probably be able to recognize and recover from this kind of programming error.

Basically what we have now is DMA 1 fires the trigger to DMA 2 (HW chaining) and cause an IRQ to happen. In the IRQ we reset DMA1 to point to the next bit of data after DMA2 but don't trigger it (DMA2 is wired to trigger DMA1). You service IRQs in time, all works fine. You don't, DMA 2 tries to trigger DMA1 but DMA1 is already done so it's a no-op and no more triggers. You get both IRQs and update DMA addresses, but nobody's going to actually trigger the transfer start. Interesting corner case!

@DatanoiseTV
Copy link
Contributor

DatanoiseTV commented Dec 26, 2022

Could be, as I am using pretty small default buffer sizes (which are configurable). If sample generation time is linear, though, it may not really change things.

That said, the core should probably be able to recognize and recover from this kind of programming error.

Basically what we have now is DMA 1 fires the trigger to DMA 2 (HW chaining) and cause an IRQ to happen. In the IRQ we reset DMA1 to point to the next bit of data after DMA2 but don't trigger it (DMA2 is wired to trigger DMA1). You service IRQs in time, all works fine. You don't, DMA 2 tries to trigger DMA1 but DMA1 is already done so it's a no-op and no more triggers. You get both IRQs and update DMA addresses, but nobody's going to actually trigger the transfer start. Interesting corner case!

I am using between 16 and 32 samples in my examples. But also used a buffer size of 32 and 64 samples today without any issues.

@earlephilhower
Copy link
Owner Author

Looks like stack overflow errors are causing some crashes/weirdness in your code, but with the latest push it's at least not crashing and adjusted to 16 samples @ 32bps.

Here:

void Demo__ctx_type_17_init(Demo__ctx_type_17 &_output_){
   Demo__ctx_type_17 _ctx;
   _ctx.voice2_res = 0x0 /* 0.000000 */;
   _ctx.voice2_pitch = 0x0 /* 0.000000 */;
   _ctx.voice2_gate = 0x0 /* 0.000000 */;
   _ctx.voice2_f = 0x0 /* 0.000000 */;
   _ctx.voice1_pitch = 0x0 /* 0.000000 */;
   _ctx.voice1_gate = 0x0 /* 0.000000 */;
   _ctx.voice1_detune = 0x0 /* 0.000000 */;
   _ctx.time = 0x0 /* 0.000000 */;
   _ctx.feed = 0x0 /* 0.000000 */;
   _ctx.drum_pitch = 0x0 /* 0.000000 */;
   _ctx.drum_gate = 0x0 /* 0.000000 */;
   Demo__ctx_type_16_init(_ctx._inst346);
   Demo__ctx_type_15_init(_ctx._inst21);
   Demo__ctx_type_9_init(_ctx._inst172);
   _output_ = _ctx;
   return ;
}
(gdb) print sizeof(Demo__ctx_type_17)
$13 = 88868

So you're allocating 88KB of data on a 4KB stack and things get weird. Some of the I2S internals get overwritten, it looks like, causing garbage.

Also, what you're doing is illegal C code. You're returning a stack local variable (_ctx) in (_output). You need to malloc() or new that data or else (even ignoring the 88KB size) it's an invalid pointer immediately after the return.

@earlephilhower
Copy link
Owner Author

I would also suggest maybe doing a while (i2s.availableForWrite()) { i2s.write(0); } right after the i2s.begin(). The reason is that your current code, on the 1st call, will attempt to fill all 8 buffers worth of data (since all 8 are free on the 1st CB). After that, it'll only fill one at a time. So the 1st call will be 8x as long and may be a timing issue. By prefilling with silence you'll know you will only fill max 1 full buffer per IRQ.

@earlephilhower earlephilhower merged commit 5ef04da into master Dec 28, 2022
@earlephilhower earlephilhower deleted the redoi2s branch December 28, 2022 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants