Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faulty DMA Buffer Overflow in adc_trig example #104

Closed
apgoetz opened this issue May 18, 2020 · 17 comments
Closed

Faulty DMA Buffer Overflow in adc_trig example #104

apgoetz opened this issue May 18, 2020 · 17 comments

Comments

@apgoetz
Copy link
Contributor

apgoetz commented May 18, 2020

Hi,

While experimenting with the ADC DMA examples, I discovered a bug in the implementation. I haven't completely root caused it, but I wanted to file an issue to keep track of it.

I assume this bug is part of the DMA implementation in #86

Currently, the DMA peripheral is set up in circular buffer mode to repeatedly transfer ADC samples into a fixed buffer. Sometimes, when the DMA peripheral reaches the end of this buffer, it causes a spurious BufferOverflow error when using the read_available() method.

This only seems to occur when the DMA transfer has reached the end of the buffer space and is about to wrap around to the start. The example adc_trig.rs uses a buffer size of 256 and a sample rate of 1Hz, so it takes a while to surface this bug. If you speed up the sample rate or decrease the buffer size, you can reproduce this issue much more quickly.

The issue seems to be caused by the function check_overrun_inner() which is responsible for checking, based on the current DMA status, if the DMA peripheral has overwritten data that has not been read yet.

It seems that if the DMA write pointer is located at position 0, but the transfer_state.complete flag is not set, this causes a spurious buffer overflow.

The DMA write pointer is calculated from the DMA_CNDTRx.NDT register field, which is supposed to indicate the number of remaining items to transfer. It seems that if this counter has restarted, and is equal to the length of the buffer, but the transfer complete flag is not set, this error condition is triggered.

I am not certain how this can happen: I assume that the transfer complete flag gets triggered by hardware pretty close to when the NDT register reloads its value, but somehow the example code is triggering this behaviour.

@apgoetz
Copy link
Contributor Author

apgoetz commented May 18, 2020

Forgot to mention, I am using a 32L0538DISCOVERY dev board, which is an stm32l0x3 device. In order to get the example to build, I removed the #cfg checks in adc.rs, and the required features from Cargo.toml.

I also had to change the UART peripheral to match the dev board, but I assume these changes shouldn't affect the behavior of the code.

The ADC DMA transfers seem to be working, until the example code eventually panics due to the aforementioned BufferOverflow error.

@dbrgn
Copy link
Contributor

dbrgn commented May 18, 2020

(Offtopic, but what did you need to adjust with the UART peripheral? Did you specify the mcu-STM32L053C8Tx cargo feature?)

@hannobraun
Copy link
Contributor

Thanks for the report! I'll take a look later to see if I can reproduce this with the STM32L0x2 device I developed this with.

@apgoetz
Copy link
Contributor Author

apgoetz commented May 19, 2020

Hi, I pushed my experiments into a new branch and will try to answer your questions:
apgoetz/stm32l0xx-hal/tree/adc-dma-testing

@dbrgn, I have a usb-uart device connected to PB6 and PB7 on my dev board, which requires using USART1 instead of USART2. Just switching the USART from USART2 to USART1 caused a compilation error though:

no method named `usart` found for struct `stm32l0::stm32l0x3::USART1` in the current scope
  --> examples/adc_trig.rs:54:10
   |
54 |         .usart(
   |          ^^^^^ method not found in `stm32l0::stm32l0x3::USART1`
   |
   = help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
   |
11 | use stm32l0xx_hal::serial::Serial1Ext;

However, adding the use stm32l0xx_hal::serial::Serial1Ext; fixed the problem for me.

Could this be another issue? I noticed this trait did not need to be imported if I use USART2.

@apgoetz
Copy link
Contributor Author

apgoetz commented May 19, 2020

@hannobraun , Thanks for implementing this, while testing this out tonight, I got a nice picture that shows that the triggered DMA is essentially working correctly:

psd

That's a plot of the Power spectral density of the raw ADC readings in the buffer. You can clearly see the AC mains hum at 60Hz, as well as the first harmonic aliased at 80Hz.

In order to get this picture, I changed the sample rate to 200Hz instead of 1 Hz, and changed the analog pin to PA5, since on my dev board, PA0 is connected to a button with a pull down.

Also, in order to get these examples to build, I had to removed the required-features lines from Cargo.toml, as well as the #cfg feature flash in adc.rs. This was preventing the code from building for my stm32l0x3 device.

Unfortunately, I am still seeing the BufferOverrun errors, and especially unfortunately, it seems that the exact behavior of when the BufferOverrrun happens depends on whether or not optimizations are turned on. If -Os is used, then the dma transfer can successfully complete multiple times, but if optimization is turned off, then it almost always fails on the final data element of the first transfer.

@apgoetz
Copy link
Contributor Author

apgoetz commented May 19, 2020

Interestingly, if you make the adc_trig.rs example wait until the buffer is half full or completely full to drain it, the issue goes away. This is exposed in the DMA peripheral in the interrupt flags HTIF and TCIF.

  • HTIF means that the transfer is half complete
  • TCIF means the transfer is fully complete

In most use cases, I would expect higher level code to wait for one of these flags to be set true, or wait for an interrupt when these flags are set.

I added helper functions to allow higher level code to see the DMA status flags with new pub functions. I modifed adc_trig.rs to wait for the buffer to be half or completely full, and then there was no issue with running the example for a long period of time.

You can see an example of this here: adc-dma-transfer-wait.

@hannobraun
Copy link
Contributor

Thanks for digging deeper into this, @apgoetz. I haven't had a chance to take a look myself so far, but I hope I can get to it soon.

Also, in order to get these examples to build, I had to removed the required-features lines from Cargo.toml, as well as the #cfg feature flash in adc.rs. This was preventing the code from building for my stm32l0x3 device.

I added these restrictions because I didn't have an STM32L0x3 device to test with, so I figured I'd just restrict the code to what I did test with instead of giving a false sense of security. Since you confirmed that it basically works on STM32L0x3, a pull request to remove these restrictions would be a good idea (maybe after this bug is resolved?).

@hannobraun
Copy link
Contributor

@apgoetz: I can reproduce the error on STM32L0x2 and can also confirm the following part of your analysis:

It seems that if the DMA write pointer is located at position 0, but the transfer_state.complete flag is not set, this causes a spurious buffer overflow.

I've modified check_overflow to return some more information: https://github.com/braun-embedded/stm32l0xx-hal/commit/3d67103960e4be0d60c094f08996c8817e6aa62b

The output I get every time is this:

panicked at 'called `Result::unwrap()` on an `Err` value: BufferOverrun(BufferOverrun { read_pos: 255, write_pos: 0, reason: ReadGreaterThanWrite })', examples/adc_trig.rs:82:34

It seems to me the issue is that the write position wraps, but transfer_complete is not set, which invalidates the assumptions of the code.

I have yet to explore this further, but it sure looks like we have a race condition in Channel::transfer_state:

stm32l0xx-hal/src/dma.rs

Lines 469 to 471 in b8147f1

let isr = dma.isr.read();
let data_remaining = dma.$chfield.ndtr.read().ndt().bits();

First we read the flags (before the write has wrapped), then the write wraps, then we read NDT (which reflects that the write has wrapped). No idea right now how to prevent this. I'll do some thinking and reading, but if anyone has an idea in the meantime, I'd love to hear it.

@apgoetz
Copy link
Contributor Author

apgoetz commented May 20, 2020

@hannobraun , Thanks for the background on the required-features. Once there is a solution for the underlying DMA implementation then we can change the requirements. Also, sorry for the long-winded replies: I am new to the rust language and explaining everything in detail kind of helps with my though processes.

I did some more reading on this topic today. Unfortunately, I think it will be really hard to have a sound interface for any code that accesses a dma buffer while a transfer is ongoing.

A lot of the effort so far has been on determining safe traits to use to describe the buffers that DMA acts on. The state of the art on defining these traits seems to be located at ra-kete/dma-poc.

Most of this effort focuses on non-repeating dma transfers, where the goal is usually to speed up data transfer through a peripheral. There has been much less effort on circular and double buffered dma transfers, where the end goal is to maintain strict timing requirements for real time systems.

I looked at DMA implementations located in other stm32 crates, including stm32f1xx-hal stm32l1xx-hal stm32f7xx-hal stm32l4xx-hal stm32g0xx-hal

Most of these hal crates just provide implementations around for single dma transfers. However, stm32f1xx-hal and stm32l4xx-hal do provide implementations that support circular buffers.

The basic idea is to split the receive buffer into two halves, and then only allow access to the half of the buffer that does not currently have a transfer taking place. This still does not make the code safe, because there is no guarantee that the DMA transfer won't wrap around and clobber the data in the half that you are holding, but it provides the best timing guarantees, since at least the DMA transfer is in the other half of the buffer when you access it.

A detailed explanation of this approach can be seen in teskje/dma-poc#1

Getting back to the topic at hand, I think maybe we should reconsider the API for the ADC-DMA interface. I don't think it really makes sense to expose an iterator over the individual samples in the buffer: Most of the time, you will want to batch up ADC samples and process them all at once. An interface that provides access to a completely filled DMA buffer that is triggered by the half-transfer or transfer complete interrupts might make more sense. It also might make sense to let users choose between a continuous DMA transfer and a one-shot DMA transfer. It is a lot easier to reason about data races if you know the DMA is deactivated when you access the buffer.

@hannobraun
Copy link
Contributor

Thanks for doing all this research, @apgoetz. There has been a lot of movement in the space since I worked on the the DMA code in this crate. It would definitely be good to incorporate all this here, although I don't know if now would be a good time, or if waiting for further progress in the working group's focus project would be beneficial.

All that said, I'm already busier than I'd like, and I'm not available to work on any larger update of the DMA and/or ADC APIs any time soon. I am interested in fixing this bug though, and I hope it can be done with a focused change to the racy method I quoted above.

@hannobraun
Copy link
Contributor

I tried to fix the race condition by reading the two registers in a loop, only returning a result once two reads in a row would be the same. I did this under the assumption/hope that the race condition exists purely in the Rust code, and that the update of the registers by the DMA is atomic.

Turns out, that's not the case. It's totally possible to read the registers twice in a row, get the same result each time, and still have read an inconsistent state (i.e. NDT having wrapped, without the flag having been set).

I'll think some more about this, but right now it looks to me like the existing API can't be supported by the hardware, and we'll have to change the API, as @apgoetz suggests.

@apgoetz
Copy link
Contributor Author

apgoetz commented May 21, 2020

OK. Too bad the quick fix didn't work.

I think i would like a to wait a little bit longer until the best practices around this dma has settled a little bit. Once that has happened, I would be happy to work on a new implementation.

@apgoetz
Copy link
Contributor Author

apgoetz commented May 25, 2020

@hannobraun , I have been working on a new implementation of the dma driver that only supports "oneshot" dma transfers. The idea here is that since the DMA controller can write to the buffer while you are accessing it, it makes sense to have a driver design where a buffer is given to the ADC driver, and only returned once the dma transfer has completed.

You can still have continuous transfers by using two different buffers of samples, and the adc_dma.rs example shows how to do that.

Can you take a look at the branch and see if this is something that be integrated into the crate?

Branch is at apgoetz:adc-dma-testing

@hannobraun
Copy link
Contributor

@apgoetz

I have been working on a new implementation of the dma driver that only supports "oneshot" dma transfers. The idea here is that since the DMA controller can write to the buffer while you are accessing it, it makes sense to have a driver design where a buffer is given to the ADC driver, and only returned once the dma transfer has completed.

I haven't taken a look at the code yet, but that sounds like it would mesh well with bbqueue.

Can you take a look at the branch and see if this is something that be integrated into the crate?

Can you please submit this as a pull request (mark it as a draft, if you don't think it's ready). That makes it easier for me to add comments and track changes you make in response to them.

Also, to clarify, I'm not a maintainer of this crate (although it seems I can merge pull requests, which I wasn't aware of?). In any case, @arkorobotics makes the decisions around here.

@hannobraun
Copy link
Contributor

Correction: Turns out I was a maintainer all along, just nobody told me :-)

@electroCutie
Copy link
Contributor

@apgoetz I took a look at it and my new working theory is that we've got an issue with non atomic reads, however we can work around this by using a stored position instead. Pull request created

With this I get absolutely ZERO improper overruns even right up to the limit where the hot loop has trouble keeping up with the ADC. In my testing it also properly generates overruns when the sampling rate is slightly faster than the loop so this doesn't loose any overruns it would have otherwise generated.

@hannobraun
Copy link
Contributor

Fixed in #127.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants