Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

periph: GPIO drivers are not thread safe #4866

Open
haukepetersen opened this issue Feb 21, 2016 · 30 comments
Open

periph: GPIO drivers are not thread safe #4866

haukepetersen opened this issue Feb 21, 2016 · 30 comments
Assignees
Labels
Area: cpu Area: CPU/MCU ports Area: drivers Area: Device drivers Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@haukepetersen
Copy link
Contributor

From #4830: The implementation of almost all our GPIO drivers is not thread safe. Though we have not seen the effect of this somewhere, there are possibilities that pin configurations (or for some boards also set/clear operations) might be lost. Example are lines like these:

// from stm32f4 gpio.c -> gpio_init()
port->MODER |= (1 << (2 * pin_num));

This is compiled into a non atomic read-modify-store sequence. So if interrupted and the same register is modified by a different thread, this modification will be lost.

It would be good to find a global solution/best practice on how to mitigate this. My first quick thought would be to actually disable interrupts while writing this reg, as these sequences are very small and mutexes would be too much overhead.

NOTE: The same issue might also effect other peripheral driver than GPIO ones, but as most work on a device-level without (many) shared global registers, the chances are not quite as high.

@haukepetersen haukepetersen added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Feb 21, 2016
@haukepetersen haukepetersen self-assigned this Feb 21, 2016
@haukepetersen haukepetersen added this to the Release 2016.03 milestone Feb 21, 2016
@LudwigKnuepfer
Copy link
Member

I would start by deciding whether it should be thread-safe. Right now the documentation does not say anything about this topic.

Also - if it is decided it should be thread-safe, please document the realtime side-effects of the implementation.

@DipSwitch
Copy link
Member

DipSwitch commented Feb 21, 2016 via email

@haukepetersen
Copy link
Contributor Author

Good point. I am not quite sure though which ones to support bit banding, I think not all of the cortexes do...

@LudwigKnuepfer: Yes, they need to be thread safe, otherwise we can always end up with bad behavior as described above. (see also -> #4758). Real-time side effects are almost non existant, as the intervals for which we might have to disable interrupts are very (20 instructions max) short.

@DipSwitch
Copy link
Member

All STM32 >= Cortex-M3 support Bit-Banding :) Freescale to I belief.
On 21 Feb 2016 16:45, "Hauke Petersen" notifications@github.com wrote:

Good point. I am not quite sure though which ones to support bit banding,
I think not all of the cortexes do...

@LudwigKnuepfer https://github.com/LudwigKnuepfer: Yes, they need to be
thread safe, otherwise we can always end up with bad behavior as described
above. (see also -> #4758 #4758).
Real-time side effects are almost non existant, as the intervals for which
we might have to disable interrupts are very (20 instructions max) short.


Reply to this email directly or view it on GitHub
#4866 (comment).

@LudwigKnuepfer
Copy link
Member

Yes, they need to be thread safe, otherwise we can always end up with bad behavior as described above.

This is the invariant of operations which are not thread safe, not a reason why this particular interface should be thread safe. If an interface is known not to be thread safe, it can still be used in a thread-safe manner through external measurements.
I am not saying the interface should or should not be thread-safe, just pointing this out.
So again - there needs to be a decision that we actually want this first ;)

Real-time side effects are almost non existant ...

If they are present they may add up, so please document them unless there is good reason not to do so. I don't understand why you try to argue against this in the first place.

@LudwigKnuepfer
Copy link
Member

see also -> #4758

The relevant part seems to be @haukepetersen writing there is a need for thread-safety of these APIs. I could not see any objections to this proposal in that place at the time of this writing.

@kaspar030
Copy link
Contributor

we could use bit-banding

That does uglify the code a lot...

@kaspar030
Copy link
Contributor

Real-time side effects are almost non existant ...

If they are present they may add up, so please document them unless there is good reason not to do so. I don't understand why you try to argue against this in the first place.

Thing is, there's a lot of work in writing the actual code. Standing in row two and shouting, e.g., "document realtime-sideeffects!" will always be the right thing to do, but at the same time, always be annoying and taking wind out of the effort.

@OlegHahm
Copy link
Member

@kaspar030, so what's your message? People should keep their silence when they observe that changes may have an effect on real-time guarantees? I agree 100% with you that it is sometimes annoying that you want to do one contribution and are suddenly facing several other more or less related issues that slow down the process of your original task. However, I really appreciate people looking thoroughly at these implications and stating them instead of simply waving things through.

Disclaimer: this has little to do with the actual PR here.

@kaspar030
Copy link
Contributor

@kaspar030, so what's your message?

We have ~150 "disableIRQ()" in master, and none of them documents real-time effects. Suddenly insisting on documenting them in a random PR feels - annoying.

I was merely answering the question why @haukepetersen argued against documenting them.

@OlegHahm
Copy link
Member

"Things have always been bad, let's not start to improve them." That's your message then? Seriously?

@kaspar030
Copy link
Contributor

Whatever.

@haukepetersen
Copy link
Contributor Author

@LudwigKnuepfer: I think my answers came out the wrong way, sorry.

regarding real-time side-effects:
I am not opposing to document them, my first feeling was rather in the direction that @kaspar030 stated. But being the first one to start documenting such thing is always a big overhead (how to document it, where to put it, etc...), and currently I just have not the time or nerves to be the one that starts this... But to not loose this topic, how about we open an issue of it's own to address this (for this case but also for all other occurrences of disableISR/restoreISR?!

regarding thread-safety:
I seem to have remembered wrong, I thought there were already more discussion about this in #4758. So let me list my arguments here:

  • same as SPI and I2C, GPIO ports are shared ressources (e.g. pin A.01 is used by driver A, pin A.23 by driver B)
  • telling all drivers to take care themselves (external locking) to do only atomic access to these pins is not practicable
  • forcing all peaces of software that use GPIO to run in a single thread is not feasible
  • introducing a require/release (as for buses) is way to much overhead for a simple 'gpio_set/clear'

@jnohlgard
Copy link
Member

For the set/clear/toggle it seems like most MCUs have special registers for only setting or clearing GPIOs (as opposed to "writing" a value to the pins) to work around this read-modify-write race, which means that it will not require any locking.
I think the problem here is when two threads for different devices/applications try to reconfigure pins in the same port simultaneously since that can cause a read-modify-write cycle to interrupt in the middle and the interrupting write will be overwritten after returning control to the original thread.

Suggested plan:

  • Ensure that all GPIO drivers for platforms which support non-locking pin set/clear are actually using these registers instead of output |= (1 << pin);
  • Add either implicit locking (mutex inside the gpio functions) or disable IRQs for initialization and reconfiguration functions which require RMW access to hardware registers.

Cortex-M3/M4 bit banding can be used to avoid locking for single bit flips but the code becomes quite long if you need to modify more than one or two bits.

@LudwigKnuepfer
Copy link
Member

Sure, we can split the realtime side-effects documentation to a separate issue. Maybe it's more purposeful to provide some tool for the task of identifying the possible side-effects anyways.

@LudwigKnuepfer
Copy link
Member

@kaspar030 did create an issue already:
#4872

@haukepetersen
Copy link
Contributor Author

won't be addressed for the current release -> postponed (though still highly relevant)

@haukepetersen haukepetersen modified the milestones: Release 2016.07, Release 2016.04 Apr 8, 2016
@OlegHahm
Copy link
Member

OlegHahm commented Apr 8, 2016

Did you make a note somewhere to put into the known issues for the release notes?

@haukepetersen
Copy link
Contributor Author

put it into the release note draft

@kYc0o
Copy link
Contributor

kYc0o commented Jul 11, 2016

I'm afraid that we need to postpone this again, anyone knows if the new GPIO API takes into account this issue? Maybe @PeterKietzmann ?

@PeterKietzmann
Copy link
Member

Not that I know. IMO the issue is not about the API design but more about (i) the driver implementation itself and/or (ii) documentation about the programming model

@PeterKietzmann PeterKietzmann removed this from the Release 2016.07 milestone Jul 12, 2016
@kYc0o kYc0o added this to the Release 2016.10 milestone Jul 13, 2016
@aabadie aabadie removed this from the Release 2017.04 milestone Jun 21, 2017
@kYc0o
Copy link
Contributor

kYc0o commented Jul 20, 2018

This issue would affect gpio expanders, isn't it @ZetaR60 ?

@ZetaR60
Copy link
Contributor

ZetaR60 commented Jul 22, 2018

@kYc0o Thanks for the heads up on this. For the gpio_exp interface in #9190 gpio_toggle is affected because it is implemented as a read followed by a write to reduce the number of callbacks stored. For other gpio_exp functions, it depends on the implementation of the specific GPIO expander (#9054 is affected).

@ZetaR60
Copy link
Contributor

ZetaR60 commented Aug 29, 2018

The issue with the extension API is fixed in #9860.

@OlegHahm
Copy link
Member

Should we close this one?

@ZetaR60
Copy link
Contributor

ZetaR60 commented Sep 17, 2018

Note: My statement above only pertains to the GPIO extension API (which is not yet merged), and not to GPIO drivers in general.

@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@stale stale bot closed this as completed Sep 10, 2019
@aabadie aabadie reopened this Sep 21, 2019
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Sep 21, 2019
@MrKevinWeiss MrKevinWeiss added the State: don't stale State: Tell state-bot to ignore this issue label Nov 22, 2019
@miri64 miri64 added this to the Release 2020.07 milestone Jun 30, 2020
@aabadie aabadie added Area: drivers Area: Device drivers Area: cpu Area: CPU/MCU ports and removed State: don't stale State: Tell state-bot to ignore this issue labels May 20, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@maribu
Copy link
Member

maribu commented Sep 16, 2022

GPIO LL is thread safe and clearly communicates the requirements. Once periph/gpio is replaced with an API compatible high level GPIO API (building upon GPIO LL and external GPIO extender drivers), the issue will be solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports Area: drivers Area: Device drivers Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests