Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race condition with creation of throttle sem (needs to exist before CF initializes) #184

Closed
jphickey opened this issue Jan 13, 2022 · 5 comments · Fixed by #342
Closed
Assignees
Milestone

Comments

@jphickey
Copy link
Contributor

Describe the bug
CFE ES starts all applications in their own thread. Therefore, conceptually at least, all apps are starting at the same time.

If configured to use a throttle sem, the CF app expects that semaphore to be created before it starts. During startup, it will attempt to bind to that semaphore during CF_CFDP_InitEngine(), and if that fails, CF aborts (see #178).

Problem is, if the semaphore is created by another app, whether it be CI/TO or some other dedicated I/O app, there is no guarantee that the semaphore has been created before CF attempts to use it.

Secondary problem exists if the I/O app that owns the sem gets restarted or reloaded, the semaphore ID will likely change too. This may be recoverable by disabling the engine and re-enabling it (but haven't tested that).

To Reproduce
Its a race condition, so not readily reproducible.
Start CF before the app that creates the sem (still not guaranteed, but increases the chance the race will be lost)
Add an artificial delay during startup for the app that creates the sem (just further increases the chance the race will be lost)

Expected behavior
Should be guaranteed via sync mechanisms, or shouldn't be a hard error (e.g. maybe retry to bind later?)

Suggestion that CF might still start up but with the engine in a disabled state, so at least someone can correct the condition and enable the engine, rather than having CF abort/exit.

Adding a call to CFE_ES_WaitForStartupSync() before starting the engine might help too...

Code snips

CF/fsw/src/cf_cfdp.c

Lines 1014 to 1015 in 2a024d8

ret = OS_CountSemGetIdByName(&CF_AppData.engine.channels[i].sem_id,
CF_AppData.config_table->chan[i].sem_name);

System observed on:
Ubuntu 21.10

Reporter Info
Joseph Hickey, Vantage Systems, Inc.

@jphickey jphickey added the bug label Jan 13, 2022
@jphickey
Copy link
Contributor Author

calling this a bug because its a race condition, although simply starting CF after the I/O app seems to avoid hitting it most of the time, it's lurking there nonetheless.

@skliper skliper changed the title Race condition with creation of throttle sem Possible race condition with creation of throttle sem (needs to exist before CF initializes) Jul 8, 2022
@jphickey
Copy link
Contributor Author

jphickey commented Aug 2, 2022

I'm running into this issue with the integration of BP + CF ... BP creates the throttle sem, and although CF is started after BP, it still hasn't been created by the time CF checks for it so it fails out. Can hack it to get around the race for now, but we probably need an actual solution to this.

@jphickey
Copy link
Contributor Author

This old issue is becoming a real pain point for BP startup, it hits this race quite often and CF bails out.

I'm going to submit a PR with a possible workaround.

@jphickey jphickey self-assigned this Nov 17, 2022
@skliper
Copy link
Contributor

skliper commented Nov 17, 2022

Just wait until the sem is created? Loop w/ a 1 hz sleep?

@jphickey
Copy link
Contributor Author

Just wait until the sem is created? Loop w/ a 1 hz sleep?

Yes, pretty much.... I had been locally using a patched/hacked version of CF with a sleep but now I'm making some github workflows and it would be much cleaner if there was something in the main line to deal with the sem creation.

jphickey added a commit to jphickey/CF that referenced this issue Nov 17, 2022
Adds a retry loop around OS_CountSemGetIdByName, because if this sem is
created by another app there may be some delay until the other app gets
to the point where it creates the sem.  This works around the race
condition.

A retry limit is also imposed so CF will not spin here forever.
jphickey added a commit to jphickey/CF that referenced this issue Nov 17, 2022
Adds a retry loop around OS_CountSemGetIdByName, because if this sem is
created by another app there may be some delay until the other app gets
to the point where it creates the sem.  This works around the race
condition.

A retry limit is also imposed so CF will not spin here forever.
dzbaker added a commit that referenced this issue Dec 1, 2022
Fix #184, work around throttle sem creation race
@dmknutsen dmknutsen added this to the Draco milestone Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants