-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add process locking #851
Add process locking #851
Conversation
Thank you for contributing to the Leapp project!Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable.
To launch regression testing public members of oamg organization can leave the following comment:
Please open ticket in case you experience technical problem with the CI. (RH internal only) Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please consider rerunning the CI by commenting leapp-ci build (might require several comments). If the problem persists, contact leapp-infra. |
5a78eb1
to
ac48f22
Compare
(not a real review yet) Could you please rebase? The tests should be fixed by this |
@oamg/developers There is currently one point of failure I want to discuss, but I am not sure how relevant it is. It is possible that with this change leapp can fail under some extreme circumstances. The only one I can think of now is that leapp fails ( Is it worth considering something like this? In such a case we could implement a flag to instruct leapp to ignore locking in initram. |
@dkubek hmm... thinking whether we could just read it from |
*NOTE: I have some knowledge of locking but have not looked at this PR yet. I'm just responding to one part of this comment.
We ought to be able to determine whether the lockfile was created before the last reboot. As long as this lockfile's purpose is only to ensure leapp is not running twice, then we can use the fact that there has been a reboot since the lockfile's creation to decide that the lockfile is stale. Maybe comparing the lockfile's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to overthink the lock cleanup design here
I am not really sure, right now if it would work for leapp but did you consider opening the SQLite db exclusively? |
Or rather a SQLite db, that will tell you that you can't use it 😁 |
I've looked over the PR now and have a few higher level questions... In the past, I've implemented locks using filesystem atomicity (the key filesystem operation here is I see in this PR that although you are creating a lockfile and adding the PID to it (things which are needed for a filesystem-driven lockfile approach), you're using What I'm wondering is what problems we are trying to avoid by doing that instead of relying on the lock for the length of the process? The traditional problem with Let me know if I'm missing something. [*] The other problem with |
@abadger Thanks for awesome comment! DISCLAIMER: I am very new to this topic and it might soon become obvious
My original idea I was aiming for was exactly something like you are describing using the So correct me if I'm wrong. What you are suggesting to be a better approach is to acquire a filesystem lock, keep it for the whole duration of the process and release it when leapp ends. Am I correct? This would solve all the problems with stale lockfiles |
Correct. Avoiding the stale lockfile problem is what we'd be able to
solve. I think that the known problems with `fcntl()` and `flock()` won't
be triggered by the way we need to use the lock in LEAPP so we can use them
as the primary locking mechanism (correct me if I'm wrong about that,
though).
…On Wed, Mar 20, 2024, 4:43 AM David Kubek ***@***.***> wrote:
@abadger <https://github.com/abadger> Thanks for awesome comment!
*DISCLAIMER: I am very new to this topic and it might soon become obvious*
I see in this PR that although you are creating a lockfile and adding the
PID to it (things which are needed for a filesystem-driven lockfile
approach), you're using fcnt()l (one of the POSIX APIs) for locking.
However, you aren't using that to determine whether another instance of
LEAPP is already running directly, you're just using that to protect the
lockfile while determining whether the lockfile is currently allocated by
another process, to record our ownership of it by writing our PID to it
(these two needs are what link() addresses in the filesystem atomicity
case), and to ensure that no one else takes the lock while we're evaluating
if the lockfile is stale.
My original idea I was aiming for was exactly something like you are
describing using the link() call. Basically, just create a file, write
PID inside and check if it exists. Instead of just using a plain python
open(), I have tried to introduce atomicity. A quick google search
suggested using filesystem locks and then I have basically emulated the
approach of using link() (which I was not aware of) using flock(). Which
resulted in the current implementation.
So correct me if I'm wrong. What you are suggesting to be a better
approach is to acquire a filesystem lock, keep it for the whole duration of
the process and release it when leapp ends. Am I correct? This would solve
all the problems with stale lockfiles
—
Reply to this email directly, view it on GitHub
<#851 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABTCWVR3NHL2UZIQAZAI5DYZFY7PAVCNFSM6AAAAABDUZNVRCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBZGM3TOMBSHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Thanks everybody for awesome feedback! This last changes should fix all of the problems noted above. Firstly, as per @abadger 's awesome suggestion, we will not rely on the PID in the lockfile, but instead on the BSD lock itself which we will keep for the duration of the execution. The lockfile has been moved to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code reviewed. I didn't find anything major, just some adjustments that I thing would make the code cleaner.
This commit addresses the potential risk of running multiple instances of Leapp simultaneously on a single system. It implements a simple lock mechanism to prevent concurrent executions on a single system using a simple BSD lock (`flock(2)`). Lock is acquired at the start of the execution and a PID number is stored in lockfile. The PID in lockfile currently has purely informational character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the issues have been resolved so I'm approving this. There is the open question of whether an incorrectly formatted lockfile should warn or error (currently errors) but I don't feel strongly one way or the other and it seems like Petr can see both sides of it as well. I'm okay to leave that as it is and revisit in the future if it is a problem.
/packit build |
## Packaging - Start building for EL 9 in the upstream repository on COPR (oamg#855) ## Framework ### Enhancements - Minor update in the summary overview to highlight what is present in the pre-upgrade report (oamg#858) - Store metadata about actors, workflows, and dialogs inside leapp audit db (oamg#847, oamg#867) ## Leapp (tool) ### Enhancements - Implement singleton leapp execution to prevent multiple running leapp instances on the system in the same time (oamg#851) ## stdlib ### Fixes - Close properly all file descriptors when executing shell commands via `run` (oamg#880) ## Modifications - Code is now Python3.12 compatible (oamg#855)
## Packaging - Start building for EL 9 in the upstream repository on COPR (#855) ## Framework ### Enhancements - Minor update in the summary overview to highlight what is present in the pre-upgrade report (#858) - Store metadata about actors, workflows, and dialogs inside leapp audit db (#847, #867) ## Leapp (tool) ### Enhancements - Implement singleton leapp execution to prevent multiple running leapp instances on the system in the same time (#851) ## stdlib ### Fixes - Close properly all file descriptors when executing shell commands via `run` (#880) ## Modifications - Code is now Python3.12 compatible (#855)
This change addresses the potential risk of running multiple instances of Leapp simultaneously on a single system.
Considerations:
A simple lock mechanism to prevent concurrent executions using a BSD lock has been implemented (see 1).
JIRA: https://issues.redhat.com/browse/OAMG-9827