Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix filesystems not unmounted at shutdown #219

Conversation

liuming50
Copy link
Contributor

Instead of unconditionally waiting 2 seconds for processes to die,
check continuously for remaining processes, and break the loop when
none remain.

Turn PID 1 to a RT process with highest priority 99 during shutdown,
this ensures it would not be preempted by other RT processes.

Signed-off-by: Robert Andersson robert.m.andersson@atlascopco.com
Signed-off-by: Mathias Thore mathias.thore@atlascopco.com
Signed-off-by: Ming Liu liu.ming50@gmail.com

@liuming50
Copy link
Contributor Author

Hi, @troglobit

I mistakenly closed #209, I created this new MR, please help me review it.

Paste the answers to your question here:

Why not wait for, and kill, the console -sh tty process? They definitely sit on a mounted filesystem (and may have several open files from others), that prevent us from unmounting cleanly

The original author Robert Andersson said he does not remember the exact reason, so I dropped it in V2, it might be some specific requirements from our system.

Passing integers as void * in the callbacks don't work on all archs, pointers and integers are no longer guaranteed to be of the same length. If you only ever want to pass an integer, use an int *

Have changed as you suggested in this MR.

I'm really worried to see the change to SCHED_RR this late in the lifetime of Finit ... do we really need it? Please motivate.

Maybe I can explain this, this is basically for PREEMPT-RT kernel, finit during shutdown must run as a RT process to ensure it would not be scheduled out, so it must run at the highest priority 99. This part of change should also work on non-preempt-rt kernel.

Is the usleep() long enough for the kernel to schedule another task on an old Arm9 with no L2 cache? We have several targets at $DAYJOB that use an old NPX i.MX27 that is soooo slow.

I am not 100% sure about the answer, but we do deploy finit to NXP imx27 boards (maybe thousands of them), I have not seen error report about this change, so 2500 USECS should be long enough I guess.

Copy link
Owner

@troglobit troglobit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I understand a bit better now why you need this change. I'm still not overly happy with the change, but I cannot see a better way to solve this at the moment.

Please address my review comments, in particular the clock_nanosleep() issue, then I'll merge it. If you're quick it'll make it into the next release :)

src/util.c Outdated
deadline.tv_sec += deadline.tv_nsec / 1000000000;
deadline.tv_nsec = deadline.tv_nsec % 1000000000;

while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL) < 0 && errno == EINTR)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clock_nanosleep() API seems to return positive error codes, never negative ones. Yes, I was also surprised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arha, did not notice that, nice catch, will fix it in V2.

src/util.c Show resolved Hide resolved
src/sig.c Outdated
Comment on lines 257 to 258
struct sched_param sched_param;
sched_param.sched_priority = 99;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, empty line after declarations, or sched_param = { .sched_priority = 99 };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do that!

src/sig.c Outdated
Comment on lines 260 to 263
/* PID 1 run as RR process with highest priority */
sched_setscheduler(1, SCHED_RR, &sched_param);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is useless as-is. It states in text what the next line does ... which can be read from the code itself. Instead, use comments to explain why a code path is needed. Here's a good place to add the reasoning from the discussion. E.g., "On a PREEMPT-RT system, Finit must run as the highest prioritized RT process to ensure it completes the shutdown sequence."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@troglobit Good point, will do that.

Instead of unconditionally waiting 2 seconds for processes to die,
check continuously for remaining processes, and break the loop when
none remain.

Turn PID 1 to a RT process with highest priority 99 during shutdown,
this ensures it would not be preempted by other RT processes.

Signed-off-by: Robert Andersson <robert.m.andersson@atlascopco.com>
Signed-off-by: Mathias Thore <mathias.thore@atlascopco.com>
Signed-off-by: Ming Liu <liu.ming50@gmail.com>
@liuming50 liuming50 force-pushed the fix-filesystems-not-unmounted-at-shutdown branch from 38586ec to 3e0063e Compare February 7, 2022 09:22
@liuming50
Copy link
Contributor Author

@troglobit Please kindly help me review the V2.

Have fixed all issues as you suggested in V1.

Copy link
Owner

@troglobit troglobit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just as we discussed! :)

@troglobit troglobit merged commit 78069b4 into troglobit:master Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants