Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for the temporary udev before finit exited it #96

Closed
xhebox opened this issue Jan 19, 2018 · 29 comments
Closed

Wait for the temporary udev before finit exited it #96

xhebox opened this issue Jan 19, 2018 · 29 comments
Milestone

Comments

@xhebox
Copy link
Contributor

xhebox commented Jan 19, 2018

my sound card did not work after i updated to 3.1-rc2.

In short, it's a bug of 00fe371. Finit exited the temporary udevd after setting up the devices tree, this is fine. But the point is, finit exited udevd before all his work were done. You need to use settle to wait. In detail, his work is not limited to /dev, also /run/udev. There're lots of files to generate, If i am correct, temporary udevd exited before all files were generated.

My hot-fix patch is more easier to understand, just a proof of concept. Since there has been lots of new changes in code base, i am not sure if the code is suitable or not. I'd like to leave the work to you :).

--- a/src/finit.c	2018-01-19 14:52:57.765287302 +0000
+++ b/src/finit.c	2018-01-19 14:52:41.975333534 +0000
@@ -409,8 +409,5 @@
 		if (udev && whichp("udevadm")) {
 			run("udevadm trigger --action=add --type=subsystems");
 			run("udevadm trigger --action=add --type=devices");
-			run("udevadm settle --timeout=120");
-
-			/* Tell temporary udevd to exit, we'll start a monitored instance later */
-			run("udevadm control --exit");
+			run("udevadm settle --timeout=120 && udevadm control --exit");
 		}
 	}

EDIT: actually, it's pulseaudio that did not work

@troglobit
Copy link
Owner

Sorry for the late reply, but I didn't find the time to look into this in detail during the weekend. Only did a quick test on a Debian system, which looked OK. But it didn't have have pulseaudio ...

However, the above diff should not have any huge impact, except for udevadm control --exit not being called if the settle command fails. The run() function waits for completion before returning, so the only difference in the original is not checking the return value. Other than that they are functionally equivalent.

If I understand it correctly, udevd not only creates the files in /dev/ and /run/udev, but also calls modprobe to install sound drivers etc. But 120 sec should really be enough time to perform all that. I've compared how the SysV init start scripts for udevd work with the Finit code, and I cannot find any major differences, but I might have overlooked something.

I'll try to investigate more later today.

@xhebox
Copy link
Contributor Author

xhebox commented Jan 22, 2018

@troglobit OK, i removed the patch, trying to give a try, and it magically works now. If it's the case said by you(run is sync), then maybe i should check my laptop hardware. They may break down :(, like poor contact or so. Really sorry for bothering you, i'll take some more time on testing. Hope my reply is not too late :), i'll reopen the issue if i found a more reliable evidence on this issue.

@xhebox xhebox closed this as completed Jan 22, 2018
@troglobit
Copy link
Owner

No problem, at all! :-)

This just prompted my doing some more testing on Debian, and as a result I found some other bugs. So all in all just a good thing, really!

@xhebox
Copy link
Contributor Author

xhebox commented Jan 24, 2018

@troglobit

The udev rules are only applied when a device is added. If you want to reapply the rules to a device that is already connected, you need to do this explicitly, by calling udevadm trigger with the right options to match the device(s) whose configuration has changed, e.g. udevadm trigger --attr-match=vendor='Yoyodyne' --attr-match=model='Frobnicator 300'.

https://unix.stackexchange.com/questions/39370/how-to-reload-udev-rules-without-reboot

I think this is what happened, i might have done sth when i removed the patch, so i got the sound by accident. Now i went silence again, i rebooted four times, both alsa,pulse did not work. And all worked after trigger. It makes sense that udevd service treats populated devices as the connected and does not apply rules for them, and i need to trigger it manually.

The simplest solution i can come up with is running trigger after service started, does it sound possible?

EDIT: due to the complex combinations of different hardware cases, this may not be reproduced properly. But i can prove drivers are ok:

snd 73728 8 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek,snd_pcm

while no cards for pulse:

I: [pulseaudio] module-udev-detect.c: Found 0 cards.
I: [pulseaudio] module.c: Loaded "module-udev-detect" (index: #5; argument: "").

and no sounds with mpg123 and firefox, except i ran trigger, and sounds came without rebooting pulse.

EDIT2: when i hot reboot my laptop, sound may appear without manually triggering it. But when i reboot -p, wait until poweroff, and start again, always no sounds. That's probably why i got sound by change these days.

@xhebox xhebox reopened this Jan 24, 2018
@troglobit
Copy link
Owner

Aha, I see so issuing a udevadm control --reload-rules && udevadm trigger when the "proper" udevd has started should do the trick, or would it perhaps suffice to call it before udevadm control --exit"?

@xhebox
Copy link
Contributor Author

xhebox commented Jan 26, 2018

@troglobit not enough, before udevadm exits, that's what finit is doing now. Based on the information i collected, udevd wont add devices automatically if it's not a new pluged-in device. New udevd should always try to trigger(add subsystems and add devices), to "coldplug" all things to ensure the correct environment.

And of course, the best solution should be early starting udevd to prevent from plugging new devices(anyway, it's possible) just in the middle of restarting udevd.

Maybe consider reordering the bootstrap. Like, remount fs/fstab after signal set up, then manually start udevd service(with cmd and its id) just before starting all other services. Since finit itself could execute normally even without much devices in /dev, we dont need to be so hurry about mdev/udev after all. Btw, systemd is mounting first, udevd second.

@troglobit
Copy link
Owner

You're right, I must look into this in detail, and possibly even reorder parts of the bootstrap a bit, to fix this.

To start with, I really need to find a way to reproduce your issues reliably. I've tried only Debian thus far, with no sound issues. I'll try Alpine later today, or tomorrow, and if that doesn't help I guess I'll have to give https://github.com/xhebox/noname-linux a try :)

Thank you so much for investigating this, I hope I can have a fix for you soon!

@xhebox
Copy link
Contributor Author

xhebox commented Jan 26, 2018

@troglobit better alpine, my distro is not so actively maintained now as i am busy with studying. The only detail we do not know is that why hot reboot does not need to retrigger? But it's difficult to figure out that, because we do not even know where the code is.

Whatever, i think early start is a good solution. Even you cant reproduce the issue, it can be sold as an improvement, because there's a risk of missing devices plugged when there's no udevd or udev service. udev really should not be restarted once other applications start to run, also should be started before all other applications(except finit itselt).

@xhebox
Copy link
Contributor Author

xhebox commented Feb 12, 2018

@troglobit have you tested it? I am not here to urge you. I just come up with an idea. I think finit could add a new function that could move the later registered udevd service(service in config is registered first) to the top of list. Same as the other two commands(maybe registered as two 'run'). Or simply a flag of service_register() that registered service to the top of list. Then we dont need to reorder or break the service system.

Also, it will be cool if multi commands service is supported. Then we dont need two tasks, too. :)

EDIT: maybe we could simply leave the udev/mdev into the config file, completely seeing it as an normal service.

EDIT2: I have 9 days off, i can look at it much deeper. Hope we could solve it as early as possible.

@troglobit
Copy link
Owner

troglobit commented Feb 12, 2018

@xhebox Hi! Unfortunately I've been incredibly busy with other things the last couple of weeks 😞

Your proposal is interesting. Please look into it if you can, then we can discuss it in more detail. I know we need mdev run early, to trigger FPGA and other firmware load sequences that in turn cause other drivers to be loaded. But the udev mess I think need to be sorted out in some other fashion.

Meanwhile I'll try to get back to setting up Alpine with udev to see if I can reproduce the problem.

EDIT: By "we" I mean the company that I work for, which sponsors a lot of the work on Finit :)

@troglobit
Copy link
Owner

Status update: I've had no luck trying to run pulseaudio on Alpine Linux, mostly because it's not included (yet) in the main archives, even on testing/edge. However, I've tested Debian (Buster) extensively and experienced no problems with udev + pulseaudio.

From what I can gather, pulseaudio is not started as s system service but rather a per-session user service, usually when starting X, e.g. via gnome-session or when an application like firefox starts up to play sound and is linked with the pulseaudio library.

Sorry I cannot be of more help.

@xhebox
Copy link
Contributor Author

xhebox commented Feb 18, 2018

@troglobit It's ok. :) Thanks. I'll handle this.

@xhebox
Copy link
Contributor Author

xhebox commented Feb 23, 2018

@troglobit I am busy too..

@@ -44,11 +44,12 @@
  * @cmd:  External program to call, or 'internal' for internal inetd services
  * @id:   Instance id
  * @type: Service type, one of service, task, run or inetd
+ * @top:  Add to the top of others or not
  *
  * Returns:
  * A pointer to a new &svc_t object, or %NULL if out of empty slots.
  */
-svc_t *svc_new(char *cmd, int id, int type)
+svc_t *svc_new(char *cmd, int id, int type, int top)
 {
 	int job = -1;
 	char *desc;
@@ -81,7 +82,8 @@
 		desc = cmd;
 	strlcpy(svc->desc, desc, sizeof(svc->desc));
 
-	TAILQ_INSERT_TAIL(&svc_list, svc, link);
+	if (top) TAILQ_INSERT_HEAD(&svc_list, svc, link);
+	else TAILQ_INSERT_TAIL(&svc_list, svc, link);
 
 	return svc;
 }

It's working if i made another flag there. This hot-patch makes sense. And here's my conclusion.

  1. mdev/udev] of course, should be started as early as possible and just keep running. we need to make sure udevd the first service to start. And after tests, i suspect that my pulse did not work because my X&pulse&etc.. started even faster than the new udevd accordingly, i can confirm it 80%. So, no more restart.

  2. The conf file is loaded before we registered it. That is, initramfs has mounted the root and finit just loaded services in config before we registered udev. So, we could 'top the service' at the time.

@xhebox
Copy link
Contributor Author

xhebox commented Feb 24, 2018

@troglobit Another solution:

--- a/src/finit.c	2018-01-23 10:29:58.000000000 +0000
+++ a/src/finit.c	2018-01-23 10:29:58.000000000 +0000
@@ -382,6 +382,7 @@
 			touch("/dev/mdev.log");
 
 		snprintf(cmd, sizeof(cmd), "%s -s", path);
+		run_interactive(cmd, "Populating device tree");
 	} else {
 		/* Desktop and server distros usually have a variant of udev */
 		path = which("udevd");
@@ -391,28 +392,28 @@
 			udev = 1;
 
 			/* Register udevd as a monitored service, started much later */
-			snprintf(cmd, sizeof(cmd), "[12345] %s -- Device event manager daemon", path);
+			snprintf(cmd, sizeof(cmd), "[S12345] %s -- Device event manager daemon", path);
 			if (service_register(SVC_TYPE_SERVICE, cmd, global_rlimit, NULL)) {
 				_pe("Failed registering %s", path);
 				udev = 0;
 			}
+
+			snprintf(cmd, sizeof(cmd), ":1 [S12345] udevadm trigger -c add -t devices -- Devices trigger");
+ 			if (service_register(SVC_TYPE_RUN, cmd, global_rlimit, NULL)) {
+ 				_pe("Failed registering %s", path);
+ 				udev = 0;
+ 			}
+
+			snprintf(cmd, sizeof(cmd), ":2 [S12345] udevadm trigger -c add -t subsystems -- Subsystems trigger");
+ 			if (service_register(SVC_TYPE_RUN, cmd, global_rlimit, NULL)) {
+ 				_pe("Failed registering %s", path);
+ 				udev = 0;
+ 			}
 
-			/* Start a temporary udevd instance to populate /dev  */
-			snprintf(cmd, sizeof(cmd), "%s --daemon", path);
 		}
 	}
 
 	if (path) {
 		free(path);
-
-		run_interactive(cmd, "Populating device tree");
-		if (udev && whichp("udevadm")) {
-			run("udevadm trigger --action=add --type=subsystems");
-			run("udevadm trigger --action=add --type=devices");
-			run("udevadm settle --timeout=120");
-
-			/* Tell temporary udevd to exit, we'll start a monitored instance later */
-			run("udevadm control --exit");
-		}
 	}
 

Update:

  1. those service are registered before udevd because i wrote [S...].
  2. If you think it's ok to let udevd just start in [S] level, this could be a better solution. If you think we need to ensure udevd/BOOTSTRAP services absolutely registered before all other services(one may write other [S] service in this file). Then we have several ways to go now:
  • add another 'svc_list'
  • add another runlevel ahead of 'S'
  • top the service
    Since all service in [S] is essential, there maybe a risk that one needs triggered udevd to set up devices.
  1. Restarting udev needs a special attention. We need to rerun the other trigger commands, either. But because udev just simply should not die, we could ignore this in my view.

@troglobit
Copy link
Owner

Great analysis, having certain prioritized services start before everything else is potentially a great idea to solve issues like this!

I'll have a look in detail tomorrow, but [S] in general and sm_step() seems like the way to go here. I like the way you've worked around having a temporary udevd, that always seemed sub-optimal to me too.

@troglobit
Copy link
Owner

@xhebox Hopefully 2ec132c fixes your problem! I noticed you simplified the patch a bit further, focusing only on registering service + runtasks, it was really helpful.

Please reopen this issue should the problems persist.

@xhebox
Copy link
Contributor Author

xhebox commented Mar 1, 2018

@troglobit

  1. if i am correct, runlevel 6 is for reboot, so there's no six, right?
  2. without ':1, :2', two trigger commands will be registered as one, the second one will be overrided.
  3. [S] for udevadm did not work. It printed like this:
trigger
trigger
mounted
daemon

For [S12345], finit works, but somehow runs RUN&TASK first and second. It printed like this:

trigger
trigger
mounted
daemon
trigger
trigger

Looks like daemon somehow did not start in the runlevel [S]... By the way, my runlevel is two.
4. Please give me the permission to reopen issues, you need to change the options of your repo. :)

@troglobit troglobit reopened this Mar 1, 2018
@troglobit
Copy link
Owner

troglobit commented Mar 1, 2018

  1. Correct, 6 is for reboot (0 is for shutdown) so no point in running there. Usually services like watchdogd and similar are the only one(s) allowed to run there
  2. Ah good catch, my bad! fixing
  3. It should work with only [S] for udevadm calls, Finit should run each registered service/task/run at least once in each runlevel ... I didn't see the point in having udevadm trigger for every runlevel you change to. I'll look into that, unless 2. fixes that too.
  4. I see, thanks for the heads-up Edit: seems GitHub doesn't support this, sorry about that.

@troglobit troglobit added this to the v3.2 milestone Mar 1, 2018
troglobit added a commit that referenced this issue Mar 1, 2018
Signed-off-by: Joachim Nilsson <troglobit@gmail.com>
@xhebox
Copy link
Contributor Author

xhebox commented Mar 1, 2018

@troglobit If you say, udevd is started in the [S] level, then the problem is why task is started ahead of the daemon...

@troglobit
Copy link
Owner

Just pushed a fix for 2., but I'm starting to think we need to add a condition for the tasks so they wait for the daemon to start. I'll have to look into that in more detail, but in theory it could look like this:

:1 [S] <svc/usr/sbin/udevd> udevadm trigger ...
       ~~~~~~~~~~~~~~~~~~~~

I'll get right back to this later today!

@xhebox
Copy link
Contributor Author

xhebox commented Mar 1, 2018

@troglobit something like this worked, but it's a quite slow starting. After "Device event...", it took a long time before the next line "trigger". Any idea?

+		/* Register udevd as a monitored service */
+		snprintf(cmd, sizeof(cmd), "[S12345789] pid:udevd %s -- Device event manager daemon", path);
+		if (service_register(SVC_TYPE_SERVICE, cmd, global_rlimit, NULL)) {
+			_pe("Failed registering %s", path);
+			udev = 0;
+		} else {
+			snprintf(cmd, sizeof(cmd), ":1 [S] <svc/%s> udevadm trigger --action=add --type=subsystems -- 1", path);
+			service_register(SVC_TYPE_RUN, cmd, global_rlimit, NULL);
+
+			snprintf(cmd, sizeof(cmd), ":2 [S] <svc/%s> udevadm trigger --action=add --type=devices -- 2", path);
+			service_register(SVC_TYPE_RUN, cmd, global_rlimit, NULL);
+		}
 		free(path);

@troglobit
Copy link
Owner

@xhebox Yes, exactly like that. Good work!

The reason it takes a while for it to start the triggers is possibly because there is a lot to do at boot and the conditions does not start the runtasks until we enter the Finit event loop ... that one could be a bit trickier to solve. I'll give it some thought, maybe we can make an explicit call to run the state machine an extra time before entering the event loop.

@troglobit
Copy link
Owner

Spent a couple of hours tonight (CET) trying to reproduce the issue you see in my Debian setup.

  • Good news: I can reproduce all of it, including the sluggishness in starting up
  • Bad news: I currently have no idea why this happens

I will continue debugging for another hour, and continue tomorrow evening.

troglobit added a commit that referenced this issue Mar 1, 2018
We must wait for udevd to be started before we call udevadm.  However,
udevd doesn't create a PID file, so we must also fake this.  There is
still a slight risk of a race condition: udevd not having properly
started before udevadm tries to connect, but it works OK in Debian.

Signed-off-by: Joachim Nilsson <troglobit@gmail.com>
troglobit added a commit that referenced this issue Mar 1, 2018
All service/task commands are tracked by the state machine, but run
commands are a bit special since they must run in sequence.  Hence,
we must increment their 'once' counter in service_start() instead.

This should fix the issue mentioned in #96 "took a long time before
the next line" -- because Finit was waiting for udevadm to reach a
once count > 0, which we didn't increment.

Signed-off-by: Joachim Nilsson <troglobit@gmail.com>
@troglobit
Copy link
Owner

troglobit commented Mar 1, 2018

Yes, I think I've found it! 😃

  1. I've added the condition tracking to the udevadm commands
  2. I discovered that we had missed incrementing the 'once' counter for run commands

With 2. in place the boot is really quick on my little Debian system. Makes me wonder if we should consider re-introducing the udevadm settle -t 120 as a third run command?

(Off to get some sleep ...)

@xhebox
Copy link
Contributor Author

xhebox commented Mar 2, 2018

@troglobit I'd say no. While it makes udevd completely and safely loaded, it may takes a long time. This is a problem complained by too many systemd users, and now it's disabled by default in some distros. I think we'd better leave users free, hardcode is not a good idea. If you really want to reintroduce it, i think you can make it a option in finit.conf.

EDIT: patch tested

@troglobit
Copy link
Owner

Agreed. Better leave it as-is.

Did the patch work for you too?

@xhebox
Copy link
Contributor Author

xhebox commented Mar 2, 2018

@troglobit working

@troglobit
Copy link
Owner

@xhebox That's great news! 😃👍

Sorry for taking so long to resolve this. Is there anything else you feel we should do before closing this issue?

@xhebox
Copy link
Contributor Author

xhebox commented Mar 2, 2018

@troglobit It's ok. And no, let me close the issue.

@xhebox xhebox closed this as completed Mar 2, 2018
@troglobit troglobit modified the milestones: v3.3, v3.2 Apr 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants