This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

/var/run/utmp is a directory ? #2676

Closed
jbrockopp opened this issue Feb 14, 2019 · 15 comments

Comments

@jbrockopp

jbrockopp commented Feb 14, 2019

RancherOS Version: (ros os version)

[rancher@server ~]$ sudo ros -v
version v1.4.0 from os image rancher/os:v1.4.0

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)

OpenStack

I think I found a bug, so I'm opening this to confirm the expected behavior: /var/run/utmp is no longer a file but a directory once you switch your console away from the default.

I have two servers, one with the default console enabled and one with the centos console enabled.

Default Console:

[rancher@server ~]$ sudo ros console list
disabled alpine
disabled centos
disabled debian
current  default
disabled fedora
disabled ubuntu

[rancher@server ~]$ sudo ros -v
version v1.4.0 from os image rancher/os:v1.4.0

[rancher@server ~]$ ls -al /var/run/utmp
-rw-r--r--    1 root     root           768 Feb 14 15:58 /var/run/utmp

Centos Console:

[rancher@server ~]$ sudo ros console list
disabled alpine
current  centos
disabled debian
disabled default
disabled fedora
disabled ubuntu

[rancher@server ~]$ sudo ros -v
version v1.4.0 from os image rancher/os:v1.4.0

[rancher@server ~]$ ls -al /var/run/utmp/
total 0
drwxr-xr-x  2 root root  40 Jan 20 00:48 .
drwxrwxrwt 21 root root 680 Jan 20 02:25 ..

Is it expected behavior that when you switch your console to something besides default, it converts the /var/run/utmp to a directory? Or is this a bug?

I'll post in the next comment how this is affecting me.
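
For anyone comparing the two consoles, a minimal sketch like the one below reports what kind of object sits at a path without relying on reading `ls` output. The function name `path_kind` is hypothetical and is not part of `ros` or RancherOS; the default path is the one discussed in this issue.

```shell
#!/bin/sh
# Classify a path as "directory", "file", "other", or "missing".
# path_kind is a hypothetical helper name used only for this sketch.
path_kind() {
    if [ -d "$1" ]; then
        echo directory
    elif [ -f "$1" ]; then
        echo file
    elif [ -e "$1" ]; then
        echo other
    else
        echo missing
    fi
}

# Check the path given as the first argument, or /var/run/utmp by default.
path_kind "${1:-/var/run/utmp}"
```

On the default console this should report a file; on the centos console described above it would report a directory.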

@jbrockopp
Author

The problem is that we deploy Telegraf as a Docker container as part of our cloud-config.

Here is the relevant configuration:

    telegraf:
      container_name: telegraf
      image: telegraf:1.9.4-alpine
      restart: always
      volumes:
        - /home/rancher/telegraf:/etc/telegraf:ro
        - /sys:/rootfs/sys:ro
        - /proc:/rootfs/proc:ro
        - /var/run/utmp:/var/run/utmp:ro
        - /var/run/docker.sock:/var/run/docker.sock:ro

This produces the following errors in the logs:

2019-01-30T22:17:50Z E! Error in plugin [inputs.system]: read /var/run/utmp: is a directory
2019-01-30T22:18:00Z E! Error in plugin [inputs.system]: read /var/run/utmp: is a directory

We care about the system input for Telegraf because we like to gather the load averages from our servers. I haven't had much luck finding a workaround on the internet but I'll continue looking.

I figured I should first clarify if this is expected behavior.

@Jason-ZW

Jason-ZW commented Feb 15, 2019

@jbrockopp I followed the same steps on docker-machine with VirtualBox and could not reproduce this issue.

Using the default console to start Telegraf:

[root@rancheros cloud-config.d]# ros -v
version v1.4.0 from os image rancher/os:v1.4.0

[root@rancheros cloud-config.d]# ros console list
disabled alpine
disabled centos
disabled debian
current  default
disabled fedora
disabled ubuntu

[root@rancheros cloud-config.d]# ls -la /var/run/
total 492
drwxrwxrwt    9 root     root           480 Feb 15 03:46 .
drwxr-xr-x    1 root     root          4096 May 31  2018 ..
srw-rw-rw-    1 root     root             0 Feb 15 03:46 acpid.socket
-rw-------    1 root     root             0 Feb 15 03:46 agetty.reload
drwxr-xr-x    2 root     root            80 Feb 15 03:45 blkid
-rw-r--r--    1 root     root             7 Feb 15 03:46 console-done
drwxr-xr-x    3 root     root            60 Feb 15 05:46 dhcpcd
-rw-r--r--    1 root     root             5 Feb 15 03:46 dhcpcd.pid
srw-rw----    1 root     root             0 Feb 15 03:46 dhcpcd.sock
srw-rw-rw-    1 root     root             0 Feb 15 03:46 dhcpcd.unpriv.sock
drwx------    9 root     root           200 Feb 15 05:03 docker
-rw-r--r--    1 root     root            17 Feb 15 03:46 docker-done
-rw-r--r--    1 root     root             4 Feb 15 03:46 docker.pid
srw-rw----    1 root     docker           0 Feb 15 03:46 docker.sock
-rw-r--r--    1 root     root             1 Feb 15 03:46 rsyslogd.pid
drwx------   10 root     root           200 Feb 15 05:00 runc
drw-r--r--    2 root     root            40 Feb 15 03:46 sshd
-rw-r--r--    1 root     root             5 Feb 15 03:46 sshd.pid
drwx------    5 root     root           120 Feb 15 03:46 system-docker
-rw-r--r--    1 root     root             1 Feb 15 03:45 system-docker.pid
srw-rw----    1 root     root             0 Feb 15 03:45 system-docker.sock
drwxr-xr-x    6 root     root           140 Feb 15 05:15 udev
-rw-r--r--    1 root     root        464640 Feb 15 05:55 utmp
-rw-------    1 root     root             0 Feb 15 03:45 xtables.lock

Using the centos console to start Telegraf:

[root@rancheros1 docker]# ros -v
version v1.4.0 from os image rancher/os:v1.4.0

[root@rancheros1 docker]# ros console list
disabled alpine
current  centos
disabled debian
disabled default
disabled fedora
disabled ubuntu

[root@rancheros1 docker]# ls -la /var/run/
total 188
drwxrwxrwt 10 root root      500 Feb 15 05:26 .
drwxr-xr-x  1 root root     4096 Feb 15 05:25 ..
srw-rw-rw-  1 root root        0 Feb 15 05:18 acpid.socket
-rw-------  1 root root        0 Feb 15 05:18 agetty.reload
drwxr-xr-x  2 root root       80 Feb 15 05:17 blkid
-rw-r--r--  1 root root        6 Feb 15 05:25 console-done
drwxr-xr-x  3 root root       60 Feb 15 05:48 dhcpcd
-rw-r--r--  1 root root        5 Feb 15 05:18 dhcpcd.pid
srw-rw----  1 root root        0 Feb 15 05:18 dhcpcd.sock
srw-rw-rw-  1 root root        0 Feb 15 05:18 dhcpcd.unpriv.sock
drwx------  9 root root      200 Feb 15 05:26 docker
-rw-r--r--  1 root root       17 Feb 15 05:26 docker-done
-rw-r--r--  1 root root        4 Feb 15 05:26 docker.pid
srw-rw----  1 root docker      0 Feb 15 05:26 docker.sock
drwxr-xr-x  2 root root       60 Feb 15 05:25 mount
-rw-r--r--  1 root root        1 Feb 15 05:18 rsyslogd.pid
drwx------ 10 root root      200 Feb 15 05:26 runc
drw-r--r--  2 root root       40 Feb 15 05:18 sshd
-rw-r--r--  1 root root        5 Feb 15 05:25 sshd.pid
drwx------  5 root root      120 Feb 15 05:18 system-docker
-rw-r--r--  1 root root        1 Feb 15 05:17 system-docker.pid
srw-rw----  1 root root        0 Feb 15 05:17 system-docker.sock
drwxr-xr-x  6 root root      140 Feb 15 05:27 udev
-rw-r--r--  1 root root   156672 Feb 15 05:56 utmp
-rw-------  1 root root        0 Feb 15 05:17 xtables.lock

@jbrockopp
Author

jbrockopp commented Feb 15, 2019

@Jason-ZW first off, let me say thank you for your quick response and looking into this 👍

Thinking further on this, maybe it has to do with how I'm activating the Centos console?

For context:

We use Packer to build the image on which I'm seeing this behavior. To make sure the Centos console is enabled in the image, we run the following in our Packer template:

  "provisioners": [
    {
      "type": "shell",
      "expect_disconnect": true,
      "inline": [
        "sudo ros console switch --force centos"
      ]
    }
  ]
}

Is it fair if I dig into this more over the weekend and report back with the results I find?

@Jason-ZW

Jason-ZW commented Feb 16, 2019

@jbrockopp You may need to enable autologin for the console, or set the kernel parameters rancher.autologin=tty1 rancher.autologin=ttyS0.
If you never log in on the console and only use SSH, you probably won't have this file.

utmp keeps a record of the system's current state: the boot time (used by uptime), user logins and the terminals they use, logouts, system events, and so on.

@jbrockopp
Author

jbrockopp commented Feb 18, 2019

@Jason-ZW after digging into this more: even when I switched to the centos console using cloud-config, following the docs, /var/run/utmp still showed up as a directory instead of a file.

I then tried to use the autologin feature, as suggested above, and was unsuccessful.

Attempt 1:

#cloud-config
rancher:
  autologin:
    - tty1
    - ttyS0

  console: centos

Attempt 2:

#cloud-config
rancher:
  autologin: tty1
  autologin: ttyS0

  console: centos

I then decided to see if I could set the kernel parameters, as suggested above, and was still unsuccessful.

To update the kernel parameters, I used the in-place editing method.

When I ran sudo ros config syslinux it pulled up a file with the following contents:

APPEND printk.devkmsg=on rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait panic=10 console=tty0 console=tty1 console=ttyS0,115200n8 printk.devkmsg=on rancher.autologin=ttyS0

I added only rancher.autologin=tty1 from the suggested changes, because the file already contained rancher.autologin=ttyS0.

APPEND printk.devkmsg=on rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait panic=10 console=tty0 console=tty1 console=ttyS0,115200n8 printk.devkmsg=on rancher.autologin=tty1 rancher.autologin=ttyS0

I then ran a sudo reboot -f to ensure the changes took effect. However, I did check the status of /var/run/utmp before and after the reboot and both times it still showed as a directory instead of a file.

I'm wondering how exactly I go about enabling it via the cloud-config? Or did I update it via the kernel parameters incorrectly?
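
One way to confirm whether the edited parameters reached the running kernel after the reboot is to read them back from /proc/cmdline. The sketch below only lists the `rancher.autologin` values it finds; `autologin_ttys` is a hypothetical helper name, and the sketch makes no claim about how RancherOS treats a parameter that is repeated.

```shell
#!/bin/sh
# Print each rancher.autologin value found in a kernel command line,
# one per line. autologin_ttys is a hypothetical helper for this sketch.
autologin_ttys() {
    # $1 is deliberately unquoted so the command line splits into words.
    for arg in $1; do
        case "$arg" in
            rancher.autologin=*) echo "${arg#rancher.autologin=}" ;;
        esac
    done
}

# /proc/cmdline holds the parameters the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    autologin_ttys "$(cat /proc/cmdline)"
fi
```

If a tty added through sudo ros config syslinux is missing from the output, the change did not reach the boot configuration that was actually used.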

@Jason-ZW

Jason-ZW commented Feb 19, 2019

@jbrockopp
Did you delete /var/run/utmp before rebooting?

I then ran a sudo reboot -f to ensure the changes took effect.

@jbrockopp
Author

jbrockopp commented Feb 19, 2019

@Jason-ZW I had not previously.

I just tried this morning and it still doesn't appear to work.

When I run sudo ros config syslinux this is what I see:

APPEND printk.devkmsg=on rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait panic=10 console=tty0 console=tty1 console=ttyS0,115200n8 printk.devkmsg=on rancher.autologin=tty1 rancher.autologin=ttyS0

System details:

[rancher@server ~]$ sudo ros -v
version v1.4.0 from os image rancher/os:v1.4.0

[rancher@server ~]$ sudo ros console list
disabled alpine
current  centos
disabled debian
disabled default
disabled fedora
disabled ubuntu

[rancher@server ~]$ ls -al /var/run/utmp
total 0
drwxr-xr-x  2 root root  40 Feb 19 13:37 .
drwxrwxrwt 12 root root 500 Feb 19 13:37 ..

I then followed your instructions to remove /var/run/utmp:

[rancher@server ~]$ sudo rm -rf /var/run/utmp

[rancher@server ~]$ ls -al /var/run/
total 32
drwxrwxrwt 11 root root    480 Feb 19 13:43 .
drwxr-xr-x  1 root root   4096 Feb 18 15:32 ..
srw-rw-rw-  1 root root      0 Feb 19 13:37 acpid.socket
drwxr-xr-x  2 root root     80 Feb 19 13:37 blkid
-rw-r--r--  1 root root      6 Feb 19 13:37 console-done
drwxr-xr-x  3 root root     60 Feb 19 13:37 dhcpcd
-rw-r--r--  1 root root      5 Feb 19 13:37 dhcpcd.pid
srw-rw----  1 root root      0 Feb 19 13:37 dhcpcd.sock
srw-rw-rw-  1 root root      0 Feb 19 13:37 dhcpcd.unpriv.sock
drwx------  9 root root    200 Feb 19 13:37 docker
-rw-r--r--  1 root root     17 Feb 19 13:37 docker-done
-rw-r--r--  1 root root      4 Feb 19 13:37 docker.pid
srw-rw----  1 root docker    0 Feb 19 13:37 docker.sock
drwxr-xr-x  2 root root     40 Feb 19 13:37 lock
drwxr-xr-x  2 root root     40 Feb 19 13:37 mount
-rw-r--r--  1 root root      1 Feb 19 13:37 rsyslogd.pid
drwx------ 10 root root    200 Feb 19 13:41 runc
drw-r--r--  2 root root     40 Feb 19 13:37 sshd
-rw-r--r--  1 root root      5 Feb 19 13:37 sshd.pid
drwx------  5 root root    120 Feb 19 13:37 system-docker
-rw-r--r--  1 root root      1 Feb 19 13:37 system-docker.pid
srw-rw----  1 root root      0 Feb 19 13:37 system-docker.sock
drwxr-xr-x  6 root root    140 Feb 19 13:41 udev
-rw-------  1 root root      0 Feb 19 13:37 xtables.lock

I then rebooted with sudo reboot -f and then upon logging back into the server I get:

[rancher@server ~]$ ls -al /var/run/utmp
total 0
drwxr-xr-x  2 root root  40 Feb 19 13:45 .
drwxrwxrwt 12 root root 500 Feb 19 13:45 ..

@Jason-ZW

Jason-ZW commented Feb 19, 2019

@jbrockopp
Could you please remove the telegraf service and /var/run/utmp from RancherOS, then reboot and check /var/run/utmp again? Maybe the telegraf service starts up faster than autologin.
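
If the suspicion is a startup race, one way to rule it out is to delay the dependent service until the file exists. The following is a minimal sketch under that assumption; `wait_for_file` is a hypothetical helper name, and the 30-second default timeout is arbitrary.

```shell
#!/bin/sh
# Wait until a path exists as a regular file, polling once per second.
# wait_for_file is a hypothetical helper: returns 0 once the file exists,
# 1 if the timeout (in seconds, default 30) runs out first.
wait_for_file() {
    path=$1
    remaining=${2:-30}
    while [ ! -f "$path" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example (commented out): start the container only once utmp is a file.
# wait_for_file /var/run/utmp 60 && docker start telegraf
```

Because the test is `-f`, a directory at that path counts as "not ready", so the helper also times out in the situation reported in this issue.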

@jbrockopp
Author

jbrockopp commented Feb 19, 2019

@Jason-ZW I did as you asked.

After I removed the telegraf service, this is what I saw:

[rancher@server ~]$ ls -al /var/run/
total 32
drwxrwxrwt 10 root root    460 Feb 19 16:47 .
drwxr-xr-x  1 root root   4096 Feb 19 16:47 ..
srw-rw-rw-  1 root root      0 Feb 19 16:47 acpid.socket
drwxr-xr-x  2 root root     80 Feb 19 16:47 blkid
-rw-r--r--  1 root root      6 Feb 19 16:47 console-done
drwxr-xr-x  3 root root     60 Feb 19 16:47 dhcpcd
-rw-r--r--  1 root root      5 Feb 19 16:47 dhcpcd.pid
srw-rw----  1 root root      0 Feb 19 16:47 dhcpcd.sock
srw-rw-rw-  1 root root      0 Feb 19 16:47 dhcpcd.unpriv.sock
drwx------  8 root root    180 Feb 19 16:47 docker
-rw-r--r--  1 root root     17 Feb 19 16:47 docker-done
-rw-r--r--  1 root root      4 Feb 19 16:47 docker.pid
srw-rw----  1 root docker    0 Feb 19 16:47 docker.sock
drwxr-xr-x  2 root root     60 Feb 19 16:47 mount
-rw-r--r--  1 root root      1 Feb 19 16:47 rsyslogd.pid
drwx------ 10 root root    200 Feb 19 17:00 runc
drw-r--r--  2 root root     40 Feb 19 16:47 sshd
-rw-r--r--  1 root root      5 Feb 19 16:47 sshd.pid
drwx------  5 root root    120 Feb 19 16:47 system-docker
-rw-r--r--  1 root root      1 Feb 19 16:47 system-docker.pid
srw-rw----  1 root root      0 Feb 19 16:47 system-docker.sock
drwxr-xr-x  6 root root    140 Feb 19 16:47 udev
-rw-------  1 root root      0 Feb 19 16:47 xtables.lock

You'll note that /var/run/utmp does not exist at all.

I then edited the kernel parameters with sudo ros config syslinux to look like:

APPEND printk.devkmsg=on rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait panic=10 console=tty0 console=tty1 console=ttyS0,115200n8 printk.devkmsg=on rancher.autologin=tty1 rancher.autologin=ttyS0

I then rebooted with sudo reboot -f and then upon logging back into the server I get:

[rancher@server ~]$ ls -al /var/run/
total 32
drwxrwxrwt 10 root root    460 Feb 19 17:38 .
drwxr-xr-x  1 root root   4096 Feb 19 16:47 ..
srw-rw-rw-  1 root root      0 Feb 19 17:38 acpid.socket
drwxr-xr-x  2 root root     80 Feb 19 17:37 blkid
-rw-r--r--  1 root root      6 Feb 19 17:38 console-done
drwxr-xr-x  3 root root     60 Feb 19 17:38 dhcpcd
-rw-r--r--  1 root root      5 Feb 19 17:38 dhcpcd.pid
srw-rw----  1 root root      0 Feb 19 17:38 dhcpcd.sock
srw-rw-rw-  1 root root      0 Feb 19 17:38 dhcpcd.unpriv.sock
drwx------  8 root root    180 Feb 19 17:38 docker
-rw-r--r--  1 root root     17 Feb 19 17:38 docker-done
-rw-r--r--  1 root root      4 Feb 19 17:38 docker.pid
srw-rw----  1 root docker    0 Feb 19 17:38 docker.sock
drwxr-xr-x  2 root root     60 Feb 19 17:38 mount
-rw-r--r--  1 root root      1 Feb 19 17:38 rsyslogd.pid
drwx------ 10 root root    200 Feb 19 17:38 runc
drw-r--r--  2 root root     40 Feb 19 17:38 sshd
-rw-r--r--  1 root root      5 Feb 19 17:38 sshd.pid
drwx------  5 root root    120 Feb 19 17:37 system-docker
-rw-r--r--  1 root root      1 Feb 19 17:37 system-docker.pid
srw-rw----  1 root root      0 Feb 19 17:37 system-docker.sock
drwxr-xr-x  6 root root    140 Feb 19 17:38 udev
-rw-------  1 root root      0 Feb 19 17:37 xtables.lock

So again, no /var/run/utmp file exists...

Rather than tuning kernel parameters, is there a way I can use autologin via cloud-config? Maybe a better question to ask is: what was wrong with my previous cloud-config attempts?

@Jason-ZW

@jbrockopp
Oh... I can reproduce this issue when I set the console to centos in my cloud-config file. Thanks for your feedback.

@jbrockopp
Author

@Jason-ZW glad we're able to reproduce the issue now.

Please let me know if there is anything else you need from me 👍

@jbrockopp
Author

@Jason-ZW is there an update on this?

We run RancherOS in production so it's important to us that we have all the metrics we need.

@niusmallnan niusmallnan added this to the v1.6.0 milestone Mar 6, 2019
@niusmallnan
Contributor

@jbrockopp agetty is what updates the utmp file, but the agetty in the default console differs from the one in the other consoles: it is built from busybox.
The default console's agetty can create the utmp file, and updates it when you log in to ros.
In the other consoles, agetty cannot create the utmp file. But once you touch the utmp file, agetty can update it.

So you can try this workaround: https://rancher.com/docs/os/v1.x/en/installation/configuration/running-commands/

#cloud-config
runcmd:
- [ touch, /var/run/utmp ]

I can get the content of utmp file after trying this workaround:

rancher@rancher:~$ who
rancher  pts/0        2019-03-06 04:00 (172.22.100.1)

rancher@rancher:~$ cat /var/run/utmp
tty22LOGIN�E�\!tty66LOGIN!�E�\tty55LOGIN�E�\tty33LOGIN�E�\tty44LOGIN�E�\tty11LOGIN�E�\�pts/0ts/0rancher172.22.100.1�E�\���d
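
On a host where an empty /var/run/utmp directory is already in place, a plain touch does not turn it into a file. One plausible cause, which nobody in this thread confirms, is that Docker creates a missing bind-mount source path as a directory when the container starts before the file exists. The sketch below extends the touch workaround under that assumption; `ensure_regular_file` is a hypothetical helper name.

```shell
#!/bin/sh
# Make sure a path is a regular file, as the touch workaround does, and
# first clear an empty directory that may have been left in its place.
# ensure_regular_file is a hypothetical helper name for this sketch.
ensure_regular_file() {
    path=$1
    if [ -d "$path" ]; then
        # rmdir only removes empty directories, so real data is never deleted.
        rmdir "$path" || return 1
    fi
    # touch creates the file if it is missing and keeps existing contents.
    touch "$path"
}

# Example (commented out, needs root): run before any container mounts it.
# ensure_regular_file /var/run/utmp
```

The helper refuses to touch a non-empty directory, and rmdir may also fail while a running container still has the directory mounted, so it is best run before the container starts.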

@rootwuj

rootwuj commented May 17, 2019

Tested with rancher/os:v1.5.2-rc1 from May 17.
In this version, /var/run/utmp is a file; I verified this on all consoles.
Verified fixed.

@rootwuj rootwuj closed this as completed May 17, 2019
@jbrockopp
Author

@Jason-ZW @niusmallnan @rootwuj

After deploying RancherOS v1.5.2, I was able to verify that the utmp file is created and our system load averages are being reported again.

Thanks for all the work on this!

4 participants