
Error: System has 0 devices with a filesystem labeled 'boot' #1483

Closed
abhinavdahiya opened this issue May 1, 2023 · 9 comments
@abhinavdahiya

Describe the bug

AMI: fedora-coreos-37.20230401.3.0-x86_64

AWS instance failing to boot with the following error:

Displaying logs from failed units: coreos-ignition-unique-boot.service
May 01 21:15:40 systemd[1]: Starting coreos-ignition-unique-boot.service - CoreOS Ensure Unique Boot Filesystem...
May 01 21:15:40 rdcore[1511]: Error: System has 0 devices with a filesystem labeled 'boot': []
May 01 21:15:40 systemd[1]: coreos-ignition-unique-boot.service: Main process exited, code=exited, status=1/FAILURE
May 01 21:15:40 systemd[1]: coreos-ignition-unique-boot.service: Failed with result 'exit-code'.
May 01 21:15:40 systemd[1]: Failed to start coreos-ignition-unique-boot.service - CoreOS Ensure Unique Boot Filesystem.
May 01 21:15:40 systemd[1]: coreos-ignition-unique-boot.service: Triggering OnFailure= dependencies.

This happens on some instances and not all the time.

Reproduction steps

This is not reliably reproducible; the failure occurs only on some boots.

Expected behavior

Expect Ignition to complete and the AWS instance to boot.

Actual behavior

The AWS instance fails to boot.

System details

AWS

Could not capture rpm-ostree status -b output, as the instance fails to boot on AWS.

Butane or Ignition config

No response

Additional information

No response

@tomwans

tomwans commented May 1, 2023

We have seen this on both c5.xlarge (amd64) and c6g.16xlarge (aarch64) instance types.

@bgilbert
Contributor

bgilbert commented May 1, 2023

Please obtain the full console log from the EC2 console or API and post it here. Is there anything unusual about your instance configuration? (For example, multiple disks.)

@tomwans

tomwans commented May 2, 2023

@bgilbert Thanks - here is a sample of a full log we were able to obtain: https://gist.github.com/tomwans/5d89b652eaefe9e2772707d5f5ef8c21

We do use multiple disks in this case.

@bgilbert
Contributor

bgilbert commented May 2, 2023

@tomwans Thanks. In that log, you're booting from /dev/nvme2n1, but your Ignition config wipes that disk and repartitions it as a single RAID component. And of course, once you do that, you don't have an operating system anymore. Note that the Linux kernel does not guarantee that a storage device will be assigned the same name on every boot. When specifying partitioning for multiple disks, you should use the stable device aliases in /dev/disk instead of the kernel's device names.

@tomwans

tomwans commented May 2, 2023

@bgilbert thanks. Any idea why this might be intermittent, given the same disk layout and instance type in AWS? And why might we be seeing this only in this most recent version of CoreOS?

We haven't changed the Ignition disk configuration in a very long time, so not sure why this would be coming up now.

@dustymabe
Member

dustymabe commented May 2, 2023

@bgilbert thanks. Any idea why this might be intermittent, given the same disk layout and instance type in AWS?

If it's a race condition that could explain why the behavior may not be consistent.

And why might we be seeing this only in this most recent version of CoreOS?

Maybe newer kernels trigger the race condition more often?

@abhinavdahiya
Author

Here is the Ignition config used for the failing instance (the files and units sections have been removed):

{
    "ignition": {
        "config": {
            "replace": {
                "verification": {}
            }
        },
        "proxy": {},
        "security": {
            "tls": {}
        },
        "timeouts": {},
        "version": "3.3.0"
    },
    "kernelArguments": {
        "shouldExist": [
            "processor.max_cstate=1",
            "intel_idle.max_cstate=1",
            "systemd.unified_cgroup_hierarchy=0",
            "mitigations=off",
            "console=tty0",
            "console=ttyS0,115200n8",
            "audit=1"
        ],
        "shouldNotExist": [
            "mitigations=auto,nosmt"
        ]
    },
    "passwd": {},
    "storage": {
        "directories": [],
        "disks": [
            {
                "device": "/dev/nvme1n1",
                "partitions": [
                    {
                        "label": "raid.0.1",
                        "number": 1,
                        "sizeMiB": 0,
                        "startMiB": 0
                    }
                ],
                "wipeTable": true
            },
            {
                "device": "/dev/nvme2n1",
                "partitions": [
                    {
                        "label": "raid.0.2",
                        "number": 1,
                        "sizeMiB": 0,
                        "startMiB": 0
                    }
                ],
                "wipeTable": true
            },
            {
                "device": "/dev/nvme3n1",
                "partitions": [
                    {
                        "label": "raid.0.3",
                        "number": 1,
                        "sizeMiB": 0,
                        "startMiB": 0
                    }
                ],
                "wipeTable": true
            }
        ],
        "files": [],
        "filesystems": [
            {
                "device": "/dev/md/var-ebs-stripe",
                "format": "xfs",
                "path": "/var"
            }
        ],
        "raid": [
            {
                "devices": [
                    "/dev/disk/by-partlabel/raid.0.1",
                    "/dev/disk/by-partlabel/raid.0.2",
                    "/dev/disk/by-partlabel/raid.0.3"
                ],
                "level": "raid0",
                "name": "var-ebs-stripe"
            }
        ]
    },
    "systemd": {
        "units": []
    }
}

@bgilbert
Contributor

bgilbert commented May 2, 2023

Yup, that's the problem. It's not safe for the config to assume that the boot disk is nvme0n1 and the data disks are nvme1n1 and up.
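
For illustration, here is a minimal sketch of the approach bgilbert describes, rewriting the disks section of the config above to use stable aliases under /dev/disk instead of kernel device names. It assumes EBS volumes on an AWS Nitro instance, where udev typically creates /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_<volume-id> symlinks for each attached volume; the volume IDs below are hypothetical placeholders that would need to be replaced with the real IDs of the data volumes.

    "storage": {
        "disks": [
            {
                "device": "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0aaaaaaaaaaaaaaaa",
                "partitions": [
                    {
                        "label": "raid.0.1",
                        "number": 1,
                        "sizeMiB": 0,
                        "startMiB": 0
                    }
                ],
                "wipeTable": true
            },
            {
                "device": "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0bbbbbbbbbbbbbbbb",
                "partitions": [
                    {
                        "label": "raid.0.2",
                        "number": 1,
                        "sizeMiB": 0,
                        "startMiB": 0
                    }
                ],
                "wipeTable": true
            },
            {
                "device": "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0cccccccccccccccc",
                "partitions": [
                    {
                        "label": "raid.0.3",
                        "number": 1,
                        "sizeMiB": 0,
                        "startMiB": 0
                    }
                ],
                "wipeTable": true
            }
        ]
    }

Because these aliases identify the volumes themselves rather than the order in which the kernel enumerated them, the config can no longer wipe the boot disk when the NVMe devices come up in a different order.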

@bgilbert closed this as not planned on May 2, 2023
@jlebon
Member

jlebon commented May 2, 2023

I think this is a dupe of #1122.

Edit: I've added a comment there that may help your use case.
