Beta 3185.1.0: ignition fails to create partition on second disk (vmware) #729

Open
defo89 opened this issue May 6, 2022 · 14 comments
Labels
kind/bug Something isn't working

Comments


defo89 commented May 6, 2022

Description

With Beta 3185.1.0 and Ignition v3, deployment fails when a vSphere VM has more than one disk.

Impact

Cannot deploy VM.

Environment and steps to reproduce

  1. Set-up: Flatcar VM deployed in vSphere 7 using terraform-provider-vsphere v2.0.2
  2. Task: Deploy Flatcar Beta 3185.1.0 OVA using Ignition v3 spec file (as vapp)
  3. Error: Ignition fails with: create partitions failed: Failed to pretend to create partitions: exit status 4. Stderr: Could not create partition 1 from 4194304 to 20975714303. Sometimes Ignition fails without any error message. In both cases the VM ends up in a reboot loop, so the emergency shell cannot be entered.

[Image: ignition-v3-disk-error]

Expected behavior

The VM deploys successfully, as it does with the Flatcar Stable 3139.2.0 OVA and an Ignition v2 spec file.

Additional information

To narrow the problem down to the Beta release, the same Ignition JSON is used for both tests; only the few lines that differ between the v2 and v3 spec were edited.
Both files are attached to the issue.

VM config to reproduce:

provider "vsphere" {
  user                 = "user"
  password             = var.password
  vsphere_server       = "vc-server-url"
  persist_session      = true
  client_debug         = true
}

data "vsphere_datacenter" "dc" {
  name = "DC"
}

data "vsphere_datastore_cluster" "datastore" {
  name          = "datastore"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "vsphere_compute_cluster" "cluster" {
  name          = "cluster"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "vsphere_virtual_machine" "template" {
  name          = "flatcar_production_vmware_beta"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "vsphere_network" "network" {
  name          = "network"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "local_file" "ignitions" {
  filename = "ignition.json"
}

resource "vsphere_virtual_machine" "vm" {
  name             = "beta-ignition-v3"
  resource_pool_id = "${data.vsphere_compute_cluster.cluster.resource_pool_id}"
  datastore_cluster_id = "${data.vsphere_datastore_cluster.datastore.id}"

  num_cpus = 2
  memory   = 1024
  guest_id = "${data.vsphere_virtual_machine.template.guest_id}"
  scsi_type = "${data.vsphere_virtual_machine.template.scsi_type}"

  network_interface {
    network_id   = "${data.vsphere_network.network.id}"
    adapter_type = "${data.vsphere_virtual_machine.template.network_interface_types[0]}"
  }

  disk {
    label            = "disk0"
    size             = "64"
    unit_number      = "0"
    eagerly_scrub    = false
    thin_provisioned = true
  }

  disk {
    label            = "disk1"
    size             = "64"
    unit_number      = "1"
    eagerly_scrub    = false
    thin_provisioned = true
  }

  clone {
    template_uuid = "${data.vsphere_virtual_machine.template.id}"
  }

  vapp {
    properties = {
      "guestinfo.ignition.config.data"          = base64gzip(data.local_file.ignitions.content)
      "guestinfo.ignition.config.data.encoding" = "gz+base64"
    }
  }
}

defo89 added the kind/bug (Something isn't working) label on May 6, 2022

Author

defo89 commented May 6, 2022

Ignition file for Flatcar Beta 3185.1.0 (failing): ignition-v3-example.json.txt

Ignition file for Flatcar Stable 3139.2.0 (working): ignition-v2-example.json.txt

Author

defo89 commented May 6, 2022

Hi @pothos, I have stumbled across your PR coreos/ignition#1319 which is not merged yet and is planned for coreos/ignition release 2.14.0.
I was wondering if this could be related. Although I am not sure if Flatcar Beta 3185.1.0 (ignition 2.13.0) is already using the updated code.

Member

jepio commented May 6, 2022

What's the value of data.vsphere_virtual_machine.template.scsi_type? Can you paste the yaml you use to create the ignition json (both for v2 and v3)?

Author

defo89 commented May 6, 2022

Ignition v3 file (sorry, I had to add .txt to upload it): ignition.tf.txt
I use this provider to create the v3 spec file: https://github.com/community-terraform-providers/terraform-provider-ignition

To avoid messing with v2, I just edit the generated v3 file by hand to turn it into a v2 file.

And for scsi_type:

output "template" {
  value = data.vsphere_virtual_machine.template.scsi_type
}

Outputs:
template = pvscsi

I forgot to provide the device paths as they appear when the VM comes up (with the second disk attached but without the ignition_disk part).

# ls -la /dev/disk/by-path
total 0
drwxr-xr-x. 2 root root 220 May  5 14:40 .
drwxr-xr-x. 9 root root 180 May  5 14:39 ..
lrwxrwxrwx. 1 root root   9 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part4 -> ../../sda4
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part6 -> ../../sda6
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part7 -> ../../sda7
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part9 -> ../../sda9
lrwxrwxrwx. 1 root root   9 May  5 14:39 pci-0000:03:00.0-scsi-0:0:1:0 -> ../../sdb

Hope this helps.

Member

pothos commented May 6, 2022

Hi @pothos, I have stumbled across your PR coreos/ignition#1319 which is not merged yet and is planned for coreos/ignition release 2.14.0. I was wondering if this could be related. Although I am not sure if Flatcar Beta 3185.1.0 (ignition 2.13.0) is already using the updated code.

The fix is already part of our Flatcar release.

Can you try the same v2 config on 3185.1.0? It will be translated to v3 on the fly, and I wonder whether that makes a difference.

Author

defo89 commented May 6, 2022

Thanks for confirming.
I tried the same v2 config JSON on 3185.1.0 and got the same error.

Author

defo89 commented Jun 14, 2022

Just confirmed that the same happens with the latest Beta, 3227.1.0.

Member

jepio commented Jun 14, 2022

Hi @defo89, I looked into this.
Right now the Ignition conversion does not handle Ignition spec version 2.1.0, which is why ignition-v2.json fails on newer Flatcar releases. You can make it work by manually editing it in the following way:

--- a/ignition-v2-example.json.txt
+++ b/ignition-v2-example.json.txt
@@ -2,7 +2,7 @@
     "ignition": {
         "config": {},
         "timeouts": {},
-        "version": "2.1.0"
+        "version": "2.3.0"
     },
     "passwd": {
         "users": [
@@ -15,13 +15,13 @@
     "storage": {
         "disks": [
             {
-                "device": "/dev/disk/by-path/pci-0000:00:07.0",
+                "device": "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0",
                 "partitions": [
                     {
                         "label": "etc-test",
                         "number": 1,
-                        "size": 10240000,
-                        "start": 2048,
+                        "sizeMiB": 5120000,
+                        "startMiB": 1024,
                         "typeGuid": ""
                     }
                 ]

The older "size" and "start" properties are expressed in sectors, which are usually 512 bytes each.

As to ignition-v3.json not working: are you sure your disk is 10TB in size? I also wondered whether things fail because the disks get reordered (/dev/sda swapped with /dev/sdb), in which case attaching the second disk to a separate SCSI controller instead of keeping both under the same one might help, but you're already using stable device paths, so never mind.
If the v2 JSON file works after runtime conversion by Ignition, then v3.json should also work (it does in my testing).
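
To make the unit difference concrete, here is a small sketch that converts sector counts to MiB and decodes the sector range from the reported sgdisk error. It assumes 512-byte logical sectors, as mentioned above; the helper name sectors_to_mib is only illustrative and is not part of Ignition.

```python
# Assumption: 512-byte logical sectors (disks with 4096-byte sectors would differ).
SECTOR_BYTES = 512
MIB = 1024 * 1024          # bytes per MiB
MIB_PER_TIB = 1024 * 1024  # MiB per TiB


def sectors_to_mib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count (v2 "start"/"size") to MiB ("startMiB"/"sizeMiB")."""
    return sectors * sector_bytes / MIB


# Sector values from the original v2 config shown in the diff.
print(sectors_to_mib(2048))        # 1.0
print(sectors_to_mib(10_240_000))  # 5000.0

# Sector range from the reported error:
# "Could not create partition 1 from 4194304 to 20975714303"
first, last = 4_194_304, 20_975_714_303
requested_mib = sectors_to_mib(last - first + 1)
requested_tib = requested_mib / MIB_PER_TIB
print(sectors_to_mib(first), requested_mib, requested_tib)  # 2048.0 10240000.0 9.765625
```

Under that assumption, the failing request is for a partition of roughly 9.77 TiB starting at 2048 MiB, which matches the question about a 10TB disk and cannot fit on the 64 GB disks from the Terraform config. The numbers would be consistent with the v2 sector values having been carried over unchanged into the MiB fields, but that is an inference from the arithmetic, not something confirmed in the thread. It also suggests double-checking any MiB values in a converted config against the actual disk size.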

Author

defo89 commented Jul 5, 2022

Thanks for looking at this, @jepio. For now I worked around this by switching to a single vSphere disk for the affected VMs.

On a related note, is there an ETA for bringing Ignition v3 to the stable release (in other words, when will >=3185.0.0 become stable)? Edit: never mind, it's now in stable.


TimoKramer commented Mar 21, 2024

Seeing this quite often when updating and replacing Flatcar with an attached durable disk:

Ignition finished successfully
Ignition 2.15.0
Stage: kargs
no configs at "/usr/lib/ignition/base.d"
no config dir at "/usr/lib/ignition/base.platform.d/azure"
kargs: kargs passed
Ignition finished successfully
Ignition 2.15.0
Stage: disks
no configs at "/usr/lib/ignition/base.d"
no config dir at "/usr/lib/ignition/base.platform.d/azure"
disks: createPartitions: op(1): [started]  waiting for devices [/dev/disk/azure/scsi1/lun1]
disks: createPartitions: op(1): [finished] waiting for devices [/dev/disk/azure/scsi1/lun1]
disks: createPartitions: created device alias for "/dev/disk/azure/scsi1/lun1": "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1" -> "/dev/sda"
disks: createPartitions: op(2): [started]  partitioning "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): op(3): [started]  reading partition table of "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): op(3): [finished] reading partition table of "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): running sgdisk with options: [--pretend --new=0:0:+0 /run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1]
disks: createPartitions: op(2): [failed]   partitioning "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1": Failed to pretend to create partitions. Err: exit status 4. Stderr: Could not create partition 3 from 0 to 33
Error encountered; not saving changes.
disks failed
Full config:
{
  "ignition": {
    "config": {
      "replace": {
        "verification": {}
      }
    },
    "proxy": {},
    "security": {
      "tls": {}
    },
    "timeouts": {},
    "version": "3.5.0-experimental"
  },...

Flatcar version: 3815.2.0
Butane version: 0.19.0

When this happens, deleting the disk is the only thing that gets me past it. It does not happen every time, though.

This is the disk setup I am using in the butane template:

variant: flatcar
version: 1.0.0

storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun1
      partitions:
        - label: portal
  filesystems:
    - device: /dev/disk/by-partlabel/portal
      format: ext4
      wipe_filesystem: true
      label: portal

Member

jepio commented Mar 21, 2024

Isn't that a different issue, related to terraform: flatcar/flatcar-website#296 ?


TimoKramer commented Mar 21, 2024

Isn't that a different issue, related to terraform

No, this is not related. This is a problem with an already existing disk when recreating the flatcar VM.

Member

pothos commented Apr 9, 2024

So there is some race involved and it doesn't always happen? The same error message was reported in coreos/bugs#2100 (comment)

Edit: answer from there says the same as Jeremi below

Member

jepio commented Apr 9, 2024

@TimoKramer: your partition is missing an explicit number: 1, so you're falling into this documented behavior:

partitions (list of objects): the list of partitions and their configuration for this particular disk. Every partition must have a unique number, or if 0 is specified, a unique label.
number (integer): the partition number, which dictates its position in the partition table (one-indexed). If zero, use the next available partition slot.

So I understand that you would expect the match to happen on the label field, but with the number left at zero Ignition tries to create a new partition on every rerun. After the first provisioning, the disk has no free space left, so that attempt fails.
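
Following the suggestion above, a minimal sketch of the Butane template with the partition pinned to slot 1. Only the number field is new; everything else is copied from the template posted earlier in the thread. That an explicit number avoids the repeated partition creation on reprovisioning is taken from the comment above and has not been verified here.

```yaml
variant: flatcar
version: 1.0.0

storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun1
      partitions:
        - label: portal
          # Explicit slot, per the suggestion above. Without it the number
          # is zero, which asks for the next available slot on every run.
          number: 1
  filesystems:
    - device: /dev/disk/by-partlabel/portal
      format: ext4
      wipe_filesystem: true
      label: portal
```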
