Fix volume attachments while upgrade (v12 v13) #999
Conversation
@@ -4,6 +4,9 @@ const Instance = `{{define "instance"}}
{{ .Instance.Master.Instance.ResourceName }}:
  Type: "AWS::EC2::Instance"
  Description: Master instance
+  DependsOn:
The problem was that the VM did not depend on its volumes, so it was created immediately with the etcd volume as its first disk. At that point the docker volume was still being resized (which takes 3-5 minutes); it was then attached as the second disk and ended up mounted to /var/lib/etcd.
This dependency makes sure the VM waits for both volumes to be ready before it starts.
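For reference, here is a minimal sketch, in the repo's Go-constant style, of what the added dependency could look like in the rendered CloudFormation. The package name and the EtcdVolume / DockerVolume logical IDs are placeholders for illustration, not names taken from this PR.

package cloudformation

// Illustrative only: logical IDs are placeholders, not the names the
// real template renders for the etcd and docker volumes.
const masterInstanceDependsOnSketch = `MasterInstance:
  Type: "AWS::EC2::Instance"
  Description: Master instance
  DependsOn:
  - EtcdVolume
  - DockerVolume
`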
Very good catch, thanks!
Properties:
  Encrypted: true
-  Size: 100
+  Size: 50
I'm rolling the master docker disk size back to 50GB because we don't really need 100GB on the master node; 100GB is only needed for workers. This prevents an unnecessary delay (3-5 minutes caused by resizing) during upgrades.
Yes makes sense to leave the volume at 50GB for masters. Great that it avoids the delay.
LGTM. Thanks for taking care here! <3
In terms of versioning, there are other changes being worked on in v13, but I'd rather not introduce a patch unless we have to.
Also, I've tested K8s 1.10.4 and it fixes the configmap problem, so upgrading from 1.10.2 is an option. WDYT?
  VolumeType: gp2
-  AvailabilityZone: !GetAtt {{ .Instance.Master.Instance.ResourceName }}.AvailabilityZone
+  AvailabilityZone: {{ .Instance.Master.AZ }}
This is because the master resource now depends on the volumes?
Yes, otherwise it hits a circular dependency error :(
That makes sense to me. Thanks.
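Roughly, the cycle and its fix can be sketched like this (placeholder logical ID; the package name and property values are assumptions based on the surrounding diff). The volume's AvailabilityZone now comes straight from the template data instead of from the instance:

package cloudformation

// Before: the volume's AZ came from !GetAtt MasterInstance.AvailabilityZone,
// so adding MasterInstance -> DependsOn -> DockerVolume would close a cycle.
// After (sketch): the AZ is rendered directly from the template value.
const dockerVolumeSketch = `DockerVolume:
  Type: "AWS::EC2::Volume"
  Properties:
    Encrypted: true
    Size: 50
    VolumeType: gp2
    AvailabilityZone: {{ .Instance.Master.AZ }}
`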
Thanks for the review. Unfortunately this is only part of the problem. The thing is that with m5 instances we are in general affected by the "disk ordering problem": disks can randomly come up as nvme1 / nvme2, so we need to extract the disk label that we set in CloudFormation. This is already hacked around in the community, so we also need to do some kind of hack. I'll work on that tomorrow.
Yes, 1.10.4 makes total sense.
- Set proper dependencies (the VM depends on the volumes, not the opposite)
- Set the docker disk size to 50GB for the master (prevents unnecessary resizing)
Tested in ginger. Test list added in description.
I'll be merging this on Monday.
@@ -8,7 +8,7 @@ ConditionPathExists=!/var/lib/docker

[Service]
Type=oneshot
-ExecStart=/bin/bash -c "([ -b "/dev/xvdc" ] && /usr/sbin/mkfs.xfs -f /dev/xvdc -L docker) || ([ -b "/dev/nvme1n1" ] && /usr/sbin/mkfs.xfs -f /dev/nvme1n1 -L docker)"
+ExecStart=/bin/bash -c "[ -e "/dev/xvdc" ] && /usr/sbin/mkfs.xfs -f /dev/xvdc -L docker"
-e checks that the file exists (any type), because /dev/xvdc is a symlink in the NVMe case and a block device in the m3 case.
@@ -0,0 +1,26 @@
+package cloudconfig
+
+const NVMEUdevRule = `KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/opt/ebs-nvme-mapping /dev/%k", SYMLINK+="%c"
udev rule that calls a script on any nvme event; the script creates/deletes symlinks with the device name that was specified in EBS, e.g. /dev/xvdh.
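For context, here is a hypothetical sketch of the kind of helper the rule expects at /opt/ebs-nvme-mapping, following the approach of the linked ebs-automatic-nvme-mapping project. The byte range and output format are assumptions, not copied from this PR.

package cloudconfig

// Hypothetical sketch: AWS stores the EBS device name (e.g. xvdh) in the
// vendor-specific area of the NVMe Identify Controller data; printing it
// lets udev create /dev/xvdh -> /dev/nvme1n1 style symlinks via SYMLINK+="%c".
const nvmeMappingScriptSketch = `#!/bin/bash
if [[ -x /usr/sbin/nvme ]] && [[ -b "${1}" ]]; then
  vol="$(/usr/sbin/nvme id-ctrl --raw-binary "${1}" | cut -c3073-3104 | tr -d ' ')"
  vol="${vol#/dev/}"
  # Simplified: print only the xvd-style name for the symlink.
  [[ -n "${vol}" ]] && echo "${vol/sd/xvd}"
fi
`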
+fi
+`
+
+const NVMEUdevTriggerUnit = `[Unit]
This unit is only necessary on the first boot, because the udev rule was just added and we need to retrigger udev.
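A sketch of what such a one-shot trigger unit could look like; the unit contents here are assumptions, not the PR's actual definition.

package cloudconfig

// Assumed sketch of a first-boot trigger unit: reload udev rules and
// retrigger block device events so the new NVMe rule creates the symlinks.
const nvmeUdevTriggerUnitSketch = `[Unit]
Description=Retrigger udev so the NVMe EBS mapping rule creates device symlinks

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/udevadm control --reload-rules
ExecStart=/usr/bin/udevadm trigger --subsystem-match=block --action=add

[Install]
WantedBy=multi-user.target
`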
@r7vme @rossf7 I'm excited and pleased that you've discovered my solution[1] for dealing with NVMe devices in AWS by way of the CoreOS and kube-aws projects, but as a courtesy, can you include a copy of the license and copyright notice[2] in your derivative work?
1: https://github.com/oogali/ebs-automatic-nvme-mapping
Fixes: https://github.com/giantswarm/giantswarm/issues/3307
Updated the v12 and v13 versions, as there are no clusters on v12. Or do I need to create v12patch1?
QA

v12
- successfully created with m5.large and master node survives reboot
- v11 to v12 with m5.large
- v11 to v12 with m3.large