Skip to content
This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

Fix volume attachments while upgrade (v12 v13) #999

Merged
merged 3 commits into from
Jun 11, 2018
Merged

Fix volume attachments while upgrade (v12 v13) #999

merged 3 commits into from
Jun 11, 2018

Conversation

r7vme
Copy link
Contributor

@r7vme r7vme commented Jun 7, 2018

Fixes: https://github.com/giantswarm/giantswarm/issues/3307

  • Set proper dependencies (vm depends on volume and not opposite)
  • Set docker disk size to 50GB for master (prevent unnecessary resizing)

Updated v12 and v13 versions as no clusters on v12. Or do i need to create v12patch1?

QA

  • cluster from v12 successfully created with m5.large and master node survives reboot
  • cluster successfully upgraded from v11 to v12 with m5.large
  • cluster successfully upgraded from v11 to v12 with m3.large

@r7vme r7vme deployed to ginger June 7, 2018 12:02 Active
@r7vme r7vme deployed to ginger June 7, 2018 13:10 Active
@@ -4,6 +4,9 @@ const Instance = `{{define "instance"}}
{{ .Instance.Master.Instance.ResourceName }}:
Type: "AWS::EC2::Instance"
Description: Master instance
DependsOn:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem was that VM did not depend on volumes to it was immediately created with etcd as first disk and in that time docker disk was resized (takes 3-5 minutes) and then docker disk attached as second disk and mounted to /var/lib/etcd.

This dependency makes sure that VM will wait to both volumes to be ready before get started.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch, thanks!

Properties:
Encrypted: true
Size: 100
Size: 50
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm rolling master docker disk size to 50GB, because we don't really need 100GB on master node. 100GB only needed for workers.

This prevents unnecessary delay (3-5 minutes caused by resizing) while upgrade.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes makes sense to leave the volume at 50GB for masters. Great that it avoids the delay.

@r7vme r7vme changed the title Fix depends on Fix volume attachments while upgrade (v12 v13) Jun 7, 2018
@r7vme r7vme requested review from a team June 7, 2018 14:10
@r7vme r7vme deployed to ginger June 7, 2018 14:17 Active
Copy link
Contributor

@rossf7 rossf7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for taking care here! <3

In terms of versioning there are other changes being worked on in v13 but I'd rather not introduce a patch unless we have to.

Also I've tested K8s 1.10.4 and it fixes the configmap problem. So upgrading from 1.10.2 is an option. WDYT?

@@ -4,6 +4,9 @@ const Instance = `{{define "instance"}}
{{ .Instance.Master.Instance.ResourceName }}:
Type: "AWS::EC2::Instance"
Description: Master instance
DependsOn:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch, thanks!

Properties:
Encrypted: true
Size: 100
Size: 50
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes makes sense to leave the volume at 50GB for masters. Great that it avoids the delay.

VolumeType: gp2
AvailabilityZone: !GetAtt {{ .Instance.Master.Instance.ResourceName }}.AvailabilityZone
AvailabilityZone: {{ .Instance.Master.AZ }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the master resource now depends on the volumes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, otherwise it has curcular error :(

Copy link
Contributor

@xh3b4sd xh3b4sd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. Thanks.

@r7vme
Copy link
Contributor Author

r7vme commented Jun 7, 2018

thanks for review unfortunately this is only a part of the problem. The thing is that with m5 we in general affected by "disk ordering problem". Disks randomly can be nvme1 / nvme2 , so we need to extract disk label (that we set in CF).

This is already hacked in community
kubernetes-retired/kube-aws#1313
coreos/bugs#2399 (comment)

We also need to do some kind of hack. I'll work on that tomorrow.

@r7vme
Copy link
Contributor Author

r7vme commented Jun 7, 2018

Also I've tested K8s 1.10.4 and it fixes the configmap problem. So upgrading from 1.10.2 is an option. WDYT?

Yes 1.10.4 makes total sense.

Roman Sokolkov added 3 commits June 8, 2018 14:36
- Set proper dependencies (vm depends on volume and not opposite)
- Set docker disk size to 50GB for master (prevent unnecessary rezizing)
@r7vme r7vme deployed to ginger June 8, 2018 13:09 Active
@r7vme
Copy link
Contributor Author

r7vme commented Jun 8, 2018

  • Added 1.10.4
  • Added NVMe udev rules (allows to extract disk labels and use /dev/xvdh and /dev/xvdc)

Tested in ginger. Test list added in description.

@r7vme
Copy link
Contributor Author

r7vme commented Jun 8, 2018

I'll be merging this on Monday.

@@ -8,7 +8,7 @@ ConditionPathExists=!/var/lib/docker

[Service]
Type=oneshot
ExecStart=/bin/bash -c "([ -b "/dev/xvdc" ] && /usr/sbin/mkfs.xfs -f /dev/xvdc -L docker) || ([ -b "/dev/nvme1n1" ] && /usr/sbin/mkfs.xfs -f /dev/nvme1n1 -L docker)"
ExecStart=/bin/bash -c "[ -e "/dev/xvdc" ] && /usr/sbin/mkfs.xfs -f /dev/xvdc -L docker"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-e file exists (any type), because /dev/xvdc is symlink in case NVMe and block device in case m3

@@ -0,0 +1,26 @@
package cloudconfig

const NVMEUdevRule = `KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/opt/ebs-nvme-mapping /dev/%k", SYMLINK+="%c"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

udev rule that calls script on any events with nvme that will create/delete sylinks with name that was specified in EBS e.g. /dev/xvdh

fi
`

const NVMEUdevTriggerUnit = `[Unit]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unit only necessary on the first boot, because udev rule was just added and we need to retrigger udev.

@r7vme r7vme merged commit f103a27 into master Jun 11, 2018
@r7vme r7vme deleted the fixattachemtn branch June 11, 2018 07:26
@oogali
Copy link

oogali commented Jun 11, 2018

@r7vme @rossf7 I'm excited and pleased that you've discovered my solution[1] for dealing with NVMe devices in AWS by way of the CoreOS and kube-aws projects, but as a courtesy, can you include a copy of the license and copyright notice[2] in your derivative work?

1: https://github.com/oogali/ebs-automatic-nvme-mapping
2: https://github.com/oogali/ebs-automatic-nvme-mapping/blob/master/LICENSE

@r7vme
Copy link
Contributor Author

r7vme commented Jun 12, 2018

Hi @oogali , adding licence here

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants