Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Create node group user data from given template #1645

Conversation

scalen
Copy link

@scalen scalen commented Oct 14, 2021

PR o'clock

Closes #1644

Description

Prior to this commit, user data for node groups was generated from a
prescribed template, and user data in other formats (such as the TOML
used to configure Bottlerocket instances, see link) was not supported.

This change allows a node_group to specify its own user data template
file, the template's extra arguments and the user data's mime type;
this in turn supports alternative forms of user data as required by any
given AMI.

https://github.com/bottlerocket-os/bottlerocket#using-user-data

Checklist

@scalen scalen changed the title Create node group user data from given template feat: Create node group user data from given template Oct 14, 2021
@scalen scalen force-pushed the node_modules/support-alternative-userdata-formats branch 3 times, most recently from 76f78da to 259c997 Compare October 20, 2021 08:42
@daroga0002
Copy link
Contributor

@scalen could you paster here some examples of module configuration to test?

@scalen
Copy link
Author

scalen commented Oct 20, 2021

could you paster here some examples of module configuration to test?

@daroga0002 I have copy-pasted the old bottlerocket example, and changed any parameters/templates necessary to match the node_modules configuration: 35948f9

@scalen scalen force-pushed the node_modules/support-alternative-userdata-formats branch from 00bcdea to 35948f9 Compare October 20, 2021 11:01
@stevehipwell
Copy link
Contributor

@scalen I was planning on doing this work so I'm very happy to see you've already picked it up. Could you rebase this onto the latest master branch and I'll give it a review?

@scalen scalen force-pushed the node_modules/support-alternative-userdata-formats branch from 35948f9 to e8d5df5 Compare November 3, 2021 11:58
@scalen
Copy link
Author

scalen commented Nov 3, 2021

@stevehipwell I have rebased, so this should be ready for review 🙂

@stevehipwell
Copy link
Contributor

@scalen I'm not sure there is a precedent for complex objects in the node group config so I'd suggest removing the nesting and using user_data_ as a prefix (e.g. user_data_mime_type), there are only 3 variables. This will also make it possible to only override one of the values without having to specifically set the others to their defaults, which wouldn't be as easy to use and could cause outdated defaults to be used if they were ever changed. Other than that this looks really good. How much testing have you done?

@scalen
Copy link
Author

scalen commented Nov 3, 2021

@scalen I'm not sure there is a precedent for complex objects in the node group config so I'd suggest removing the nesting and using user_data_ as a prefix (e.g. user_data_mime_type), there are only 3 variables. This will also make it possible to only override one of the values without having to specifically set the others to their defaults, which wouldn't be as easy to use and could cause outdated defaults to be used if they were ever changed.

Fair enough!

How much testing have you done?

I have two BottleRocket node groups deployed using this with different SGs and related K8s taints and labels (with workloads relying on the labels, and others not tolerating the taints, which we have verified); AWS tags for integrating with the autoscaling controller; and common configuration of things like the disk type shared by all node groups with node_groups_defaults. I haven't tested with any variations of configuration, but felt that our motivating case tested most of the pertinent features being touched in this change.

Other than that this looks really good.

🙇🏼

@stevehipwell
Copy link
Contributor

@scalen if you make the changes to the variables I'll review again and then we can see if @daroga0002 can approve running CI. Once this is merged I might take a look at adding the changes in my previous PRs and the changes here for launch template workers.

@scalen
Copy link
Author

scalen commented Nov 3, 2021

@stevehipwell Requested changes made in 7135ba9

Copy link
Contributor

@stevehipwell stevehipwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, @daroga0002 what do you think?

@jaimehrubiks
Copy link
Contributor

Does this modify existing node_groups with user data like these:

  pre_userdata = join("\n\n#----------------\n\n",[
    templatefile("${path.module}/templates/_volume_fg6001.sh", {}),
    templatefile("${path.module}/templates/_volume_fg6002.sh", {}),
  ])

If you don't change anything?

@scalen
Copy link
Author

scalen commented Nov 3, 2021

Does this modify existing node_groups with user data like these:

  pre_userdata = join("\n\n#----------------\n\n",[
    templatefile("${path.module}/templates/_volume_fg6001.sh", {}),
    templatefile("${path.module}/templates/_volume_fg6002.sh", {}),
  ])

If you don't change anything?

No, the default behaviour remains the same.

@scalen scalen force-pushed the node_modules/support-alternative-userdata-formats branch 2 times, most recently from 48bb991 to f99edd6 Compare November 3, 2021 16:37
@bryantbiggs
Copy link
Member

ah, good ol' userdata again. I think we just need to provide a bring-your-own user_data functionality. something like:

	...
	user_data = var.create_user_data ?  base64encode(local.launch_template_userdata_rendered[count.index]) : var.user_data_base64
	...

thoughts @daroga0002 ?

@stevehipwell
Copy link
Contributor

@bryantbiggs isn't this PR providing a BYO fallback implementation?

@bryantbiggs
Copy link
Member

yes, but in a way that is getting very complex and convoluted

@stevehipwell
Copy link
Contributor

@bryantbiggs it's a complex problem that's being solved. The templated solutions currently provided by the input variables have a pretty good UX and for everyone else this PR will allow them to own the full complexity themselves.

@scalen
Copy link
Author

scalen commented Nov 6, 2021

yes, but in a way that is getting very complex and convoluted

It is a little convoluted, but by providing the template and additional args separately, it means we can utilise many variables created within the module. If we required the user data to be passed in whole, we may struggle to get some of the same synchronisation that is possible/trivial with the current approach.

@scalen scalen force-pushed the node_modules/support-alternative-userdata-formats branch 2 times, most recently from 65bf629 to f510614 Compare November 17, 2021 09:20
@stevehipwell
Copy link
Contributor

@scalen do you not need to provide a variable to customise the block_device_mappings.device_name to support Bottlerocket (and other OSs)?

See #1695 & #1685.

@scalen
Copy link
Author

scalen commented Nov 30, 2021

@scalen do you not need to provide a variable to customise the block_device_mappings.device_name to support Bottlerocket (and other OSs)?

See #1695 & #1685.

We haven't seen any issues running of this PR branch, but then we haven't specified any significant configuration. I assume the suggestion is to make a more flexible version of this change, to let the user define the volume to configure:

device_name = length(split("BOTTLEROCKET", each.value["ami_type"])) > 1 ? "/dev/xvdb" : "/dev/xvda"

This still means only one volume will be configured, but it seems that will be fine for the Bottlerocket case 🤷🏼

@stevehipwell
Copy link
Contributor

@scalen that is correct; I assume that if you tried to configure the volume that it'd be applied to the wrong Bottlerocket volume (as is discussed in the other PR). A generic way to pass in the block device name would be a good solution, as would adding it in to the Bottlerocket example.

@hi-artem
Copy link

@scalen do you not need to provide a variable to customise the block_device_mappings.device_name to support Bottlerocket (and other OSs)?
See #1695 & #1685.

We haven't seen any issues running of this PR branch, but then we haven't specified any significant configuration. I assume the suggestion is to make a more flexible version of this change, to let the user define the volume to configure:

device_name = length(split("BOTTLEROCKET", each.value["ami_type"])) > 1 ? "/dev/xvdb" : "/dev/xvda"

This still means only one volume will be configured, but it seems that will be fine for the Bottlerocket case 🤷🏼

I made the original volume comment on #1695, saying that we shouldn't really care about the xvda volume. For context, xvda contains operating system itself: the active & passive partitions, the bootloader, the dm-verity data and etc. However, I can see the use case where you want to customize both xvda and xvdb. For example, in case you are required to encrypt all your ebs devices.

scalen and others added 3 commits November 30, 2021 15:18
Prior to this commit, user data for node groups was generated from a
prescribed template, and user data in other formats (such as the TOML
used to configure Bottlerocket instances, see link) was not supported.

This change allows a node_group to specify its own user data template
file, the template's extra arguments and the user data's mime type;
this in turn supports alternative forms of user data as required by any
given AMI.

https://github.com/bottlerocket-os/bottlerocket#using-user-data
Prior to this change, node_groups userdata had to be specified as a
cloudinit config file.  However, Bottlerocket OS doesn't support
cloudinit and relies on bare TOML.

This change makes the use of cloudinit config rely on the specification
of a MIME type for the templated userdata file.  Otherwise, the userdata
provided falls back to the raw templated file content.

*N.B.*: This is backwards compatible, as the default `user_data`
specification includes a MIME type (`text/x-shellscript`).
Prior to this change, the examples only showed how to use bottlerocket
with the older worker_nodes configuration.

This change demonstrates the use of node_groups to create a bottlerocket
based cluster.
@scalen scalen force-pushed the node_modules/support-alternative-userdata-formats branch from 443bbe9 to 7281699 Compare November 30, 2021 15:18
@scalen
Copy link
Author

scalen commented Nov 30, 2021

I made the original volume comment on #1695, saying that we shouldn't really care about the xvda volume. For context, xvda contains operating system itself: the active & passive partitions, the bootloader, the dm-verity data and etc. However, I can see the use case where you want to customize both xvda and xvdb. For example, in case you are required to encrypt all your ebs devices.

I thought of that too, but that is really starting to stray far from the point of the PR, which was to add support for alternative user data formats back to node_groups. Making an extra value configurable in order to slightly extend the existing functionality to behave as one might expect for the specific Bottlerocket case is arguably related; I would also argue, however, that extending the module to support the configuration of multiple distinct volumes is a new feature, which should get its own PR.

@stevehipwell
Copy link
Contributor

@hi-artem @scalen the Bottlerocket OS volume is readonly so I think that if this PR supports changing the device name we're good here. Adding support for additional devices should be a new PR, but I think we need to make sure that the right Bottlerocket (or other OS) device is being configured in this PR.

@scalen
Copy link
Author

scalen commented Dec 1, 2021

I think we need to make sure that the right Bottlerocket (or other OS) device is being configured in this PR.

Done, @stevehipwell .

@FrediWeber
Copy link

@hi-artem @scalen the Bottlerocket OS volume is readonly so I think that if this PR supports changing the device name we're good here. Adding support for additional devices should be a new PR, but I think we need to make sure that the right Bottlerocket (or other OS) device is being configured in this PR.

Is the Bottlerocket volume RO from an EC2 standpoint? I thought the volume just gets mounted as RO.

@stevehipwell
Copy link
Contributor

@FrediWeber that's worth exploring. I guess the block_device variable could be made block_devices and take a list and then the mapping be dynamically looped which would support all potential cases? @scalen thoughts?

@FrediWeber
Copy link

We just configured a launch template for a classic ASG and needed to configure both block devices to be encrypted and use a custom KMS key to be compliant with our regulations. I think it would be beneficial if we could encrypt all block devices on the Bottlerocket instances.

@scalen
Copy link
Author

scalen commented Dec 1, 2021

We just configured a launch template for a classic ASG and needed to configure both block devices to be encrypted and use a custom KMS key to be compliant with our regulations. I think it would be beneficial if we could encrypt all block devices on the Bottlerocket instances.

@FrediWeber that's worth exploring. I guess the block_device variable could be made block_devices and take a list and then the mapping be dynamically looped which would support all potential cases? @scalen thoughts?

I think this does sound like a very valuable feature, and is (like the feature being implemented in this PR) replicating a greater configurability from the workers_group elements of the parent module.

However, as far as I can see, the ability to specify multiple device mappings is not really related to the ability to specify non-cloudinit user data... Other than both are features that are related to the deploying of Bottlerocket nodes.

I think it would be more valuable to get this PR approved and merged, thereby unblocking the use of Bottlerocket nodes, then follow it with another PR introducing support for multiple device mappings, to increase security/support regulatory compliance where applicable. That may just be my preference for focused, modular PRs, though 🤷🏼.

@idan-gur
Copy link

idan-gur commented Dec 2, 2021

What is the blockage on this PR?

@stevehipwell
Copy link
Contributor

@daroga0002 @antonbabenko could you approve the workflow even if you don't have time to review yet?

@daroga0002
Copy link
Contributor

@daroga0002 @antonbabenko could you approve the workflow even if you don't have time to review yet?

I dont have permissions and @antonbabenko is on re:Invent so probably he will be back next week.

I am a little stuck in last 2 weeks by some work, but I will try soon perform a review pending PRs.

Prior to this change, the user data templating variables were nested
under a top level `user_data` module variable, so that when specifying
one piece of it, the default behaviour (specifically the default MIME
type) changed.  There was no precedent in the module for this kind of
behaviour.

This change flattens the user data module variables, making them
independent of each other.  This requires a slightly odd behaviour
around the MIME type, which we have to specify as explicitly `""` (or
alternatively provide a slightly verbose additional module flag) in
order to prevent wrapping the user data in a CloudInit packet.
Prior to this change, both node_groups user_data_template_file and
user_data_template_extra_args referred to userdata_template_file from
the parent module, which was wrong.

This change corrects the reference from
node_groups.user_data_template_extra_args to
userdata_template_extra_args.
Prior to this change, the device name was hard coded to `/dev/xvda`,
while Bottlerocket has two volumes: a core OS volume that should be left
as defined by the AMI; and a user-space volume `/dev/xvdb` that _should_
be configurable (but wasn't).

This change allows the specification of the device name for the sole
configurable EBS volume in the managed node group launch template.  This
therefore supports the Bottlerocket usecase (as demonstrated in the
`managed_bottlerocket_node_group` example), as well as any other AMI
with alternative volume naming.
@scalen scalen force-pushed the node_modules/support-alternative-userdata-formats branch from 7281699 to 6da3756 Compare December 2, 2021 18:03
@scalen
Copy link
Author

scalen commented Dec 2, 2021

Tests from before rebase to fix formatting: https://github.com/terraform-aws-modules/terraform-aws-eks/runs/4397905746

@lucasreed
Copy link

Checking in to see if this is in the pipeline for reviewing. Would love to see this released!

@antonbabenko
Copy link
Member

This issue has been resolved in version 18.0.0 🎉

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow alternative user_data format in node_groups