feat(ec2): multipart user data #11843

rsmogura · 2020-12-02T20:45:24Z

Add support for multipart (MIME) user data for Linux environments. This
is a more versatile type of user data, and some AWS services
(e.g. AWS Batch) require it in order to customize the launch
behaviour.

The change was tested in an integ environment to check that all
user data parts are executed correctly and with the proper charset
encoding.

fixes #8315

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

gitpod-io · 2020-12-02T20:45:28Z

rsmogura · 2020-12-03T15:01:14Z

I think this has to stay a work in progress for a moment (a soft review is welcome). I analysed the integration test results, and the approach of auto-generating the boundary is not perfect: when a token is used inside a part body, the hash used as the boundary can differ, which in turn changes the user data.

rix0rrr · 2020-12-14T10:56:57Z

I don't really understand the design.

Why isn't it just a self-contained MimeUserData extends UserData class? Why do we have to turn the original UserData class into an IMultipartUserDataPartProducer?

And generally the boundary is just a large pseudorandom number which we'll assume won't occur in the various parts. Getting a hash from the part contents seems unnecessarily complicated?

Let's start by defining what parts there are. I assume you'll have a "commands" part (the "old" UserData, so to speak) and then a bunch of "attachments", right? Which are basically just files to stick somewhere on the filesystem?
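As a rough illustration of the "large pseudorandom number" idea, here is a minimal sketch. The helper names `randomBoundary` and `boundaryCollides` are hypothetical and do not come from this PR; the collision check is a conservative substring test, not a full MIME parser.

```typescript
// Hypothetical helpers for illustration; none of these names come from the PR.

/** Builds a pseudorandom boundary from lowercase letters and digits. */
function randomBoundary(length: number = 32): string {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let boundary = '';
  for (let i = 0; i < length; i++) {
    boundary += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return boundary;
}

/** Returns true if the delimiter "--<boundary>" appears anywhere in any part body. */
function boundaryCollides(boundary: string, partBodies: string[]): boolean {
  const delimiter = '--' + boundary;
  return partBodies.some((body) => body.indexOf(delimiter) !== -1);
}
```

One caveat with this approach: a value drawn at synthesis time differs between runs, so the rendered user data would change on every synth. A fixed or caller-supplied boundary avoids that.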

rsmogura · 2020-12-14T18:38:22Z

Hi Ricco,

I got your point. I'll try to clarify some of the "whys", so that it is easier to understand and we can go with this or something better.

Let me start from the last part.

Multipart user data is more like an archive than a message with attachments.

Each script is a separate part (attachment), executed during a different phase. Many scripts can be executed for the same phase.

(There are more kinds of attachments than script types.)

Thus I thought that Multipart has to inherit from UserData, but it doesn't allow adding commands (to which part or type? It's just an archive).

In order to add parts and use the existing classes (LinuxUserData), there's IMultipartUserDataPartProducer. A part requires additional attributes to be rendered (like type), besides the body. And there are at least two hooks where scripts can be added.

This allows reusing the current command-like user data for Multipart user data.

Let me give a use case:
In Multipart user data I want to have two parts with the following content types:

cloud-boothook - to reconfigure docker and increase the size of the docker volume
x-shellscript - to e.g. register in ECS or Systems Manager

So in this case Multipart will be an archive of shell scripts, executed by cloud-init.

(Worth mentioning, but not implementing now: cloud-init supports more types, like part handlers, include URLs, and more. In future every such type could get its own class implementing IMultipartUserDataPartProducer.)

So it's very nice to have a good design review.
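To make the two-part use case concrete, here is a minimal sketch of how such an archive could be rendered. The `Part` interface and `renderMultipart` function are hypothetical names for illustration, the script bodies are placeholders, and the boundary is an arbitrary example value; this is not the rendering code of this PR.

```typescript
// Hypothetical types and names, for illustration only.
interface Part {
  contentType: string; // e.g. 'text/cloud-boothook' or 'text/x-shellscript'
  body: string;        // the script text of this part
}

/** Renders the parts as one multipart/mixed document, using "\n" line endings. */
function renderMultipart(parts: Part[], boundary: string): string {
  const lines: string[] = [
    'Content-Type: multipart/mixed; boundary="' + boundary + '"',
    'MIME-Version: 1.0',
    '',
  ];
  for (const part of parts) {
    lines.push('--' + boundary);                    // delimiter opening the part
    lines.push('Content-Type: ' + part.contentType);
    lines.push('');                                 // blank line ends the part headers
    lines.push(part.body);
  }
  lines.push('--' + boundary + '--');               // closing delimiter
  lines.push('');
  return lines.join('\n');
}

// Placeholder bodies: one part per phase, as in the use case above.
const exampleUserData = renderMultipart(
  [
    { contentType: 'text/cloud-boothook', body: '#!/bin/bash\n# reconfigure docker here' },
    { contentType: 'text/x-shellscript', body: '#!/bin/bash\n# register the instance here' },
  ],
  'EXAMPLE-BOUNDARY',
);
```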

rix0rrr · 2021-01-06T14:57:43Z

This allows reusing the current command-like user data for Multipart user data.

I see. I don't mind that at all, I think you are right. But then we still have a couple of choices:

UserData implements IMultipart

The different UserData implementations that can be reused implement IMultipart

There's an integration class that implements IMultipart

Out of these, I think the last one is most self-contained, so has my preference. There might be some syntactic overhead there, but we can hide that with convenience methods:

multipart.addUserDataPart(new LinuxUserData());

class MultipartUserData {
  public addPart(part: IMultipart) { ... }

  public addUserDataPart(userData: UserData) {
    this.addPart(new UserDataMultipart(userData));
  }
}

That way we don't need to pollute the existing UserData classes with multipart-specifics.
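A self-contained version of this adapter idea might look like the sketch below. The class names follow the snippet above, but the members of `IMultipart` and the `UserDataLike` stand-in are assumptions made so the example compiles on its own; they are not the API of this PR.

```typescript
// Stand-in for the existing UserData class; only render() is assumed here.
interface UserDataLike {
  render(): string;
}

// Assumed shape of a part: a content type plus a rendered body.
interface IMultipart {
  readonly contentType: string;
  renderBody(): string;
}

/** Adapter: exposes an existing user data object as a multipart part. */
class UserDataMultipart implements IMultipart {
  public readonly contentType = 'text/x-shellscript; charset="utf-8"';

  constructor(private readonly userData: UserDataLike) {}

  public renderBody(): string {
    return this.userData.render();
  }
}

class MultipartUserData {
  private readonly parts: IMultipart[] = [];

  public addPart(part: IMultipart): this {
    this.parts.push(part);
    return this;
  }

  /** Convenience method that hides the adapter from callers. */
  public addUserDataPart(userData: UserDataLike): this {
    return this.addPart(new UserDataMultipart(userData));
  }

  /** Returns the rendered body of every part, in insertion order. */
  public renderBodies(): string[] {
    return this.parts.map((part) => part.renderBody());
  }
}
```

With this shape the existing user data classes stay untouched: only the adapter knows about multipart.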

rsmogura · 2021-01-07T07:58:07Z

Agreed. The last idea is really good: instead of bloating the code with interfaces, just use an adapter.

Later we could think about how to solve the separator issue. The fallback would be to specify the separator as an option to Multipart.

rsmogura · 2021-01-21T17:03:42Z

Hi @rix0rrr can you check now?

rix0rrr

I like where this is going!

I have some requests around naming and simplifying the implementation a little, but otherwise great job!

Also looking for a real life example of using more than 1 part :)

packages/@aws-cdk/aws-ec2/lib/user-data.ts

rsmogura · 2021-02-17T08:04:32Z

Thank you. And sorry for the late answer, I missed the notification.

I'll take a look at it.

I think a real-life example could be setting the size of the Docker container disk in a Batch environment:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-failure-disk-space/

yashda · 2021-02-25T16:25:14Z

We have a similar requirement which is another real life example of using more than 1 part:

Configure the output/log file for cloud-init using cloud-config directives on CentOS images.
Execute user-data script(bash) on startup of the Instance.

These changes would help us a lot in simplifying the user-data handling.

rsmogura · 2021-03-01T22:42:29Z

I want to test it with Windows machines, too. I'm a bit concerned about line ending characters.

Add support for multipart (MIME) user data for Linux environments. This is a more versatile type of user data, and some AWS services (e.g. AWS Batch) require it in order to customize the launch behaviour. The change was tested in an integ environment to check that all user data parts are executed correctly and with the proper charset encoding. fixes aws#8315

* Remove `IMultipartUserDataPartProducer`
* Add `MultipartUserDataPart` & `IMultipart`
* Concrete types to represent raw part and UserData wrapper can be created with `MultipartUserDataPart.fromUserData` & `MultipartUserDataPart.fromRawBody`
* Removed auto-generation of separator (as with tokens hash codes can differ when tokens are not resolved)

- remove `MultipartContentType`
- remove `MultipartUserDataPartWrapperOptions`
- remove `IMultipart`
- rename `MultipartUserDataPart` -> `MultipartBody`
- other removals
- restructure other classes
- moved part rendering to part class
- set default separator to hard-coded string
- added validation of boundary
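For reference, boundary validation could be sketched as below. This follows the boundary grammar of RFC 2046 (1 to 70 characters from a fixed set, not ending in a space); the function name `isValidBoundary` is hypothetical, and the rule actually applied in this PR may be stricter or different.

```typescript
// RFC 2046 grammar: boundary := 0*69<bchars> bcharsnospace
// bcharsnospace are digits, letters and the characters ' ( ) + _ , - . / : = ?
// bchars additionally allows the space character.
const BOUNDARY_PATTERN = /^[0-9A-Za-z'()+_,\-.\/:=? ]{0,69}[0-9A-Za-z'()+_,\-.\/:=?]$/;

/** Hypothetical helper: checks a boundary against the RFC 2046 grammar. */
function isValidBoundary(boundary: string): boolean {
  return BOUNDARY_PATTERN.test(boundary);
}
```

A check like this rejects, for example, an empty string, a boundary containing a double quote, or one that ends with a space.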

Pull request has been modified.

rsmogura · 2021-03-04T19:12:44Z

Hi @rix0rrr

I made a bigger refactor of the code: I removed a few methods and followed the tips. I think it's cleaner now, please check.

Regarding Windows machines, I think multipart is not supported on Windows (at least I could not run any kind of multipart data on Windows, and the documentation does not mention that multipart is supported for Windows EC2 either).

One minor thing concerns me. The MIME RFC, being telnet-based, suggests CRLF as the line ending, but cloud-init is fine with \n (I ran a number of tests here, but I think it's worth calling out).
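If the line ending ever needs to be switched, the difference is small enough to sketch. The helper below is hypothetical and not part of this PR; it only shows how rendered text could be converted between the two conventions.

```typescript
type LineEnding = '\n' | '\r\n';

/** Hypothetical helper: rewrites every line ending in the text to the given one. */
function withLineEndings(text: string, eol: LineEnding): string {
  // The alternation tries "\r\n" first, so a CRLF pair is replaced once, not twice.
  return text.replace(/\r\n|\r|\n/g, eol);
}
```

Converting script bodies to CRLF is risky on Linux, since a shebang line that ends in a carriage return typically fails to resolve its interpreter, so staying with \n for the parts seems the safer default.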

rix0rrr

Thanks! This is looking great. Just some small tweaks.

rix0rrr · 2021-03-08T12:57:45Z

packages/@aws-cdk/aws-ec2/lib/user-data.ts

+  protected static readonly DEFAULT_CONTENT_TYPE = 'text/x-shellscript; charset="utf-8"';
+
+  /** The body of this MIME part. */
+  public abstract get body(): string | undefined;


I think I'd prefer you don't declare these abstract members, and just implement renderBodyPart() twice, once for each of the two concrete classes.

It may feel like unnecessary duplication, but the actual amount of duplication won't be that bad and we'll have a good reduction in case analysis too (fewer ifs).
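A sketch of that shape, with `renderBodyPart()` implemented once per concrete class and no abstract getters. The names `MultipartBody`, `fromUserData`, `fromRawBody` and `renderBodyPart` appear in this thread; the subclass names, the `UserDataLike` stand-in and the `string[]` return type are assumptions for illustration.

```typescript
// Stand-in for the existing UserData class; only render() is assumed here.
interface UserDataLike {
  render(): string;
}

abstract class MultipartBody {
  public static fromRawBody(contentType: string, body: string): MultipartBody {
    return new RawBody(contentType, body);
  }

  public static fromUserData(
    userData: UserDataLike,
    contentType: string = 'text/x-shellscript; charset="utf-8"',
  ): MultipartBody {
    return new UserDataBody(userData, contentType);
  }

  /** Returns the header and body lines of this part. */
  public abstract renderBodyPart(): string[];
}

/** Part whose body is given directly as a string. */
class RawBody extends MultipartBody {
  constructor(private readonly contentType: string, private readonly body: string) {
    super();
  }

  public renderBodyPart(): string[] {
    return ['Content-Type: ' + this.contentType, '', this.body];
  }
}

/** Part whose body is produced by an existing user data object. */
class UserDataBody extends MultipartBody {
  constructor(private readonly userData: UserDataLike, private readonly contentType: string) {
    super();
  }

  public renderBodyPart(): string[] {
    return ['Content-Type: ' + this.contentType, '', this.userData.render()];
  }
}
```

The two `renderBodyPart()` bodies are nearly identical, which is the duplication mentioned above, but neither class needs to branch on whether a body is present.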

rsmogura · 2021-03-11

Now I get the point. I thought that the main concern was about the options interfaces, so I moved towards abstract getters and the template method pattern.

Thanks, that's a good comment.

rix0rrr · 2021-03-08T12:58:40Z

packages/@aws-cdk/aws-ec2/lib/user-data.ts

+
+    return this;
+  }
+


Why get rid of the addUserDataPart(userData, contentType) here? I rather liked it as a convenience method.

(Of course it's not strictly necessary, but it reads nicer than the current alternative)

Pull request has been modified.

mergify · 2021-03-08T13:52:51Z

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

aws-cdk-automation · 2021-03-08T16:32:55Z

AWS CodeBuild CI Report

CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
Commit ID: bc6e06c
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

mergify · 2021-03-08T16:33:21Z

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

github-actions bot assigned rix0rrr Dec 2, 2020

github-actions bot added the @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud label Dec 2, 2020

rix0rrr previously requested changes Feb 9, 2021

Radek Smogura and others added 4 commits March 4, 2021 19:47

Add readme

a013138

rsmogura force-pushed the issue-8315 branch from 68a350d to f50d10b Compare March 4, 2021 19:04

Fix wording and spelling in Readme

5338f1d

rix0rrr previously requested changes Mar 8, 2021

Small simplifications

2563648

rix0rrr changed the title ~~feat(ec2): introduce multipart user data~~ feat(ec2): multipart user data Mar 8, 2021

rix0rrr approved these changes Mar 8, 2021

Merge branch 'master' into issue-8315

bc6e06c

rsmogura mentioned this pull request Apr 20, 2021

Enable adding custom user data to ECS cluster #1711

Closed
