Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): add ability to resolve mount targets using EFS API #392

Merged
merged 5 commits into from
Apr 20, 2021

Conversation

jusiskin
Copy link
Contributor

@jusiskin jusiskin commented Apr 16, 2021

Resolves #391

Taken from that issue:

The RFDK code to mount EFS file-systems on an instance assumes that the instance belongs to a VPC with Amazon-supplied DNS (see VPC DNS documentation). When a VPC is configured to disable the Amazon-supplied DNS resolver, then the DNS names used by both the EFS mount helper and the NFS mounting code cannot be resolved and the mount will fail with an error showing up in the cloud-init log file, such as:

[   32.413572] cloud-init[3739]: Failed to resolve "fs-a1b2c3d4.efs.us-west-2.amazonaws.com" - check that your file system ID is correct.

Solution

The DNS names in the error message above correspond to EFS mount targets. These provide the endpoints that serve the networked file-system traffic. EFS automatically adds DNS entries into your VPC's Amazon-supplied Route 53 resolver, but when this feature is disabled, those names cannot be resolved using DNS.

Fortunately, the IP address of each mount target can be obtained by calling the EFS DescribeMountTargets API.

Added an optional boolean flag when EFS mounting script that will perform this lookup and bake the IP address into the host machine's /etc/hosts file to bypass DNS lookups. Using this looks like:

TypeScript:

new MountableEfs(scope, {
  // ...
  resolveMountTargetDnsWithApi: true,
});

Python:

MountableEfs(scope,
  # ...
  resolve_mount_target_dns_with_api=True,
)

Testing

Modified the All-In-AWS-Infrastructure-Basic example, enabling the above flag. Connected to the mounting instances using SSM Session Manager and confirmed:

  • The /etc/hosts file contains an entry for the mount target:

    10.0.71.79 fs-a1b2c1d2.efs.us-west-2.amazonaws.com # Added by RFDK
    
  • The EFS file-system driver establishes a connection to the specified IP address:

    $ sudo netstat -alnp | grep 10.0.71.79
    tcp        0      0 10.0.0.13:55102         10.0.71.79:2049         ESTABLISHED 2306/stunnel
  • Deployment succeeds

  • Am able to connect to the farm, submit a job, view successful render logs (requires proper EFS mounting on the RenderQueue)


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

if [[ $RESOLVE_MOUNTPOINT_IP_VIA_API == "true" ]]
then
# jq is used to query the JSON API response
sudo "${PACKAGE_MANAGER}" install -y jq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check whether jq is present, and only hit the package manager if it is not. Use-case is an isolated machine with no network connectivity to the package repository.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I'll add that check first


if [[ $MNT_TARGET_RESOURCE_ID == fs-* ]]; then
# Mounting without an access point
MOUNT_POINT_IP=$(aws efs describe-mount-targets \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing failure handling -- we're not guaranteed that there is a MountTarget in the same AZ as the instance. Route53 would resolve the DNS with a randomly chosen IP in that case.

Same in the fsap-* case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the concern you mentioned about cross-AZ data charges, do you think we should replicate this behavior? Or is it better to fail? I'm learning towards failing with a user-friendly warning over using a random mount target. That way anyone using this could avoid the cross-AZ costs and simply add the missing mount target.

The calling code has a catch-all error handler, but you're right - currently this will not fail gracefully if there's no matching mount target. I'll fix that - thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion... We want to make sure that the customer knows that there could be cross-AZ data charges, but not prevent them from deploying in a cross-AZ manner. Throwing an error for cross-AZ would make a cross-AZ mount impossible, and there may be valid use-cases for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added logic to fall-back to using any available mount target with a warning being output that should get logged to CloudWatch. Let me know what you think.

target.userData.addCommands(
'TMPDIR=$(mktemp -d)',
'pushd "$TMPDIR"',
`unzip ${mountScript}`,
`bash ./mountEfs.sh ${this.props.filesystem.fileSystemId} ${mountDir} ${mountOptionsStr}`,
`bash ./mountEfs.sh ${this.props.filesystem.fileSystemId} ${mountDir} ${mountOptionsStr} ${resolveMountTargetDnsWithApi}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest reversing the ordering of mount options & the new arg. Reason being that mount options could be empty, but the new arg will never be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


set -xeu

if test $# -lt 2
then
echo "Usage: $0 <file system ID> <mount path> [<mount options>]"
echo "Usage: $0 FILE_SYSTEM_ID MOUNT_PATH [MOUNT_OPTIONS] [RESOLVE_MOUNT_POINT_USING_API]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me to make the new arg required. Two optional positional args can lead to confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

- Make RESOLVE_MOUNT_POINT_USING_API argument required and move before optional mount opts
- Add graceful error-handling when no matching mount target found for instance's availability zone
- userdata runs as root and some AMIs might not have sudo installed
@horsmand horsmand self-requested a review April 19, 2021 19:17
@jusiskin jusiskin added the contribution/core This is a PR that came from AWS. label Apr 19, 2021
Copy link
Contributor

@ddneilson ddneilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit, and then I think we're good.

packages/aws-rfdk/lib/core/scripts/bash/mountEfs.sh Outdated Show resolved Hide resolved
horsmand
horsmand previously approved these changes Apr 20, 2021
Co-authored-by: Daniel Neilson <53624638+ddneilson@users.noreply.github.com>
@horsmand horsmand merged commit 726fa84 into aws:mainline Apr 20, 2021
@jusiskin jusiskin deleted the resolve_efs_mount_targets_via_api branch April 26, 2021 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution/core This is a PR that came from AWS.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to mount EFS in VPCs without Amazon-supplied DNS resolver
3 participants