Skip to content

Commit

Permalink
Backport of [QT-525] and [QT-530] into release/1.13.x (#20158)
Browse files Browse the repository at this point in the history
* [QT-525] enos: use spot instances for Vault targets (#20037)

The previous strategy for provisioning infrastructure targets was to use
the cheapest instances that could reliably perform as Vault cluster
nodes. With this change we introduce a new model for target node
infrastructure. We've replaced on-demand instances for a spot
fleet. While the spot price fluctuates based on dynamic pricing, 
capacity, region, instance type, and platform, cost savings for our
most common combinations range between 20-70%.

This change only includes spot fleet targets for Vault clusters.
We'll be updating our Consul backend bidding in another PR.

* Create a new `vault_cluster` module that handles installation,
  configuration, initializing, and unsealing Vault clusters.
* Create a `target_ec2_instances` module that can provision a group of
  instances on-demand.
* Create a `target_ec2_spot_fleet` module that can bid on a fleet of
  spot instances.
* Extend every Enos scenario to utilize the spot fleet target acquisition
  strategy and the `vault_cluster` module.
* Update our Enos CI modules to handle both the `aws-nuke` permissions
  and also the privileges to provision spot fleets.
* Only use us-east-1 and us-west-2 in our scenario matrices as costs are
  lower than us-west-1.

Signed-off-by: Ryan Cragun <me@ryan.ec>

* [QT-530] enos: allow-list all public IP addresses (#20304)

The security groups that allow access to remote machines in Enos
scenarios have been configured to only allow port 22 (SSH) from the
public IP address of machine executing the Enos scenario. To achieve
this we previously utilized the `enos_environment.public_ip_address`
attribute. Sometime in mid March we started seeing sporadic SSH i/o
timeout errors when attempting to execute Enos resources against SSH
transport targets. We've only ever seen this when communicating from
Azure hosted runners to AWS hosted machines.

While testing we were able to confirm that in some cases the public IP
address resolved using DNS over UDP4 to Google and OpenDNS name servers
did not match what was resolved when using the HTTPS/TCP IP address
service hosted by AWS. The Enos data source was implemented in a way
that we'd attempt resolution of a single name server and only attempt
resolving from the next if previous name server could not get a result.
We'd then allow-list that single IP address. That's a problem if we can
resolve two different public IP addresses depending our endpoint address.

This change utlizes the new `enos_environment.public_ip_addresses`
attribute and subsequent behavior change. Now the data source will
attempt to resolve our public IP address via name servers hosted by
Google, OpenDNS, Cloudflare, and AWS. We then return a unique set of
these IP addresses and allow-list all of them in our security group. It
is our hope that this resolves these i/o timeout errors that seem like
they're caused by the security group black-holing our attempted access
because the IP we resolved does not match what we're actually exiting
with.

Signed-off-by: Ryan Cragun <me@ryan.ec>

---------

Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
  • Loading branch information
1 parent 5b6b8fa commit 05ca6d0
Show file tree
Hide file tree
Showing 25 changed files with 2,207 additions and 515 deletions.
10 changes: 5 additions & 5 deletions .github/enos-run-matrices/build-github-oss-linux-amd64-zip.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"include": [
{
"scenario": "smoke backend:raft consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 3
},
{
Expand All @@ -12,7 +12,7 @@
},
{
"scenario": "smoke backend:consul consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 1
},
{
Expand All @@ -22,7 +22,7 @@
},
{
"scenario": "smoke backend:consul consul_version:1.12.7 distro:ubuntu seal:shamir arch:amd64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -32,7 +32,7 @@
},
{
"scenario": "upgrade backend:raft consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 5
},
{
Expand All @@ -42,7 +42,7 @@
},
{
"scenario": "upgrade backend:consul consul_version:1.13.4 distro:ubuntu seal:shamir arch:amd64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand Down
10 changes: 5 additions & 5 deletions .github/enos-run-matrices/build-github-oss-linux-arm64-zip.json
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
},
{
"scenario": "smoke backend:raft consul_version:1.14.2 distro:ubuntu seal:awskms arch:arm64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -17,7 +17,7 @@
},
{
"scenario": "smoke backend:consul consul_version:1.14.2 distro:ubuntu seal:shamir arch:arm64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 4
},
{
Expand All @@ -27,7 +27,7 @@
},
{
"scenario": "upgrade backend:raft consul_version:1.14.2 distro:ubuntu seal:shamir arch:arm64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 1
},
{
Expand All @@ -37,7 +37,7 @@
},
{
"scenario": "upgrade backend:consul consul_version:1.12.7 distro:rhel seal:awskms arch:arm64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 3
},
{
Expand All @@ -47,7 +47,7 @@
},
{
"scenario": "upgrade backend:consul consul_version:1.14.2 distro:rhel seal:awskms arch:arm64 artifact_source:crt edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 5
}
]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"include": [
{
"scenario": "smoke backend:raft consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -12,7 +12,7 @@
},
{
"scenario": "smoke backend:consul consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -22,7 +22,7 @@
},
{
"scenario": "smoke backend:consul consul_version:1.12.7 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -32,7 +32,7 @@
},
{
"scenario": "upgrade backend:raft consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -42,7 +42,7 @@
},
{
"scenario": "upgrade backend:consul consul_version:1.13.4 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,17 +7,17 @@
},
{
"scenario": "smoke backend:raft consul_version:1.14.2 distro:ubuntu seal:awskms arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
"scenario": "smoke backend:consul consul_version:1.12.7 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 1
},
{
"scenario": "smoke backend:consul consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -27,7 +27,7 @@
},
{
"scenario": "upgrade backend:raft consul_version:1.14.2 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 2
},
{
Expand All @@ -42,7 +42,7 @@
},
{
"scenario": "upgrade backend:consul consul_version:1.13.4 distro:ubuntu seal:shamir arch:amd64 artifact_source:artifactory edition:oss artifact_type:bundle",
"aws_region": "us-west-1",
"aws_region": "us-east-1",
"test_group": 1
},
{
Expand Down
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,10 @@ enos/.terraform/*
enos/.terraform.lock.hcl
enos/*.tfstate
enos/*.tfstate.*
enos/**/.terraform/*
enos/**/.terraform.lock.hcl
enos/**/*.tfstate
enos/**/*.tfstate.*

.DS_Store
.idea
Expand Down Expand Up @@ -127,4 +131,4 @@ website/components/node_modules
.releaser/
*.log

tools/godoctests/.bin
tools/godoctests/.bin
71 changes: 65 additions & 6 deletions enos/ci/service-user-iam/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ resource "aws_iam_role" "role" {

data "aws_iam_policy_document" "assume_role_policy_document" {
provider = aws.us_east_1

statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
Expand All @@ -43,31 +44,75 @@ resource "aws_iam_role_policy" "role_policy" {
provider = aws.us_east_1
role = aws_iam_role.role.name
name = "${local.service_user}_policy"
policy = data.aws_iam_policy_document.iam_policy_document.json
policy = data.aws_iam_policy_document.role_policy.json
}

data "aws_iam_policy_document" "role_policy" {
source_policy_documents = [
data.aws_iam_policy_document.enos_scenario.json,
data.aws_iam_policy_document.aws_nuke.json,
]
}

data "aws_iam_policy_document" "aws_nuke" {
provider = aws.us_east_1

statement {
effect = "Allow"
actions = [
"ec2:DescribeInternetGateways",
"ec2:DescribeNatGateways",
"ec2:DescribeRegions",
"ec2:DescribeVpnGateways",
"iam:DeleteAccessKey",
"iam:DeleteUser",
"iam:DeleteUserPolicy",
"iam:GetUser",
"iam:ListAccessKeys",
"iam:ListAccountAliases",
"iam:ListGroupsForUser",
"iam:ListUserPolicies",
"iam:ListUserTags",
"iam:ListUsers",
"iam:UntagUser",
"servicequotas:ListServiceQuotas"
]

resources = ["*"]
}
}

data "aws_iam_policy_document" "iam_policy_document" {
data "aws_iam_policy_document" "enos_scenario" {
provider = aws.us_east_1

statement {
effect = "Allow"
actions = [
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CancelSpotFleetRequests",
"ec2:CancelSpotInstanceRequests",
"ec2:CreateInternetGateway",
"ec2:CreateKeyPair",
"ec2:CreateLaunchTemplate",
"ec2:CreateLaunchTemplateVersion",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSpotDatafeedSubscription",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:CreateVPC",
"ec2:DeleteInternetGateway",
"ec2:DeleteLaunchTemplate",
"ec2:DeleteLaunchTemplateVersions",
"ec2:DeleteKeyPair",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSpotDatafeedSubscription",
"ec2:DeleteSubnet",
"ec2:DeleteTags",
"ec2:DeleteVolume",
Expand All @@ -81,14 +126,22 @@ data "aws_iam_policy_document" "iam_policy_document" {
"ec2:DescribeInstanceTypeOfferings",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInternetGateways",
"ec2:DescribeInternetGateways",
"ec2:DescribeKeyPairs",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkAcls",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeRegions",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSpotDatafeedSubscription",
"ec2:DescribeSpotFleetInstances",
"ec2:DescribeSpotFleetInstanceRequests",
"ec2:DescribeSpotFleetRequests",
"ec2:DescribeSpotFleetRequestHistory",
"ec2:DescribeSpotInstanceRequests",
"ec2:DescribeSpotPriceHistory",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
Expand All @@ -99,14 +152,21 @@ data "aws_iam_policy_document" "iam_policy_document" {
"ec2:DescribeVpnGateways",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:GetLaunchTemplateData",
"ec2:GetSpotPlacementScores",
"ec2:ImportKeyPair",
"ec2:ModifyInstanceAttribute",
"ec2:ModifyLaunchTemplate",
"ec2:ModifySpotFleetRequest",
"ec2:ModifySubnetAttribute",
"ec2:ModifyVPCAttribute",
"ec2:RequestSpotInstances",
"ec2:RequestSpotFleet",
"ec2:ResetInstanceAttribute",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances",
"ec2:SendSpotInstanceInterruptions",
"ec2:TerminateInstances",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeTargetGroups",
Expand All @@ -115,11 +175,10 @@ data "aws_iam_policy_document" "iam_policy_document" {
"iam:CreateInstanceProfile",
"iam:CreatePolicy",
"iam:CreateRole",
"iam:CreateRole",
"iam:CreateServiceLinkedRole",
"iam:DeleteInstanceProfile",
"iam:DeletePolicy",
"iam:DeleteRole",
"iam:DeleteRole",
"iam:DeleteRolePolicy",
"iam:DetachRolePolicy",
"iam:GetInstanceProfile",
Expand All @@ -132,7 +191,6 @@ data "aws_iam_policy_document" "iam_policy_document" {
"iam:ListPolicies",
"iam:ListRolePolicies",
"iam:ListRoles",
"iam:ListRoles",
"iam:PassRole",
"iam:PutRolePolicy",
"iam:RemoveRoleFromInstanceProfile",
Expand All @@ -150,6 +208,7 @@ data "aws_iam_policy_document" "iam_policy_document" {
"kms:ScheduleKeyDeletion",
"servicequotas:ListServiceQuotas"
]

resources = ["*"]
}
}
Loading

0 comments on commit 05ca6d0

Please sign in to comment.