[QT-525] enos: use spot instances for Vault targets #20037

ryancragun · 2023-04-06T23:10:57Z

The previous strategy for provisioning infrastructure targets was to use
the cheapest instances that could reliably perform as Vault cluster
nodes. With this change we introduce a new model for target node
infrastructure: we've replaced on-demand instances with a spot
fleet. While the spot price fluctuates based on dynamic pricing,
capacity, region, instance type, and platform, cost savings for our
most common combinations range from 20% to 70%.

This change only includes spot fleet targets for Vault clusters.
We'll be updating our Consul backend bidding in another PR.

* Create a new `vault_cluster` module that handles installation,
  configuration, initializing, and unsealing Vault clusters.
* Create a `target_ec2_instances` module that can provision a group of
  instances on-demand.
* Create a `target_ec2_spot_fleet` module that can bid on a fleet of
  spot instances.
* Extend every Enos scenario to utilize the spot fleet target acquisition
  strategy and the `vault_cluster` module.
* Update our Enos CI modules to handle both the `aws-nuke` permissions
  and also the privileges to provision spot fleets.
* Only use us-east-1 and us-west-2 in our scenario matrices as costs are
  lower than us-west-1.

Signed-off-by: Ryan Cragun <me@ryan.ec>
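
For readers unfamiliar with spot fleets, the bidding described above could look roughly like the following Terraform. This is a minimal sketch, not the contents of the `target_ec2_spot_fleet` module: every variable name, the resource labels, and the `t3a.small` instance type are assumptions made for illustration. Only the `aws_launch_template` and `aws_spot_fleet_request` resource types come from the AWS provider.

```hcl
# Hypothetical inputs; the real module's variables are not shown in this PR excerpt.
variable "ami_id" {
  type = string
}

variable "fleet_role_arn" {
  description = "IAM role that the spot fleet service assumes"
  type        = string
}

variable "instance_count" {
  type    = number
  default = 3
}

variable "max_price" {
  description = "Maximum hourly bid per instance, in USD"
  type        = string
  default     = "0.0416"
}

# Template shared by every instance the fleet launches.
resource "aws_launch_template" "target" {
  name_prefix   = "enos-target-"
  image_id      = var.ami_id
  instance_type = "t3a.small" # assumed default; overridable per fleet below
}

# Bid on a fleet of spot instances instead of launching on-demand instances.
resource "aws_spot_fleet_request" "targets" {
  iam_fleet_role       = var.fleet_role_arn
  target_capacity      = var.instance_count
  allocation_strategy  = "lowestPrice"
  spot_price           = var.max_price
  wait_for_fulfillment = true

  launch_template_config {
    launch_template_specification {
      id      = aws_launch_template.target.id
      version = aws_launch_template.target.latest_version
    }

    # Illustrative override; a real fleet may list several instance types.
    overrides {
      instance_type = "t3a.small"
    }
  }
}
```

Setting `wait_for_fulfillment` makes the apply block until the bid is filled, which is presumably what a test scenario wants before it starts installing Vault on the targets.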

enos/ci/service-user-iam/service-quotas.tf

The previous strategy for provisioning infrastructure targets was to use the cheapest instances that could reliably perform as Vault cluster nodes. With this change we introduce a new model of bidding for spot fleet instances with the goal of cost savings and often more powerful instances.

The spot fleet instance bidding has only been implemented for Vault clusters. Updating our Consul backend bidding will be handled in another PR.

* Create a new `vault_cluster` module that handles installation, configuration, initializing, and unsealing Vault clusters.
* Create a `target_ec2_instances` module that can provision a group of instances on-demand.
* Create a `target_ec2_spot_fleet` module that can bid on a fleet of spot instances.
* Extend every Enos scenario to utilize the spot fleet target acquisition strategy and the `vault_cluster` module.
* Update our Enos CI modules to handle both the `aws-nuke` permissions and also the privileges to provision spot fleets.

Signed-off-by: Ryan Cragun <me@ryan.ec>

jaymalasinha · 2023-04-13T17:33:35Z

enos/enos-scenario-autopilot.hcl

@@ -15,6 +15,12 @@ scenario "autopilot" {
      edition       = ["oss", "ent.fips1402", "ent.hsm.fips1402"]
      artifact_type = ["package"]
    }
+
+    # Our local builder always creates bundles


jaymalasinha · 2023-04-13T17:43:55Z

enos/enos-scenario-autopilot.hcl

+      vpc_id                = step.create_vpc.vpc_id
+    }
+  }
+
  step "upgrade_vault_cluster_with_autopilot" {
    module = module.vault_cluster
    depends_on = [


This step should also list `create_vault_cluster_upgrade_targets` in its `depends_on`.
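
A minimal sketch of the suggested change, assuming the upgrade targets are created by a step named `create_vault_cluster_upgrade_targets` (the name used in the comment above). The step's existing dependencies are cut off in the diff excerpt, so they are left as a placeholder comment rather than guessed.

```hcl
step "upgrade_vault_cluster_with_autopilot" {
  module = module.vault_cluster
  depends_on = [
    # ...existing dependencies from the diff, not visible in this excerpt...
    step.create_vault_cluster_upgrade_targets,
  ]
}
```

Without the explicit dependency, the upgrade step could be scheduled before its target instances exist unless another reference to that step's outputs already orders them.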

joshbrand · 2023-04-13T17:29:49Z

enos/enos-modules.hcl

 module "vault_verify_agent_output" {
  source = "./modules/vault_verify_agent_output"

  vault_instance_count = var.vault_instance_count
 }

 module "vault_cluster" {
-  source = "app.terraform.io/hashicorp-qti/aws-vault/enos"
-  # source = "../../terraform-enos-aws-vault"
+  source = "./modules/vault_cluster"


joshbrand · 2023-04-13T17:36:10Z

enos/modules/target_ec2_instances/main.tf

+    ]
+  }
+
+  statement {


😻 I know this sucks but appreciate it (and hopefully sec will too!)

joshbrand · 2023-04-13T17:37:08Z

enos/modules/target_ec2_instances/main.tf

+  tags = merge(
+    var.common_tags,
+    {
+      Name = "${local.name_prefix}-target-instance"


do we need to include an ID for multiple targets here?
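
One way to make the `Name` tag unique per target, sketched under assumptions: the resource label `targets`, the variables `ami_id` and `instance_count`, and the use of `count` are all hypothetical, since the excerpt does not show how the module creates multiple instances.

```hcl
# Hypothetical inputs and locals, declared here so the sketch is self-contained.
variable "ami_id" {
  type = string
}

variable "instance_count" {
  type    = number
  default = 3
}

variable "common_tags" {
  type    = map(string)
  default = {}
}

locals {
  name_prefix = "enos"
}

resource "aws_instance" "targets" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = "t3a.small" # assumed for illustration

  tags = merge(
    var.common_tags,
    {
      # Appending count.index gives each target a distinct Name tag.
      Name = "${local.name_prefix}-target-instance-${count.index}"
    },
  )
}
```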

jaymalasinha · 2023-04-13T17:57:54Z

enos/enos-scenario-replication.hcl

@@ -333,7 +380,7 @@ scenario "replication" {
    variables {
      primary_leader_public_ip = step.get_primary_cluster_ips.leader_public_ip
      vault_install_dir        = local.vault_install_dir
-      vault_root_token         = step.create_vault_primary_cluster.vault_root_token
+      vault_root_token         = step.create_primary_cluster.root_token


shorter names look great 👍

jaymalasinha

Looks great 🚀! Just one minor dependency comment.

* [QT-525] enos: use spot instances for Vault targets (#20037)

* [QT-530] enos: allow-list all public IP addresses (#20304)

The security groups that allow access to remote machines in Enos scenarios have been configured to only allow port 22 (SSH) from the public IP address of the machine executing the Enos scenario. To achieve this we previously utilized the `enos_environment.public_ip_address` attribute. Sometime in mid-March we started seeing sporadic SSH i/o timeout errors when attempting to execute Enos resources against SSH transport targets. We've only ever seen this when communicating from Azure-hosted runners to AWS-hosted machines.

While testing we were able to confirm that in some cases the public IP address resolved using DNS over UDP4 to Google and OpenDNS name servers did not match what was resolved when using the HTTPS/TCP IP address service hosted by AWS. The Enos data source was implemented so that we'd attempt resolution against a single name server and only try the next one if the previous name server could not get a result. We'd then allow-list that single IP address. That's a problem if we can resolve two different public IP addresses depending on our endpoint address.

This change utilizes the new `enos_environment.public_ip_addresses` attribute and subsequent behavior change. Now the data source will attempt to resolve our public IP address via name servers hosted by Google, OpenDNS, Cloudflare, and AWS. We then return a unique set of these IP addresses and allow-list all of them in our security group. It is our hope that this resolves these i/o timeout errors, which seem to be caused by the security group black-holing our attempted access because the IP we resolved does not match what we're actually exiting with.

Signed-off-by: Ryan Cragun <me@ryan.ec>

---------

Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
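
The allow-listing described above could be expressed along these lines. This is a sketch under stated assumptions, not the change made in #20304: the data source label `localhost`, the security group label and name, and `var.vpc_id` are hypothetical, and the `/32` suffix assumes every resolved address is IPv4. Only the `enos_environment` data source and its `public_ip_addresses` attribute are taken from the commit message.

```hcl
# Hypothetical input for illustration.
variable "vpc_id" {
  type = string
}

# Resolves the public IP addresses of the machine running the scenario.
data "enos_environment" "localhost" {}

resource "aws_security_group" "target_ssh" {
  name   = "enos-target-ssh"
  vpc_id = var.vpc_id

  ingress {
    description = "SSH from every resolved public IP of the scenario runner"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"

    # Allow-list all resolved addresses rather than a single one, so access
    # still works if the runner exits through a different IP than one
    # resolver reported.
    cidr_blocks = [
      for ip in data.enos_environment.localhost.public_ip_addresses : "${ip}/32"
    ]
  }
}
```

A `for` expression is used instead of a list-only function so the sketch works whether the attribute is exposed as a list or a set.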

ryancragun added pr/no-changelog backport/1.11.x pr/no-milestone labels Apr 6, 2023

ryancragun requested a review from a team as a code owner April 6, 2023 23:10

ryancragun commented Apr 6, 2023

ryancragun force-pushed the qt-525 branch from 47e1522 to 23f5b16 Compare April 7, 2023 19:18

use us-east-1 instead of us-west-1 because it's cheaper

e3862c5

Signed-off-by: Ryan Cragun <me@ryan.ec>

ryancragun enabled auto-merge (squash) April 7, 2023 20:23

jaymalasinha reviewed Apr 13, 2023

joshbrand reviewed Apr 13, 2023

jaymalasinha reviewed Apr 13, 2023

jaymalasinha approved these changes Apr 13, 2023

ryancragun merged commit 1329a6b into main Apr 13, 2023

This was referenced Apr 13, 2023

Backport of [QT-525] and [QT-530] into release/1.13.x #20158

Merged

Backport of [QT-525] and [QT-530] into release/1.11.x #20160

Merged

Backport of [QT-525] and [QT-530] into release/1.12.x #20161

Merged

ryancragun deleted the qt-525 branch April 19, 2024 22:24
