Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provider/aws: Opsworks Stacks and Layers #2162

Merged
merged 4 commits into from
Oct 6, 2015
Merged

provider/aws: Opsworks Stacks and Layers #2162

merged 4 commits into from
Oct 6, 2015

Conversation

apparentlymart
Copy link
Contributor

These changes represent intiial support for AWS OpsWorks, starting with the Stack and Layer objects.

OpsWorks is fundamentally a tool for managing the deployment of applications using Chef.

Stack is the root concept for AWS opsworks, acting as a container for the other concepts. A stack usually represents one product running in a particular region.

A layer represents a particular product component, such as an application server, a database server, a cache server, etc. Users can either use predefined layer types, for which AWS provides an initial set of Chef recipes, or "custom" layers where all Chef configuration is provided by the user.

  • Stacks
  • Layers
  • Acceptance Tests
  • Docs
  • Layer EBS volumes

This change includes a separate resource type for each OpsWorks layer type. Although the layer types all share a base set of properties, most of them have extra properties and unique validation rules that make them awkward to represent as a single resource type. In the interests of usability, the layer resource types follow (as closely as possible) the structures used in the OpsWorks UI.

This is a combination of my own work and the work of @nabeken. In particular, @nabeken wrote the acceptance tests and noticed some cases where workarounds are needed for asynchronous work done by OpsWorks that isn't visible in the API. There are notes about these cases in comments in the stack code.

The total diff of this merge request is pretty large, but it's split into a logical sequence of commits to permit step-by-step review, in case that makes things easier.

ServiceRoleARN: aws.String(d.Get("service_role_arn").(string)),
VPCID: aws.String(d.Get("vpc_id").(string)),
DefaultSubnetID: aws.String(d.Get("default_subnet_id").(string)),
DefaultAvailabilityZone: aws.String(d.Get("default_availability_zone").(string)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPCID, DefaultSubnetID and DefaultAvailabilityZone should not be filled here because we set these values below.

After fixing this, acc tests for stack will be all green.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commits.

@nabeken
Copy link
Contributor

nabeken commented Jun 7, 2015

@apparentlymart Thanks for pushing your work and sorry for being late.

You can add links to the document from https://github.com/hashicorp/terraform/blob/master/website/source/layouts/aws.erb

For aws_opsworks_stack and its acc tests, it works well so LGTM.
For aws_opsworks_custom_layer, it works except ebs_volume.

Anyway, I can't wait to see landing this PR:+1:

resp, err := client.DescribeLayers(req)
if err != nil {
if awserr, ok := err.(awserr.Error); ok {
if awserr.Code() == "ResourceNotFound" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we should use ResourceNotFoundException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commits.

@apparentlymart
Copy link
Contributor Author

Thanks for the review, @nabeken.

The new set of commits represents a rebase onto the latest master (including the AWS library move from "awslabs" to "aws") along with responding to all feedback except the set hashing for ebs volumes.

I will address the EBS volume issue soon.

@apparentlymart
Copy link
Contributor Author

I see the bug where changing properties of the ebs volumes (except mount_point) results in an empty diff, but I don't yet understand why.

When I reading the diffing code I see that there is a code path for deep diffing inside Set members which should handle this case. I understood that the set key was just to help terraform understand the difference between a modified ebs_volume and a new ebs_volume, but the other properties should still be changable.

I'm sure your more complex hash function works around the problem, but it doesn't feel like the hash should need to be this complicated since Terraform should already be able to diff within the objects. I will debug some more to see if I can understand why the diffing code is not behaving the way I expected.

@apparentlymart
Copy link
Contributor Author

I created #2336 to see whether the set behavior is intended or a bug. I will update the ebs_volume Set implementation here if my patch there is not accepted.

@apparentlymart
Copy link
Contributor Author

Now that there is a chef provisioner, it would make sense for the chef-related features in the layer resource to use consistent naming and types so that users can re-use their chef-related knowledge in both places.

In most cases that is just a matter of naming but the provisioner also manages to make the "custom JSON" (in that case, called "attributes") a real inline map in the Terraform config. I wanted to do this before but didn't see a reasonable way to express that in the schema system. Should try harder to make that possible so we can get away from embedding a literal JSON string inside the layer config.

@apparentlymart
Copy link
Contributor Author

Per the conclusion of #2336, going to add a default implementation of a schema set function that uses the schema itself to automatically has the entire data structure. Then opsworks layers won't need to provide a hashing implementation for ebs volumes anymore.

@apparentlymart
Copy link
Contributor Author

After getting distracted by other things for a while I'm now working on this again. The auto-hashing thing I noted in an earlier comment is in #3018.

@apparentlymart
Copy link
Contributor Author

I think this patch is now ready to be reviewed except for blocking on the resolution of #3018. If that gets accepted then this should pass tests as soon as it's rebased onto the new master.

@radeksimko
Copy link
Member

Thanks for working on this @apparentlymart

I realise I may be getting quite late to the party - sorry about that, but I was wondering if there's any specific reason why not make aws_opsworks_layer.type instead of

aws_opsworks_stack
aws_opsworks_java_app_layer
aws_opsworks_haproxy_layer
aws_opsworks_static_web_layer
aws_opsworks_php_app_layer
aws_opsworks_rails_app_layer
aws_opsworks_nodejs_app_layer
aws_opsworks_memcached_layer
aws_opsworks_mysql_layer
aws_opsworks_ganglia_layer
aws_opsworks_custom_layer

? I reckon the list of different layers will grow and I'm not sure it's worth the effort of maintaining a separate resource for each one, but maybe I'm missing an important point/reason why it's a good idea?

@apparentlymart
Copy link
Contributor Author

@radeksimko I tried to justify that in the original writeup: although on the backend they are all modeled as one first-class API type, there is some logic hidden in their implementation about what set of keys are valid in "attributes" for each type, and what values are valid for each, etc.

So while the Terraform UI could've been to have the user puzzle out which attributes are valid for each resource -- something which I only figured out by experimentation myself, incidentally -- I thought it was better to follow the structure of the Opsworks UI, where the unique attributes of each type are presented as their own unique form that guides the user to make the right selections.

In general I always think it's a better Terraform UX to mimic the AWS UI rather than it's rather ugly API, since at least in my world users are translating their experience manually configuring things in the UI onto setting things up in Terraform.

I certainly could collapse this whole thing into one type and just let users figure it out for themselves, but given that adding new resource types is just a matter of setting up another set of attribute mappings I thought this was a decent compromise.

@mrjefftang
Copy link
Contributor

+1

I am blocked on a project awaiting for this PR to land.

@josephholsten
Copy link
Contributor

@apparentlymart looks like the only thing remaining for this is to get it passing go vet, got any time for this or would you like a hand?

@apparentlymart
Copy link
Contributor Author

Actually it looks to me like this needs updating for the various breaking changes in the AWS go library... an expected danger of having such a long-running branch, I guess.

I'll see about fixing this up when I get some time, but it's unlikely to be this week so @josephholsten if you want to finish this up then go right ahead!

@phinze
Copy link
Contributor

phinze commented Oct 5, 2015

Okay the dependent set function PR has landed, just need another pass to get it building on the latest version and we can 🚢!

@nabeken
Copy link
Contributor

nabeken commented Oct 5, 2015

🤘

There are several AWS services that are global in scope and thus need to
be accessed via the us-east-1 endpoints, so we'll make the us-east-1
variant of the config available as a variable we can reuse between multiple
clients as we add support for new services.
Here we add an OpsWorks client instance to the central client bundle and
establish a new documentation section, both of which will be fleshed out in
subsequent commits that add some OpsWorks resources.
"Stack" is the root concept in OpsWorks, and acts as a container for a number
of different "layers" that each provide some service for an application.
A stack isn't very interesting on its own, but it needs to be created before
any layers can be created.
A "Layer" is a particular service that forms part of the infrastructure for
a set of applications. Some layers are application servers and others are
pure infrastructure, like MySQL servers or load balancers.

Although the AWS API only has one type called "Layer", it actually has
a number of different "soft" types that each have slightly different
validation rules and extra properties that are packed into the Attributes
map.

To make the validation rule differences explicit in Terraform, and to make
the Terraform structure more closely resemble the OpsWorks UI than its
API, we use a separate resource type per layer type, with the common code
factored out into a shared struct type.
@apparentlymart
Copy link
Contributor Author

I rebased this and ran the unit tests. I don't have my credentials handy right now to run the acceptance tests and it's been quite a while since I last ran them so I should probably do a final pass of that before I call this done. Will try to find a little time for that over the next few days. (If someone else wants to try it then go right ahead! 😀)

@nabeken
Copy link
Contributor

nabeken commented Oct 6, 2015

@apparentlymart I did the acceptance tests. Here are the results:

$ export AWS_REGION=us-west-2
$ make testacc TEST=./builtin/providers/aws TESTARGS='-run=OpsworksStack'
--- PASS: TestAccAwsOpsworksStackNoVpc (19.10s)
--- PASS: TestAccAwsOpsworksStackVpc (86.94s)
PASS
ok      github.com/hashicorp/terraform/builtin/providers/aws    106.059s

$ make testacc TEST=./builtin/providers/aws TESTARGS='-run=OpsworksCustomLayer'
--- PASS: TestAccAwsOpsworksCustomLayer (19.88s)
PASS
ok      github.com/hashicorp/terraform/builtin/providers/aws    19.896s

@phinze
Copy link
Contributor

phinze commented Oct 6, 2015

Looking good! 🚢

phinze added a commit that referenced this pull request Oct 6, 2015
@phinze phinze merged commit 59c414d into hashicorp:master Oct 6, 2015
@brutuscat
Copy link

Thanks! @apparentlymart for such a great work! I was looking at the docs at https://www.terraform.io/docs/providers/aws/r/opsworks_php_app_layer.html and there's no mention of "attaching" things to the layer, what's the procedure to Attach an ELB to the layer?
Is this on the plans?

@steveh
Copy link
Contributor

steveh commented Nov 4, 2015

@brutuscat it's been requested but not done it appears - see the google doc at #28

@apparentlymart
Copy link
Contributor Author

@brutuscat that isn't supported by what was merged already, unfortunately. I had originally intended to build out the full suite of opsworks resources but sadly we (my company) moved away from using Opsworks before the first few resources were merged, and so I wasn't able to complete anything else. 😞

For that AttachElasticLoadBalancer endpoint I think the corresponding resource wouldn't be too complex to implement. Something like this:

resource "aws_opsworks_layer_elb_attachment" {
    elb_name = "${aws_elb.foo.name}"
    layer_id = "${aws_opsworks_php_layer.id}"
}

I'm knee-deep in a bunch of internal work stuff so I'm not able to focus much time on Terraform for a while, but I'll put this on my list to try to circle back to eventually. However, I think this change could be a good first patch for a new Terraform contributor if someone else had the time and motivation to do it. I'm happy to support and review!

@brutuscat
Copy link

@steveh thanks.

@apparentlymart too bad! Thank you for the pointers, I'll play a bit with terraform and see if I could contribute something like this, but I doubt it could be in the short term, even go is new to me...

Also how would you approach the implementation of the "Applications" resource?

@apparentlymart
Copy link
Contributor Author

@brutuscat both Applications and Instances shouldn't be too hard to add either... they are just resources with a more complicated schema, IIRC. More complex schema just means more work marshaling/unmarshaling from the underlying AWS API objects, and more documentation work! 😀

I think Deployments are the more interesting case. They are really actions rather than resources, and so I was thinking about making a provisioner for executing opsworks deployment actions (not only deployments, but also "Execute Chef Recipes", etc); this was what inspired me to open #2756.

@catsby
Copy link
Contributor

catsby commented Nov 20, 2015

Hey Friends – we're working on a release and OpsWorks is giving me some trouble 😕

// These tests assume the existence of predefined Opsworks IAM roles named `aws-opsworks-ec2-role`
// and `aws-opsworks-service-role`.

There isn't much information to go on here. I managed to get 2 of the tests to pass (TestAccAwsOpsworksCustomLayer and TestAccAwsOpsworksStackNoVpc) by pre-creating these resources with this config:

resource "aws_iam_instance_profile" "test_profile_ec2" {
    name = "aws-opsworks-ec2-role"
    roles = ["${aws_iam_role.role_ec2.name}"]
}

resource "aws_iam_role" "role_ec2" {
    name = "aws-opsworks-ec2-role"
    path = "/"
    assume_role_policy = <<EOF
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Principal": {"AWS": "*"},
            "Effect": "Allow",
            "Sid": ""
        }
    ]
}
EOF
}

resource "aws_iam_instance_profile" "test_profile_service" {
    name = "aws-opsworks-service-role"
    roles = ["${aws_iam_role.role_service.name}"]
}

resource "aws_iam_role" "role_service" {
    name = "aws-opsworks-service-role"
    path = "/"
    assume_role_policy = <<EOF
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Principal": {"AWS": "*"},
            "Effect": "Allow",
            "Sid": ""
        }
    ]
}
EOF
}

However, TestAccAwsOpsworksStackVpc failed after 913 seconds (most of the time waiting for IAM to propagate, which seems ... not correct):

--- FAIL: TestAccAwsOpsworksStackVpc (913.54s)
    resource_aws_opsworks_stack_test.go:348: [DEBUG] ServiceRoleARN for OpsWorks: arn:aws:iam::470663696735:role/aws-opsworks-service-role
    resource_aws_opsworks_stack_test.go:349: [DEBUG] Instance Profile ARN for OpsWorks: arn:aws:iam::470663696735:instance-profile/aws-opsworks-ec2-role
    testing.go:137: Step 0 error: Error applying: 1 error(s) occurred:

        * aws_opsworks_stack.tf-acc: ValidationException: Service Role Arn: Credential has not the necessary trust relationship
            status code: 400, request id: f3d07b4f-8f9a-11e5-9fd8-5b601e5fe042
FAIL
exit status 1

Can these tests be improved, so I can run them with ease?

@mrjefftang
Copy link
Contributor

I've had this issue using Terraform to do Opsworks.

I believe it has to do with the Service Role ARN not being fully created before the stack is created. If I wait ~5 minutes and just rerun Terraform, it works just fine because the service role ARN already exists.

@apparentlymart
Copy link
Contributor Author

@catsby hmm interesting... these roles already existed in my account, I guess because I'd been manually creating opsworks resources via the UI first. I therefore assumed that they'd come into existence when working with opsworks in Terraform too.

Since Opsworks made these things automatically for me I never put much thought into what was inside them but I'll see if I can dig them out of my AWS account and see what the right settings are.

Making these things with terraform would presumably make the tests fail for anyone who had the roles already created though, and that doesn't seem ideal either. :(

@mrjefftang
Copy link
Contributor

Just realized the error is much different than what I encountered:

@catsby This is what I use which is copied from the default Opsworks profiles that are created:

resource "aws_iam_role" "opsworks_service" {
    name = "iam-opswork-service"
    assume_role_policy = "${file("iam/role-opsworks.json")}"
}

resource "aws_iam_role_policy" "opsworks_service" {
    name = "iam-opsworks-service-policy"
    role = "${aws_iam_role.opsworks_service.id}"
    policy = "${file("iam/policy-opsworks.json")}"
}

resource "aws_iam_role" "opsworks_ec2" {
    name = "iam-opsworks-ec2"
    assume_role_policy = "${file("iam/role-ec2.json")}"
}

resource "aws_iam_instance_profile" "opsworks_ec2" {
    name = "iam-opsworks-ec2-profile"
    roles = ["${aws_iam_role.opsworks_ec2.name}"]
}

policy_opsworks.json

{
  "Statement": [
    {
      "Action": [
        "ec2:*",
        "iam:PassRole",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:DescribeAlarms",
        "ecs:*",
        "elasticloadbalancing:*",
        "rds:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ]
}

role-opsworks.json

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "opsworks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

role-ec2.json

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

@apparentlymart
Copy link
Contributor Author

opsworks-ec2-role has the following policy:

// CloudWatchFullAccess-aws-opsworks-ec2-role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:Describe*",
        "cloudwatch:*",
        "logs:*",
        "sns:*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

opsworks-service-role has two inline policies:

// aws-opsworks-service-policy
{"Statement": [{"Action": ["ec2:*", "iam:PassRole",
                "cloudwatch:GetMetricStatistics",
                "elasticloadbalancing:*",
                "rds:*"],
                "Effect": "Allow",
                "Resource": ["*"] }]}
// ec2DescribeInstancesAllow
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1422496252000",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Like @mrjefftang said, I expect these roles to get created as part of each account's first interactions with OpsWorks, rather than needing to be manually created. I wonder if it'd be sufficient just to document in a comment that the first time you run the tests you might see them fail with that error and you'll need to wait a few minutes for the IAM stuff to propagate before running again? I know that's a pretty awful UX but that's the UX of the API we're wrapping here. 😦

@apparentlymart
Copy link
Contributor Author

Digging some more I found an article about this.

It seems to imply (but does not say explicitly) that these "default" roles come from working in the console, not via the API. So maybe we can work around this by creating some differently-named roles with the same policy inside the acceptance tests, and associating those. Then we won't trample on the names that the Opsworks console uses by default.

@apparentlymart
Copy link
Contributor Author

Okay I deleted all of the Opsworks stuff from my test AWS account, including those roles. When I ran the acceptance tests, it failed just as it presumably did for @catsby:

--- FAIL: TestAccAwsOpsworksStackVpc (11.33s)
    testing.go:137: Step 0 error: Check failed: NoSuchEntity: The role with name aws-opsworks-service-role cannot be found.
            status code: 404, request id: xxx
    testing.go:137: Step 0 error: Error applying: 1 error(s) occurred:

        * aws_opsworks_stack.tf-acc: ValidationException: Please provide a Service Role Arn and a region
            status code: 400, request id: xxx

When the first stack created comes from the API it looks like these roles do not get created, as those docs implied. Only if you create your first stack from the UI is that the default.

So seems like there's a couple options here:

  • The acceptance tests could each create a set of roles and instance profiles with different names than the defaults, use them for testing and then delete them again.
  • The comment in the source code could be extended to simply suggest that those running the tests go create a new Opsworks stack in the AWS console and let it auto-create the default objects before running the tests.

The former is the more robust solution, but the latter is a quick fix that might just unblock the release.

@catsby
Copy link
Contributor

catsby commented Nov 20, 2015

So maybe we can work around this by creating some differently-named roles with the same policy inside the acceptance tests, and associating those.

This seems like all we need. To get by for 2 tests, I created the roles I mentioned above, with custom_instance_profile_arn or something, yeah? Just need the proper trust policy?

@catsby
Copy link
Contributor

catsby commented Nov 20, 2015

@ghost
Copy link

ghost commented Apr 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants