Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create unified aws auth backend #2441

Merged
merged 25 commits into from
Apr 24, 2017

Conversation

joelthompson
Copy link
Contributor

This reimplements the strategy proposed in #1962 but unifies the aws-ec2 and aws-iam authentication backends into a single backed.

The aws-ec2 authentication backend is being expanded and will become the
generic aws backend. This is a small rename commit to keep the commit
history clean.
This adds the ability to authenticate arbitrary AWS IAM principals using
AWS's sts:GetCallerIdentity method. The AWS-EC2 auth backend is being to
just AWS with the expansion.
This was omitted from the previous commit
Also fixed a bug where allowed auth types weren't being checked upon
login, and added tests for it.
Intent is to override the AWS environment variables with the TEST_*
versions if they are set, but the reverse was happening.
AWS now allows you to change the instance profile of a running instance,
so the use case of "a long-lived instance that's not in an instance
profile" no longer means you have to use the the EC2 auth method. You
can now just change the instance profile on the fly.
@mfischer-zd
Copy link
Contributor

If/when this makes it to a release, drinks are on me!

@joelthompson
Copy link
Contributor Author

Also, I left the documentation page named aws-ec2.html in order to preserve old links. Let me know if there's a better way to handle this!


func (h *CLIHandler) Help() string {
help := `
The AWS credentaial provider allows you to authenticate with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: credential

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks. Fixed.

@jefferai jefferai added this to the 0.7.1 milestone Mar 22, 2017
Copy link
Member

@jefferai jefferai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the whole I think this is looking really good.

Rather than populate the backends with both aws and aws-ec2 backends, please use an alias in vault/auth.go. We want to be backwards-compatible but avoid introducing inconsistencies going forward.

@@ -34,8 +34,12 @@ func (b *backend) getClientConfig(s logical.Storage, region string) (*aws.Config
endpoint := aws.String("")
if config != nil {
// Override the default endpoint with the configured endpoint.
if config.Endpoint != "" {
if clientType == "ec2" && config.Endpoint != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if-else chain should be a switch on clientType

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -20,6 +20,12 @@ func pathRole(b *backend) *framework.Path {
Type: framework.TypeString,
Description: "Name of the role.",
},
"allowed_auth_types": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to make this ec2,iam by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to maintain backwards compatibility with respect to the security properties of any Vault setup scripts. If this defaults to ec2,iam then any pre-existing Vault setup scripts that mount the aws-ec2 auth backend, when run against a new Vault cluster with this feature, would now allow the iam auth_type as well.

It's a fairly subtle point, and without knowing your preferences on this type of backwards compatibility, I figured it would be better to err on the side of being a little extra paranoid, especially as it relates to maintaining Vault's existing security posture. I'd be happy to change it to ec2,iam if you're not worried about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually in this type of situation, we put in logic so that existing roles will maintain the current status quo and new roles will get the new behavior.

An easy way to do this would be, on role read of an existing role, to check if allowed_auth_types is set at all (since it's a new variable), and if so, set it to ec2, but set the default for new roles to ec2,iam. This will work since any new roles won't successfully load an existing entry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out on the backwards compatibility, I think I've added the right things to preserve that (but please double check anywhere else I might have missed!).

However, I think I might be saying something different than you are here. I'm talking about preserving the backwards compatibility of scripts that do things like:

vault auth-enable aws-ec2
vault write auth/aws-ec2/role/myrole bound_ami_id=ami-xxxxxxx

I'm trying to avoid changing the behavior of scripts like those that aren't aware of the IAM auth method. If we make the default ec2,iam and further don't default to inferencing, then the write I just mentioned will fail because there's no binding to something the iam auth_method can understand.

Again, I'm mostly making assumptions about what sort of backwards compatibility you want to support and trying to fit that. If you're OK changing that level of backwards compatibility given the implications, then I'm happy to change the code to support that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question and it's not a super straightforward answer. In the past the philosophy is "a new Vault installation should work according to current best practices", so what we've done is used upgrade paths to not modify behavior of existing roles but have new roles gain new behavior (or roles that are then updated), and then we document this change in the upgrade guide.

I think the IAM functionality will be super useful, and I think having it be available by default is the nicest option from a user perspective. We do publish upgrade guides for a reason. So my gut says we should add an upgrade path (which I think would be, type is ec2 and inference is off since if iam isn't enabled binds are always checked?) and default it to ec2,iam (with inference checking on?)

I'll ask the team too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a question of which path to follow when both types are supported and all the required parameters are set. I am inclined towards making this a value that takes in just a single value. Then a role will allow just one flow and it will be easier to reason about what happens when. That way, this can default to "ec2" to retain old behavior and have operators explicitly set "iam" when needed. Unfortunately, in that case, we can't default to "iam".

},
"role_inferred_type": {
Type: framework.TypeString,
Default: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean default for a string value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"role_inferred_type": {
Type: framework.TypeString,
Default: false,
Description: `When auth_type is iam, the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to default this to ec2_instance? The documentation makes it clear what the caveats are, but generally speaking this seems useful. Also I may be missing something, but the danger here (AFAICT) is that someone may spoof an EC2 instance, but that the credentials used to do the spoofing have to be allowed on the role in the first place. Since the purpose of doing the inferencing is further restrictions (via binds), it doesn't seem like the danger is that great. Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. When this is set, it turns on inferencing, and when inferencing is turned on, the client must look like a valid EC2 instance; otherwise, authentication will fail. So defaulting to ec2_instance would give you a default configuration that, for example, wouldn't let Lambdas authenticate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like it'd be nice for there to be some way to say "use inferencing opportunistically" so you can reuse the same role for EC2 (with extra restrictions) and IAM. I may well be missing a good reason not to do that...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just not seeing the use case for doing that. If I'm missing something, please inform me!

The way I see it, inferencing basically means, "These additional conditions (i.e., binds) are important to me, so enforce them." Which implies, "If I can't test the conditions, I should just fail" (to be safe). There's simply too much gray area in being "opportunistic" for me to be comfortable with. There are two main ambiguities off the top of my head that I am concerned with:

  1. Session ID "looks like" an EC2 instance ID:
  2. Session ID matches an EC2 instance that Vault is able to discover: Probably OK to do inferencing, but requires a request to the AWS APIs to query instance IDs
  3. Session ID doesn't match an EC2 instance that Vault is able to discover. Should we assume alternate roles in different accounts to search for that instance ID? If so, how do we determine the "target" account? If not, do we just ignore the inferencing (and associated bindings)?
  4. Session ID doesn't "look like" an EC2 instance ID: probably just ignore bindings completely

Note that Vault would have to keep up to date with what EC2 instance IDs look like.

Anyway, in my mind, the the use of inferencing means "this is important to me and therefore you must apply it and fail if you can't" as opposed to "this seems useful to me and you should apply it if you can and fail open if you can't." If inferencing is requested for a role but can't be used, then I think it's better to fail closed than open.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you're saying; I wasn't really aware that there might be ambiguities with the instance ID. That's fine, let's leave as-is.

roleEntry.InferredAWSRegion = inferredAWSRegionRaw.(string)
}

allowEc2Auth, allowIamAuth := false, false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just declare variables, 0-type for booleans is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

aws_security_token=<security_token>
```

For reference, the following Go program also demonstrates how to generate the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than dump a whole Go program here, just link to the CLI code, which contains the same bits but is more complete (adding more commenting into the CLI code if necessary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me ask a stupid question: did you forget to push something? Because a few places I've noticed things you marked as done but the github diff still is showing me the old bits. I've verified that changes from all commits is what's supposed to be showing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, sorry about that. I remember making these changes, but somehow I lost them. I've pushed these changes now, please let me know if there's anything else missing.

@@ -36,23 +42,58 @@ bearing the name of the AMI ID of the EC2 instance that is trying to login.
If a matching role is not found, login fails.`,
},

"auth_type": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be entirely inferred by the parameters that are supplied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though it requires additional validations to ensure the inference is valid. I've added the code that should do that, let me know what you think of it.

func (b *backend) pathLoginUpdate(
req *logical.Request, data *framework.FieldData) (*logical.Response, error) {
auth_type := data.Get("auth_type")
if auth_type == "ec2" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what I'm going to say here :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... :) Should be fixed.


// It's a bit of a hack to use the inferred "EC2" region to get a region for the IAM client
// IAM is a "global" service and so doesn't really have a region, so it wouldn't matter
// Except that the region is used to infer the partion (i.e., aws, aws-cn, or aws-us-gov),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/partion/partition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -907,11 +1414,21 @@ type roleTagLoginResponse struct {
DisallowReauthentication bool `json:"disallow_reauthentication" structs:"disallow_reauthentication" mapstructure:"disallow_reauthentication"`
}

const magicVaultHeader = "X-Vault-AWSIAM-Server-Id"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this something like serverIDHeader?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling it iamServerIdHeader

Copy link
Contributor Author

@joelthompson joelthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

I think I've responded to everything and implemented the alias in vault/auth.go as you suggested. Let me know if I've missed anything!

@@ -34,8 +34,12 @@ func (b *backend) getClientConfig(s logical.Storage, region string) (*aws.Config
endpoint := aws.String("")
if config != nil {
// Override the default endpoint with the configured endpoint.
if config.Endpoint != "" {
if clientType == "ec2" && config.Endpoint != "" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -20,6 +20,12 @@ func pathRole(b *backend) *framework.Path {
Type: framework.TypeString,
Description: "Name of the role.",
},
"allowed_auth_types": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to maintain backwards compatibility with respect to the security properties of any Vault setup scripts. If this defaults to ec2,iam then any pre-existing Vault setup scripts that mount the aws-ec2 auth backend, when run against a new Vault cluster with this feature, would now allow the iam auth_type as well.

It's a fairly subtle point, and without knowing your preferences on this type of backwards compatibility, I figured it would be better to err on the side of being a little extra paranoid, especially as it relates to maintaining Vault's existing security posture. I'd be happy to change it to ec2,iam if you're not worried about this.

},
"role_inferred_type": {
Type: framework.TypeString,
Default: false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"role_inferred_type": {
Type: framework.TypeString,
Default: false,
Description: `When auth_type is iam, the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. When this is set, it turns on inferencing, and when inferencing is turned on, the client must look like a valid EC2 instance; otherwise, authentication will fail. So defaulting to ec2_instance would give you a default configuration that, for example, wouldn't let Lambdas authenticate.

roleEntry.InferredAWSRegion = inferredAWSRegionRaw.(string)
}

allowEc2Auth, allowIamAuth := false, false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Description: "URL to override the default generated endpoint for making AWS STS API calls.",
},

"iam_auth_header_value": &framework.FieldSchema{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called that because I'm bad at naming variables :)

changed.

@@ -36,23 +42,58 @@ bearing the name of the AMI ID of the EC2 instance that is trying to login.
If a matching role is not found, login fails.`,
},

"auth_type": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though it requires additional validations to ensure the inference is valid. I've added the code that should do that, let me know what you think of it.

func (b *backend) pathLoginUpdate(
req *logical.Request, data *framework.FieldData) (*logical.Response, error) {
auth_type := data.Get("auth_type")
if auth_type == "ec2" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... :) Should be fixed.


// It's a bit of a hack to use the inferred "EC2" region to get a region for the IAM client
// IAM is a "global" service and so doesn't really have a region, so it wouldn't matter
// Except that the region is used to infer the partion (i.e., aws, aws-cn, or aws-us-gov),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -907,11 +1414,21 @@ type roleTagLoginResponse struct {
DisallowReauthentication bool `json:"disallow_reauthentication" structs:"disallow_reauthentication" mapstructure:"disallow_reauthentication"`
}

const magicVaultHeader = "X-Vault-AWSIAM-Server-Id"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling it iamServerIdHeader

@joelthompson joelthompson force-pushed the feature/unified_aws_auth branch from fdb22c8 to 5c82f59 Compare March 29, 2017 04:01

return secret.Auth.ClientToken, nil

return "", nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left out by mistake I guess :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks. Fixed.

return nil, fmt.Errorf("could not configure STS client")
}

config, err := b.getClientConfig(s, region, clientType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think stsConfig and config are redundant.

Actually now that getClientConfig takes in a clientType, it might probably be a good idea to remove getStsClientConfig altogether and do all the type specific things in a single function keyed off of a switch. We now have two methods that take in a type and one of those kind of handles both iam and sts but only has latter in its name, and that we need to pass in type for both the methods. Feels very weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think they're redundant, but I do think that getStsClientConfig is confusingly named. getStsClientConfig doesn't mean "get configurations for an STS client;" rather, it means, "Use sts:AssumeRole to get credentials and then build a client using those credentials." It's a wrapper around and a layer of abstraction on top of getClientConfig and so I think they should remain separate methods. I could see the client interface changing and everyone just calling getStsClientConfig (but it would be renamed something more appropriate), and this method would do the check to see if stsRole was passed in as a non-empty string and make the appropriate calls to getClientConfig (also renamed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you are saying and it all makes sense. My only concern is around how getClientConfig looks as of now. It takes in a clientType and if a "sts" is passed in to that function, I'd expect a stsConfig returned but that doesn't happen! What I am suggesting is that do the extra layer of abstraction in the same function based on clientType.

Also, can you help me understand why we need both config and stsConfig in this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I refactored a little along these lines. I was trying to avoid doing so because this factorization basically orthogonal to the change I was making and the PR is already very large, so I was trying to keep it smaller, but it seems like it's easier than continuing to discuss. (The main reason I made changes in here is because the existing config for specifying an Endpoint was obviously buggy as it would apply the same endpoint to ec2 and iam clients rather than specifying alternate endpoints, and I didn't want to perpetuate that with the sts client).

stsEndpointStr, ok := data.GetOk("sts_endpoint")
if ok {
if configEntry.STSEndpoint != stsEndpointStr.(string) {
// NOT setting changedCreds here, since this isn't really cached
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not cache the creds here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, spent a bit more time thinking through this -- I think they actually are cached (just indirectly), so I'm adding in the cache flushing. I explain it a bit better in code comments.

Endpoint string `json:"endpoint" structs:"endpoint" mapstructure:"endpoint"`
IAMEndpoint string `json:"iam_endpoint" structs:"iam_endpoint" mapstructure:"iam_endpoint"`
STSEndpoint string `json:"sts_endpoint" structs:"sts_endpoint" mapstructure:"sts_endpoint"`
HeaderValue string `json:"vault_header_value" structs:"vault_header_value" mapstructure:"vault_header_value"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HeaderValue vs iam_server_id_header_value vs vault_header_value is pretty inconsistent. Can we use just one in all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

significance.`,
},

"request_method": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming this iam_http_request_method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Not the biggest fan because it's potentially confusing (this is actually request method of the STS GetCallerIdentity HTTP request, but it's for the IAM auth method), but done.

return transformedArn, principalName, sessionName, nil
}

func ensureVaultHeaderValue(headers http.Header, requestUrl *url.URL, requiredHeaderValue string) (bool, string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ensure/validate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return transformedArn, principalName, sessionName, nil
}

func ensureVaultHeaderValue(headers http.Header, requestUrl *url.URL, requiredHeaderValue string) (bool, string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach of returning a string error message seems a little off. We could return an error and use error.Error() to get the message from the calling function.

Also, regarding the casing of messages, we generally try to start the logical.ErrorResponses with an upper case and all the fmt.Errorf()s with a lower case character.

re := regexp.MustCompile(".*SignedHeaders=([^,]+)")
authzHeader := strings.Join(authzHeaders, ",")
matches := re.FindSubmatch([]byte(authzHeader))
if len(matches) < 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the length ever be < 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an invalidly signed AWS request. In that case, I guess AWS would probably just reject it, but I'd prefer to be safer here as well.

if len(matches) > 2 {
return false, "found multiple SignedHeaders components"
}
signedHeaders := string(matches[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, lack of state here. How do we know that the desired value is at index 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the contract of FindSubmatch. matches[0] is the entire string assuming it matched, matches[1] is the portion of the string that I'm regex'ing out inside the parentheses (since that's the only submatch I'm defining).

return transformedArn, principalName, sessionName, nil
}

func ensureVaultHeaderValue(headers http.Header, requestUrl *url.URL, requiredHeaderValue string) (bool, string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do away with the boolean return variable and have this method just return an errorA nil value can indicate that the validation was successful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (I think)

}
canonicalArn, principalName, sessionName, err := parseIamArn(clientArn)
if err != nil {
return logical.ErrorResponse("unrecognized IAM principal type"), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect error statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Returns whether the EC2 instance meets the requirements of the particular
// AWS role entry.
func (b *backend) verifyInstanceMeetsRoleRequirements(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second return argument can be of error type as well. The caller should do error.Error() to get the message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"canonical_arn": canonicalArn,
"auth_type": "iam",
"role_tag_max_ttl": rTagMaxTTL.String(),
"inferredEntityType": inferredEntityType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change these to inferred_entity_type and inferred_entity_id to be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Metadata: map[string]string{
"client_arn": clientArn,
"canonical_arn": canonicalArn,
"auth_type": "iam",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably its a good idea to have constants for the auth types and client types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(hasRequestMethod || hasRequestUrl || hasRequestBody || hasRequestHeaders)
}

func parseIamArn(iamArn string) (string, string, string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some comments in this code to help understand the magic indices being used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -907,11 +1435,21 @@ type roleTagLoginResponse struct {
DisallowReauthentication bool `json:"disallow_reauthentication" structs:"disallow_reauthentication" mapstructure:"disallow_reauthentication"`
}

const iamServerIdHeader = "X-Vault-AWSIAM-Server-Id"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/X-Vault-AWSIAM-Server-Id/X-Vault-AWS-IAM-Server-ID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

switch {
case !allowIamAuth:
return logical.ErrorResponse("specified role_inferred_type but didn't allow iam auth_type"), nil
case roleEntry.RoleInferredType != "ec2_instance":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a constant for ec2_instance? Its being used in a lot of places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(as though it were a glob ending in '*'). This is only checked when
auth_type is ec2.`,
},
"role_inferred_type": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this inferred_role_type in tandem with inferred_aws_region?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like inferred_role_type as it seems like you're inferring something about the type of role, not the entity that the role is authorizing. I changed it to inferred_entity_type.


var allowEc2Auth, allowIamAuth bool

parseAllowedAuthTypes := func(input string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its very confusing as to what the return type is. Can we make this an error here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// new bindings requested are still valid bindings. For example, let's say a role is created
// with an auth_type of iam and no inferencing is configured; then, in a subsequent role update,
// bound_ami_id is specified. This is how that edge case is caught.
for _, t := range roleEntry.AllowedAuthTypes {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a possibility of only doing this once. Basically, retain this block as-is and change the above code as follows:

 if allowedAuthTypesRaw, ok := data.GetOk("allowed_auth_types"); ok {
		roleEntry.AllowedAuthTypes = strutil.ParseDedupAndSortStrings(allowedAuthTypesRaw.(string), ",")
	} else if req.Operation == logical.CreateOperation {
		roleEntry.AllowedAuthTypes = strutil.ParseDedupAndSortStrings(data.Get("allowed_auth_types").(string), ",")
	}

Let me know if it doesn't address all the use cases you intended to address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner, thanks! The entries are still being parsed twice, but I think it's much more elegant.

Copy link
Contributor Author

@joelthompson joelthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I've responded to all the comments, let me know if I've missed anything.


return secret.Auth.ClientToken, nil

return "", nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks. Fixed.

return nil, fmt.Errorf("could not configure STS client")
}

config, err := b.getClientConfig(s, region, clientType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think they're redundant, but I do think that getStsClientConfig is confusingly named. getStsClientConfig doesn't mean "get configurations for an STS client;" rather, it means, "Use sts:AssumeRole to get credentials and then build a client using those credentials." It's a wrapper around and a layer of abstraction on top of getClientConfig and so I think they should remain separate methods. I could see the client interface changing and everyone just calling getStsClientConfig (but it would be renamed something more appropriate), and this method would do the check to see if stsRole was passed in as a non-empty string and make the appropriate calls to getClientConfig (also renamed).

stsEndpointStr, ok := data.GetOk("sts_endpoint")
if ok {
if configEntry.STSEndpoint != stsEndpointStr.(string) {
// NOT setting changedCreds here, since this isn't really cached
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, spent a bit more time thinking through this -- I think they actually are cached (just indirectly), so I'm adding in the cache flushing. I explain it a bit better in code comments.

Endpoint string `json:"endpoint" structs:"endpoint" mapstructure:"endpoint"`
IAMEndpoint string `json:"iam_endpoint" structs:"iam_endpoint" mapstructure:"iam_endpoint"`
STSEndpoint string `json:"sts_endpoint" structs:"sts_endpoint" mapstructure:"sts_endpoint"`
HeaderValue string `json:"vault_header_value" structs:"vault_header_value" mapstructure:"vault_header_value"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

significance.`,
},

"request_method": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Not the biggest fan because it's potentially confusing (this is actually request method of the STS GetCallerIdentity HTTP request, but it's for the IAM auth method), but done.

@@ -907,11 +1435,21 @@ type roleTagLoginResponse struct {
DisallowReauthentication bool `json:"disallow_reauthentication" structs:"disallow_reauthentication" mapstructure:"disallow_reauthentication"`
}

const iamServerIdHeader = "X-Vault-AWSIAM-Server-Id"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(as though it were a glob ending in '*'). This is only checked when
auth_type is ec2.`,
},
"role_inferred_type": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like inferred_role_type as it seems like you're inferring something about the type of role, not the entity that the role is authorizing. I changed it to inferred_entity_type.


var allowEc2Auth, allowIamAuth bool

parseAllowedAuthTypes := func(input string) string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

switch {
case !allowIamAuth:
return logical.ErrorResponse("specified role_inferred_type but didn't allow iam auth_type"), nil
case roleEntry.RoleInferredType != "ec2_instance":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// new bindings requested are still valid bindings. For example, let's say a role is created
// with an auth_type of iam and no inferencing is configured; then, in a subsequent role update,
// bound_ami_id is specified. This is how that edge case is caught.
for _, t := range roleEntry.AllowedAuthTypes {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner, thanks! The entries are still being parsed twice, but I think it's much more elegant.

BoundSubnetID string `json:"bound_subnet_id" structs:"bound_subnet_id" mapstructure:"bound_subnet_id"`
BoundVpcID string `json:"bound_vpc_id" structs:"bound_vpc_id" mapstructure:"bound_vpc_id"`
RoleInferredType string `json:"infer_role_type" structs:"infer_role_type" mapstructure:"infer_role_type"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the variable names and the struct tags to InferredEntityType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


iamClient, err := b.clientIAM(s, identityDoc.Region, stsRole)
if err != nil {
iamClient = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are setting iamClient to nil here as it is uncertain at this point to know if we'll need the iamClient object and an error should not stop the flow. In that sense, its okay to have iamClient set to nil (which I see that you are checking later on). But its not okay to ignore a legitimate error.

Can we error out if err != nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about why I'm setting iamClient to nil Though I disagree with your statement that it's not OK to ignore a legitimate error -- mostly I'm questioning the assumption that an error here is actually legit if we didn't even need to call the code in the first place.

Anyway, I think I've got a solution that will satisfy both of us :)

if validationError != "" {
return logical.ErrorResponse(validationError), nil
if validationError != nil {
return logical.ErrorResponse(fmt.Sprintf("Error validating instance: %s", validationError)), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use %v for errors?

// 1. arn:aws:iam::<account_id>:user/<UserName>
// 2. arn:aws:sts::<account_id>:assumed-role/<RoleName>/<RoleSessionName>
// if we get something like 2, then we want to transform that back to what
// most people would expect, which is arn;aws:iam::<account_id>:role/<RoleName>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/;/:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// good enough rather than having to muck around in the low-level details
for _, envvar := range []string{
"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SECURITY_TOKEN", "AWS_SESSION_TOKEN"} {
os.Setenv(envvar, os.Getenv("TEST_"+envvar))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has an obvious drawback of losing valid AWS env creds. Can we store the original values before modifying and restore them when the test is complete? or probably as a deferred action.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I've done the right thing here (though I'm not confident enough in Go to say I've done the right thing). Let me know!

return nil, fmt.Errorf("could not configure STS client")
}

config, err := b.getClientConfig(s, region, clientType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you are saying and it all makes sense. My only concern is around how getClientConfig looks as of now. It takes in a clientType and if a "sts" is passed in to that function, I'd expect a stsConfig returned but that doesn't happen! What I am suggesting is that do the extra layer of abstraction in the same function based on clientType.

Also, can you help me understand why we need both config and stsConfig in this function?

presigned request. Currently, POST is the only supported value`,
},

"iam_request_url": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if even this should be base64 encoded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I can't think of any concrete examples today that would require this to be base64 encoded, requiring base64 encoding seemed safer and more future proof (e.g., to use cases neither of us has thought about yet) with very minimal down side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this slip by mistake or did you have any other thoughts for not base64-ing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, misread your original comment and thought you were saying to NOT base64-encode this (and thought that I had). Agreed.

}
}
if config.STSEndpoint != "" {
endpoint = config.Endpoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be STSEndpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, yep. done.

} else if req.Operation == logical.CreateOperation {
roleEntry.AllowedAuthTypes = strutil.ParseDedupAndSortStrings(data.Get("allowed_auth_types").(string), ",")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen when RoleInferredType is set and AllowedAuthTypes does not contain "ec2"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand your question.

AllowedAuthTypes is solely about what kind of authentication is allowed, i.e., what kind of material is sufficient to establish an identity. If it's ec2 then the instance identity document; if it's iam then a signed GetCallerIdentity request.

RoleInferredType deals solely with authorization when using an iam auth type.

So, when RoleInferredType is set and AllowedAuthTypes doesn't contain ec2 then the desired behavior is that the only thing that should be allowed to authenticate is an IAM principal using a signed GetCallerIdentity request in which the RoleSessionName is a valid EC2 instance ID, and we apply EC2-specific binds to the instance given by the instance ID.

What am I missing here? Does the code not behave as I just described? Is the documentation confusing? Do you disagree with the desired behavior? Something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below: #2441 (comment)

return logical.ErrorResponse(fmt.Sprintf("Error validating instance: %s", validationError)), nil
}

if roleTagResp != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are ignoring the disallow_reauthentication value returned in role tag response.

I am getting confused as to how all this will fit in. For example, "auth_type" is "iam", "inferred_entity_type" is "ec2_instance", and role tag has set disallow_reauthentication.

When we login using the inferred type, we don't update the storedIdentity entry in the whitelist at all, which is not okay.

I am inclined towards thinking that the inference of ec2_instance should only mean that we could apply the bounds that are applicable to an ec2 instance and nothing else. Like, we shouldn't allow role tag login from this code flow. Either handle all the cases of an ec2 login flow or not intervene.

@jefferai @briankassouf What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; I view disallow_reauthentication as something that should only apply to the ec2 authentication method (given its relatively static identity material, the instance identity document). And so, similarly, I view the storedIdentity as a construct for protecting against reply attacks of a discovered instance identity document that wouldn't need to be replicated when using the iam auth method given the rotating nature of those credentials.

RE: "I am inclined towards thinking that the inference of ec2_instance should only mean that we could apply the bounds that are applicable to an ec2 instance and nothing else." I largely agree with that, and that has been how I've tried to write this. But, RE: "We shouldn't allow role tag login from this code flow" I don't understand what you're saying. The use of inferencing means we've already decided to assume that the authenticated entity is an EC2 instance. And because role tags apply to EC2 instances, they should likewise apply to inferred EC2 instances. FWIW, I view role tagging as an authorization mechanism rather than an authentication mechanism (please correct me if I'm wrong!), and the code flow makes sure it is applicable only to EC2 instances.

Anyway, if I'm misunderstanding anything you're saying, please let me know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are saying and it makes sense too. However, I still have some concerns.

Yes, role tags do the job of authorization, but it it tightly coupled with the ec2 auth mechanism. It has a nonce, disable_reauthentication and allow_instance_migration fields which are only applicable when authentication is done via identity document. Even if we come up with a way to create role tags based on auth type, by nature, role tags are immutable and are not even stored at the backend and hence there can't be any upgrade paths for existing role tags in terms of being able to distinguish them from belonging to ec2 and/or iam roles.

For those who never use ec2 auth mechanism in future, the fields in role tag might be hard to reason about. Role tags are designed to contain items that restrict the capabilities of the role. If the role has a bound by IAM principal ARN and when the authentication is not performed using identity document, the role tag fields does not represent the subset of capabilities on the role.

I think there are two ways to go forward:

  1. When the role is bound by IAM principal ARN, make sure that role tag does not contain items that are relevant to the identity document.
  2. When the role is bound by IAM principal ARN, don't allow role tags to be created at all.

I am inclined towards the 2nd option though. Further in this direction, to be able to reason about what's going on at any given time, I feel that the role should be tied to any one auth type. For example, when the auth types contain both iam and ec2, which credentials do we prioritize if both identity document and the new ones are supplied? It also gets harder to determine when we should allow role tags and when not.

Here are my thoughts on what we should do. Let me know what you think.

  1. One role should allow only one type of authentication, ec2 or iam.
  2. When the type is ec2, everything remains the same as it was before.
  3. When the type is iam, accept new credentials. If inferred type is ec2_instance, apply the binds belonging to ec2 instances but not those that are specific to identity document. Specifically keep the role tag processing out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to butt in here for a second regarding Vishal's three thoughts at the end of his comment. He and I had discussed this yesterday and it seemed like that might be the path forward, except just today I had a customer call with someone who is awaiting this work and looking forward to the inferencing. :-S So I think we need to keep inferencing, but just figure out the right answers to some hard questions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, but I'm struggling to fully grok what is being suggested here. Just for my own sanity, is the crux of the issue here that role tags contain fields specific to both instances and the auth type used by instances?

@vishalnayak -- your number 2 and 3 at the end were pretty much exactly as I was holding it; however, the way I saw it, role tags are things that apply to instances; they're not specific to the instance identity document. Therefore, I tried to keep most of the things in the role tag that made sense to apply (disallow_reauthentication and allow_instance_migration being the two that I took out).

My view is that how I've done it provides a smooth transition path for customers from the ec2 auth method to the iam auth method. They can enable iam auth on the role and configure inferencing, and then gradually transition clients over to use iam and then turn off ec2 once they've migrated. To facilitate this migration, I would want the behavior to be as close as possible between the ec2 and the iam auth methods, which means I wouldn't want customers surprised to find out that their instances suddenly were able to assume a broader set of policies than they intended because we started ignoring role tags. Keeping role tags should also have the side benefit of cutting down on role sprawl inside of Vault. Of course, this migration path I mentioned would require the ability to have ec2 and iam enabled at the same time.

For those who never use ec2 auth mechanism in future, the fields in role tag might be hard to reason about.

I completely agree. The same holds true with other bindings in the role, and I think that's the hardest part to reason about; role tags seem much easier to me. That's why I originally suggested having these as two entirely different auth backends :)

If the role has a bound by IAM principal ARN and when the authentication is not performed using identity document, the role tag fields does not represent the subset of capabilities on the role.

I would say that it partially represents the subset of the capabilities on the role (i.e., binding by individual instance ID, specific policies, or max_ttl).

2 .When the role is bound by IAM principal ARN, don't allow role tags to be created at all.

My concern here is about people changing allowed auth types on the role. Do we just make that immutable on the role? If we don't make it immutable, then do we just delete all the role tags on the role when switching from ec2 to iam? It also makes the transition path more difficult for customers, as they'll potentially have to map one role with role tags specifying policies to many roles with different policies mapped for each role.

For example, when the auth types contain both iam and ec2, which credentials do we prioritize if both identity document and the new ones are supplied?

We could just set an auth parameter of auth_type and ask customers ;) On an actual note, I think we should just error out in that scenario.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelthompson

Just for my own sanity, is the crux of the issue here that role tags contain fields specific to both instances and the auth type used by instances?

The issue is that role tags are supposed to represent a strict subset of the capabilities on the role, and not partial. Hence, we can't choose to keep certain things out, like, disallow_reauthentication and allow_instance_migration.

Regarding the migration path, since we are thinking about making a role represent a single auth type, users would need to create a different role to authenticate with iam auth type anyways. The role tags created with previous role would only be applicable to that role. Clients won't be able to assume a broader set of policies as the role would remain the same.

I do agree with you that having role tags would reduce role sprawl. However, the right way IMO to do that is to have new role tags that are specific to iam auth types. There is a version in role tags. You are up for it, you could write version 2 of role tags that contains auth type and fields specific to auth types.

My concern here is about people changing allowed auth types on the role. Do we just make that immutable on the role?

Yes! We should make it immutable. That way, there is no interference between the two auth types.

We could just set an auth parameter of auth_type and ask customers ;) On an actual note, I think we should just error out in that scenario.

If the auth type on the role is immutable, we won't run into this problem at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jefferai -- since you seemed to express some reservations about this path in your previous comment, I just want to make sure you're also in agreement with it.

I don't think I'll have time to implement role tags v2 (i.e., supporting the iam auth type) in the immediate future as I will be doing quite a bit of travelling soon, and it's also getting deeper into a portion of the code that I'm not familiar with, and the current PR is already more than big enough. I'd suggest merging without role tags v2 first and then either you or I can come back and add them if there's customer demand for them. Does that sound sensible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've made the auth type immutable and taken out the ability to enable role tags with iam auth.

@@ -457,6 +461,9 @@ func (c *Core) teardownCredentials() error {
// newCredentialBackend is used to create and configure a new credential backend by name
func (c *Core) newCredentialBackend(
t string, sysView logical.SystemView, view logical.Storage, conf map[string]string) (logical.Backend, error) {
if alias, ok := credentialAliases[t]; ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing that this upgrade code is not good enough in terms of handling the mount table entries. The Type of mount entries will be aws-ec2 for the current entries and the upgrade code should change it to aws.

@jefferai I believe that we have run into this problem before (a long time back) and had a hard time fixing this. I think it was because we had changed the auth-enable command to take in aws-ec2 as opposed to just aws. Now, we need to switch back to it again :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vishal and I talked about this yesterday after he commented, I think that the right thing to do is not to change the type, so that anyone checking mount tables for expected values don't suddenly have the rug pulled out from under them.

My thinking is we should allow "aws-ec2" to be an auth type that can be enabled, and for any such mount types, keep the type value as such. However, all docs and such going forward should use "aws" and any mount created as that type should read "aws" for the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jefferai -- just to make sure I understand you, you're suggesting leaving the code as is, and not updating the mount type in the backend when aws-ec2 is encountered?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelthompson That's a perfectly fine way to go and that is what I meant. Sorry that I wasn't clear enough. Role tags V2 is something that we can tackle later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vishalnayak was that comment meant for a different thread?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelthompson Yes, that's what I'm suggesting. There can be scripts that are expecting a certain value when checking whether mounts are correctly setup (in fact we have one such script internally). I don't really see a reason to break this and our general M.O. when possible is "keep old behavior, but only document the new". So new users will use aws, old users have a choice as to whether to keep using aws-ec2 or switch over.

This code is only really necessary if we remove the cli capability to add the backend as aws-ec2 which I think we should retain (but not document). What this code doesn't do is change the entry's type which is probably the right way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jefferai Yes, the comment was for a different thread. It was to decide the default value for auth_type field on the role/roleName endpoint. Along with mounting, if the scripts are also creating roles (which is likely), now the created roles will default to iamAuthType, which both joel and I recall that it was discussed about but there wasn't a conclusion. Joel has actually changed the default to iamAuthType and I was wondering if this is okay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelthompson It would be great if you can test a scenario. Manual testing should be good.
Testcase:

  1. Mount aws-ec2 backend with the binary compiled on master and login.
  2. Mount aws backend with the binary compiled with the changes in this PR.
  3. Check if mounting, role creation and login from an older mount are working fine.

@joelthompson
Copy link
Contributor Author

Thanks @vishalnayak and no worries, just wanted to make sure I hadn't missed something :)

presigned request. Currently, POST is the only supported value`,
},

"iam_request_url": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this slip by mistake or did you have any other thoughts for not base64-ing this?

}

// Check if an STS configuration exists for the AWS account
sts, err := b.lockedAwsStsEntry(s, identityDoc.AccountID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of code to deduce stsRole seems to be at an off place. Can we move this below, just before the client creation? That way we will save a storage roundtrip when BoundIamRoleARN is not set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return logical.ErrorResponse(fmt.Sprintf("IAM instance profile ARN %q does not satisfy the constraint on role %q", iamInstanceProfileARN, roleName)), nil
}
if validationError != nil {
return logical.ErrorResponse(fmt.Sprintf("Error validating instance: %s", validationError)), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use %v for the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Use instance profile ARN to fetch the associated role ARN
iamRoleARN, err := b.instanceIamRoleARN(req.Storage, iamInstanceProfileName, identityDocParsed.Region, identityDocParsed.AccountID)
var roleTagResp *roleTagLoginResponse
if roleEntry.RoleTag != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for moving role tag processing up here? If not, can we move the handleRoleTagLogin processing after the stored identity checks as it was before? This avoids some computation in certain cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The move was left over from when I had role tag processing for both the ec2 and iam auth methods. Moved back.


method := data.Get("iam_http_request_method").(string)
if method == "" {
return logical.ErrorResponse("missing method"), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/method/http request method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to iam_http_request_method to match the parameter value.

determine the EC2 instance ID or unable to find the EC2 instance ID
among running instances, then authentication will fail.`,
},
"inferred_aws_region": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When inferred_entity_type is set, does having a default inferred_aws_region makes sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in fact, it's required. Otherwise, Vault doesn't know what region to query to find the EC2 instance we're inferring. With the ec2 auth type, the region is embedded in the instance identity doc. However, IAM is an inherently global service and so there's no way of telling a priori what region the instance is in based solely on the GetCallerIdentity response. In theory we could query all regions, but that could be quite slow.

This could potentially take in a list of allowable regions so that customers don't have to replicate roles for each inferred_aws_region, but I don't know how important of a feature that would be to your other customers (and it would still mean querying all of the allowable regions until we found an instance that matched).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querying all of the allowable regions, or querying a list of supplied regions sound like an overkill. My thought was to default it to a specific region, but I guess it doesn't make sense to do it. Let's leave it as it is, requiring an explicit inferred_aws_region to be set when inferred_entity_type is set to ec2_instance.

field should be the 'key' of the tag on the EC2 instance. The 'value'
of the tag should be generated using 'role/<role>/tag' endpoint.
Defaults to an empty string, meaning that role tags are disabled. This
is only checked if auth_type is ec2 or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc needs an update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -241,6 +282,16 @@ func (b *backend) nonLockedAWSRole(s logical.Storage, roleName string) (*awsRole
}
}

// Check if there was no pre-existing AuthType set (from older versions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a wrong place to do this upgrade. nonLockedAWSRole will be invoked with a read lock and not a write lock. Also, the block above (which are not your changes) seems like a bug which is doing exactly the same thing.

If you don't mind, can you please move both of these blocks to lockedAWSRole method? In lockedAWSRole, after nonLockedAWSRole returns, we should invoke lockedSetAWSRole to perform a race free upgrade.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, but I don't think your proposed fix will work.

lockedAWSRole only takes out a read lock on roleMutex. lockedSetAWSRole tries to take out a read/write lock on roleMutex, so if I call it from within lockedAWSRole then that'll deadlock. What I would really like is some sort of lock upgrade functionality to upgrade the read lock in lokcedAWSRole to a read/write lock and then call nonLockedSetAWSRole from there. But it doesn't seem like go offers such a primitive. Ugh.

We could try releasing the read lock and then immediately trying to take out a read/write lock, but there's still a window of opportunity for race conditions. Alternatively, is there some sort of "upgrade hook" to hook into whenever a newer version of Vault is run where we could put the upgrade code? If not, maybe the best solution is to just not write out the upgraded role to the backend at all? For now, I've just commented it out as that seems simplest and safest (at the cost of some slight future perf hits), but let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lock upgrade primitive would be great, yes, but I can imagine how it can get very hard to write one. Also, I was very much aware of the race window between the read and the writes. We release the read lock explicitly instead of the defer statement, upgrade the role entry, and invoke lockedSetAWSRole which acquires the write lock before updating the role. This sure is better than modifying the role with a read lock. A stale read is better than an inconsistent write :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, code path not being safe is not an excuse for not saving the upgrade :-) We must persist the upgraded role.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it's a bit more complicated than that. The reason is that between releasing the read lock and acquiring the write lock, another thread might have acquired the write lock and updated the role entry out from under us. So we could be overwriting with stale data. Anyway, I just pushed a fix that saves the role entry and should be safe.

return logical.ErrorResponse(fmt.Sprintf("unrecognized auth_type: %v", authTypeRaw.(string))), nil
}
} else if authTypeRaw.(string) != roleEntry.AuthType {
return logical.ErrorResponse("attempted to change auth_type on role"), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Optional] I'd phrase this as "changing auth_type on the role is not allowed".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thanks. done.

@@ -20,6 +20,12 @@ func pathRole(b *backend) *framework.Path {
Type: framework.TypeString,
Description: "Name of the role.",
},
"auth_type": {
Type: framework.TypeString,
Default: ec2AuthType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on defaulting to iamAuthType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the thing we kept going back and forth with @jefferai on and I don't think we ever landed on a final answer. The downside of changing this that it's a backwards-incompatible change in the API. Anyway, I've updated the default since that's what it seemed like you were leaning towards and also updated all the tests to explicitly specify the auth_type so switching will be as simple as changing this default back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ambivalent on this too, honestly. I'd rely on @jefferai's opinion on this.

"auth_type": iamAuthType,
"inferred_entity_type": inferredEntityType,
"inferred_entity_id": inferredEntityId,
"account_id": accountID,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add inferred_aws_region to the metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


Each signed AWS request includes the current timestamp to mitigate the risk of
replay attacks. In addition, Vault allows you to require an additional header,
`X-Vault-AWSIAM-Server-ID`, to be present to mitigate against different types of replay
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/X-Vault-AWSIAM-Server-ID/X-Vault-AWS-IAM-Server-ID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

bound_iam_principal_arn=arn:aws:iam::123456789012:role/MyRole policies=prod,dev max_ttl=500h
```

#### Configure a required X-Vault-AWSIAM-Server-ID Header (recommended)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/AWSIAM/AWS-IAM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

<li>
<span class="param">iam_server_id_header_value</span>
<span class="param-flags">optional</span>
The value to require in the `X-Vault-AWSIAM-Server-ID` header as part of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/AWSIAM/AWS-IAM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

</ul>
<ul>
<li>
<span class="param">iam_server_id_header_value</span>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to keep the field documentation in the code in sync with whats in the docs here on the website. Post multiple rounds of review, some of them might have differed. Could you please give a run through the parameter descriptions of all the fields on endpoints and check if the code has the same? If not (like this one here), could you please update them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, made another pass through the docs.

@vishalnayak
Copy link
Member

vishalnayak commented Apr 18, 2017

@joelthompson This is looking great now. The only pending items in my view are:

  1. Some minor comments which are left on the PR now.
  2. Default auth method - Need @jefferai's thoughts here.
    This is about the default value for auth_type field on role/rolename endpoint. Should it be iamAuthType or ec2AuthType?
  3. The upgrade code locking fix.
    This is about moving role entry upgrade code from nonLockedAWSRole to lockedAWSRole and switching the locks before modifying the role entry using lockedSetAWSRole.

Copy link
Member

@jefferai jefferai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking fantastic. This is soooo much cleaner from the last time I took a look!

These are really very minor changes!

Key/Value Pairs:

mount=aws The mountpoint for the AWS credential provider.
Defaults to "aws-iam"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually defaults to "aws", not "aws-iam"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} else if req.Operation == logical.CreateOperation {
configEntry.IAMServerIdHeaderValue = data.Get("iam_server_id_header_value").(string)
}

// Since this endpoint supports both create operation and update operation,
// the error checks for access_key and secret_key not being set are not present.
// This allows calling this endpoint multiple times to provide the values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, don't we only need to write the entry if changedCreds || req.Operation == logical.CreateOperation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. Done.

Type: framework.TypeString,
Description: `Base64-encoded full URL against which to make the AWS request
when using iam auth_type. If using a POST request with the action specified in the
body, this should just be "Lw==" ("/" base64-encoded).`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than require this special value if using a POST request, can we just check for the value being empty or being this and do the right thing in either case? Or maybe just add a Default field with that value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to default to this because this is used to derive the canonical URI which is included in the canonical request, over which the signature is calculated, and it feels wrong to essentially guess a default value that is included in a signature. I'd imagine pretty much every single client here is going to be some sort of programmatic client (because calculating the AWS Sigv4 is hard enough in code, much less by hand) and so I'd rather just be explicit.

And, FWIW, the base64-encodings of / and https://sts.amazonaws.com/ actually lead to the same canonical URI and so they are interchangeable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that makes sense!

Description: `If set, defines a constraint on the EC2 instances that the region in its identity document to match the one specified by this parameter.`,
Type: framework.TypeString,
Description: `If set, defines a constraint on the EC2 instances that the region in
its identity document to match the one specified by this parameter. Only applicable when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/to match/must match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -71,7 +71,7 @@ func Commands(metaPtr *meta.Meta) map[string]cli.CommandFactory {
CredentialBackends: map[string]logical.Factory{
"approle": credAppRole.Factory,
"cert": credCert.Factory,
"aws-ec2": credAwsEc2.Factory,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to not break scripts that are setting up backends automatically we should actually support both aws-ec2 and aws but only document the latter. @vishalnayak thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current alias code in vault/auth.go will still allow scripts that do things like vault auth-enable aws-ec2 if that's what you're concerned about (and mount them at /auth/aws-ec2, maintaining backwards compatibility), which I think is what you're suggesting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I guess that's a bit more complicated but also a bit more explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBC, I originally had both aws-ec2 and aws here without the alias code, but I changed it to using the alias at your suggestion... :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember saying that (maybe it was Vishal?), but, it just means it's not something I have strong feelings about :-)

@@ -457,6 +461,9 @@ func (c *Core) teardownCredentials() error {
// newCredentialBackend is used to create and configure a new credential backend by name
func (c *Core) newCredentialBackend(
t string, sysView logical.SystemView, view logical.Storage, conf map[string]string) (logical.Backend, error) {
if alias, ok := credentialAliases[t]; ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vishalnayak was that comment meant for a different thread?

@@ -457,6 +461,9 @@ func (c *Core) teardownCredentials() error {
// newCredentialBackend is used to create and configure a new credential backend by name
func (c *Core) newCredentialBackend(
t string, sysView logical.SystemView, view logical.Storage, conf map[string]string) (logical.Backend, error) {
if alias, ok := credentialAliases[t]; ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelthompson Yes, that's what I'm suggesting. There can be scripts that are expecting a certain value when checking whether mounts are correctly setup (in fact we have one such script internally). I don't really see a reason to break this and our general M.O. when possible is "keep old behavior, but only document the new". So new users will use aws, old users have a choice as to whether to keep using aws-ec2 or switch over.

This code is only really necessary if we remove the cli capability to add the backend as aws-ec2 which I think we should retain (but not document). What this code doesn't do is change the entry's type which is probably the right way to go.

<li<%= sidebar_current("docs-auth-aws-ec2") %>>
<a href="/docs/auth/aws-ec2.html">AWS EC2</a>
<li<%= sidebar_current("docs-auth-aws") %>>
<a href="/docs/auth/aws-ec2.html">AWS</a>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to move/update this page? I can ask around here if there is an SEO reason but usually we'd handle that at a higher level with redirects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in my very first comment, I just didn't know how you wanted to handle it. Sounds like you're saying I should just rename it and you'll add the higher level redirect? Done, but in a separate commit so it should be easy to roll back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking internally but I believe this is the right way, yeah. Thanks for renaming it.

@jefferai
Copy link
Member

@vishalnayak Can you explain items 2 and 3 in a bit more detail? Not sure what you need from me there, unless I answered number 2 in my review.

Copy link
Contributor Author

@joelthompson joelthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I've addressed everything, the only outstanding question is whether we should make the default auth_method iam or ec2. I've set it to iam for now, but it should be trivial to switch back. @jeffrai?

The tests are passing for the aws backend:

$ git rev-parse HEAD
c98898c863380befd8f90c589432c816368b23be
$ make testacc TEST=./builtin/credential/aws/
go generate 
VAULT_ACC=1 go test -tags='vault' ./builtin/credential/aws/ -v  -timeout 45m
=== RUN   TestBackend_CreateParseVerifyRoleTag
--- PASS: TestBackend_CreateParseVerifyRoleTag (0.00s)
=== RUN   TestBackend_prepareRoleTagPlaintextValue
--- PASS: TestBackend_prepareRoleTagPlaintextValue (0.00s)
=== RUN   TestBackend_CreateRoleTagNonce
--- PASS: TestBackend_CreateRoleTagNonce (0.00s)
=== RUN   TestBackend_ConfigTidyIdentities
--- PASS: TestBackend_ConfigTidyIdentities (0.00s)
=== RUN   TestBackend_ConfigTidyRoleTags
--- PASS: TestBackend_ConfigTidyRoleTags (0.00s)
=== RUN   TestBackend_TidyIdentities
--- PASS: TestBackend_TidyIdentities (0.00s)
=== RUN   TestBackend_TidyRoleTags
--- PASS: TestBackend_TidyRoleTags (0.00s)
=== RUN   TestBackend_ConfigClient
2017/04/19 04:09:47.166949 [TRACE] physical/cache: creating LRU cache: size=32768
2017/04/19 04:09:47.167050 [INFO ] core: security barrier not initialized
2017/04/19 04:09:47.167139 [INFO ] core: security barrier initialized: shares=1 threshold=1
2017/04/19 04:09:47.167220 [TRACE] core: cluster name not found/set, generating new
2017/04/19 04:09:47.167236 [DEBUG] core: cluster name set: name=vault-cluster-35378026
2017/04/19 04:09:47.167255 [TRACE] core: cluster ID not found, generating new
2017/04/19 04:09:47.167267 [DEBUG] core: cluster ID set: id=aa96cb89-1033-7cfa-a59c-4e92e4d7766c
2017/04/19 04:09:47.167310 [INFO ] core: post-unseal setup starting
2017/04/19 04:09:47.167321 [TRACE] core: clearing forwarding clients
2017/04/19 04:09:47.167328 [TRACE] core: done clearing forwarding clients
2017/04/19 04:09:47.179949 [INFO ] core: loaded wrapping token key
2017/04/19 04:09:47.180634 [INFO ] core: successfully mounted backend: type=generic path=secret/
2017/04/19 04:09:47.180670 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2017/04/19 04:09:47.180787 [INFO ] core: successfully mounted backend: type=system path=sys/
2017/04/19 04:09:47.181008 [INFO ] rollback: starting rollback manager
2017/04/19 04:09:47.182832 [INFO ] expiration: restoring leases
2017/04/19 04:09:47.182849 [DEBUG] expiration: collecting leases
2017/04/19 04:09:47.182859 [DEBUG] expiration: leases collected: num_existing=0
2017/04/19 04:09:47.183557 [INFO ] core: post-unseal setup complete
2017/04/19 04:09:47.183738 [INFO ] core: root token generated
2017/04/19 04:09:47.183754 [INFO ] core: pre-seal teardown starting
2017/04/19 04:09:47.183762 [TRACE] core: clustering disabled, not stopping listeners
2017/04/19 04:09:47.183785 [INFO ] rollback: stopping rollback manager
2017/04/19 04:09:47.183865 [INFO ] core: pre-seal teardown complete
2017/04/19 04:09:47.183962 [INFO ] core: vault is unsealed
2017/04/19 04:09:47.183993 [INFO ] core: post-unseal setup starting
2017/04/19 04:09:47.184004 [TRACE] core: clearing forwarding clients
2017/04/19 04:09:47.184012 [TRACE] core: done clearing forwarding clients
2017/04/19 04:09:47.184078 [INFO ] core: loaded wrapping token key
2017/04/19 04:09:47.184245 [INFO ] core: successfully mounted backend: type=generic path=secret/
2017/04/19 04:09:47.184350 [INFO ] core: successfully mounted backend: type=system path=sys/
2017/04/19 04:09:47.184377 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2017/04/19 04:09:47.184592 [INFO ] rollback: starting rollback manager
2017/04/19 04:09:47.184946 [INFO ] expiration: restoring leases
2017/04/19 04:09:47.184963 [DEBUG] expiration: collecting leases
2017/04/19 04:09:47.184973 [DEBUG] expiration: leases collected: num_existing=0
2017/04/19 04:09:47.185616 [INFO ] core: post-unseal setup complete
2017/04/19 04:09:47.188038 [INFO ] core: successful mount: path=mnt/ type=test
04:09:47.188229 WRN ~ Executing test step step_number: 1 in: 
04:09:47.188968 WRN ~ Executing test step step_number: 2 in: 
04:09:47.189209 WRN ~ Executing test step step_number: 3 in: 
04:09:47.189813 WRN ~ Executing test step step_number: 4 in: 
04:09:47.190298 WRN ~ Requesting RollbackOperation in: 
--- PASS: TestBackend_ConfigClient (0.02s)
=== RUN   TestBackend_pathConfigCertificate
--- PASS: TestBackend_pathConfigCertificate (0.00s)
=== RUN   TestBackend_parseAndVerifyRoleTagValue
--- PASS: TestBackend_parseAndVerifyRoleTagValue (0.00s)
=== RUN   TestBackend_PathRoleTag
--- PASS: TestBackend_PathRoleTag (0.00s)
=== RUN   TestBackend_PathBlacklistRoleTag
--- PASS: TestBackend_PathBlacklistRoleTag (0.00s)
=== RUN   TestBackendAcc_LoginWithInstanceIdentityDocAndWhitelistIdentity
--- PASS: TestBackendAcc_LoginWithInstanceIdentityDocAndWhitelistIdentity (1.19s)
=== RUN   TestBackend_pathStsConfig
--- PASS: TestBackend_pathStsConfig (0.00s)
=== RUN   TestBackendAcc_LoginWithCallerIdentity
--- PASS: TestBackendAcc_LoginWithCallerIdentity (0.14s)
=== RUN   TestBackend_pathConfigClient
--- PASS: TestBackend_pathConfigClient (0.00s)
=== RUN   TestBackend_pathLogin_getCallerIdentityResponse
--- PASS: TestBackend_pathLogin_getCallerIdentityResponse (0.00s)
=== RUN   TestBackend_pathLogin_parseIamArn
--- PASS: TestBackend_pathLogin_parseIamArn (0.00s)
=== RUN   TestBackend_validateVaultHeaderValue
--- PASS: TestBackend_validateVaultHeaderValue (0.00s)
=== RUN   TestBackend_pathRoleEc2
--- PASS: TestBackend_pathRoleEc2 (0.00s)
=== RUN   TestBackend_pathIam
--- PASS: TestBackend_pathIam (0.00s)
=== RUN   TestBackend_pathRoleMixedTypes
--- PASS: TestBackend_pathRoleMixedTypes (0.00s)
=== RUN   TestAwsEc2_RoleCrud
--- PASS: TestAwsEc2_RoleCrud (0.00s)
=== RUN   TestAwsEc2_RoleDurationSeconds
--- PASS: TestAwsEc2_RoleDurationSeconds (0.00s)
PASS
ok      github.com/hashicorp/vault/builtin/credential/aws       1.392s

"auth_type": iamAuthType,
"inferred_entity_type": inferredEntityType,
"inferred_entity_id": inferredEntityId,
"account_id": accountID,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


Each signed AWS request includes the current timestamp to mitigate the risk of
replay attacks. In addition, Vault allows you to require an additional header,
`X-Vault-AWSIAM-Server-ID`, to be present to mitigate against different types of replay
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

bound_iam_principal_arn=arn:aws:iam::123456789012:role/MyRole policies=prod,dev max_ttl=500h
```

#### Configure a required X-Vault-AWSIAM-Server-ID Header (recommended)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

<li>
<span class="param">iam_server_id_header_value</span>
<span class="param-flags">optional</span>
The value to require in the `X-Vault-AWSIAM-Server-ID` header as part of
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Description: `If set, defines a constraint on the EC2 instances that the region in its identity document to match the one specified by this parameter.`,
Type: framework.TypeString,
Description: `If set, defines a constraint on the EC2 instances that the region in
its identity document to match the one specified by this parameter. Only applicable when
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Key/Value Pairs:

mount=aws The mountpoint for the AWS credential provider.
Defaults to "aws-iam"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} else if req.Operation == logical.CreateOperation {
configEntry.IAMServerIdHeaderValue = data.Get("iam_server_id_header_value").(string)
}

// Since this endpoint supports both create operation and update operation,
// the error checks for access_key and secret_key not being set are not present.
// This allows calling this endpoint multiple times to provide the values.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. Done.

Type: framework.TypeString,
Description: `Base64-encoded full URL against which to make the AWS request
when using iam auth_type. If using a POST request with the action specified in the
body, this should just be "Lw==" ("/" base64-encoded).`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to default to this because this is used to derive the canonical URI which is included in the canonical request, over which the signature is calculated, and it feels wrong to essentially guess a default value that is included in a signature. I'd imagine pretty much every single client here is going to be some sort of programmatic client (because calculating the AWS Sigv4 is hard enough in code, much less by hand) and so I'd rather just be explicit.

And, FWIW, the base64-encodings of / and https://sts.amazonaws.com/ actually lead to the same canonical URI and so they are interchangeable.

<li<%= sidebar_current("docs-auth-aws-ec2") %>>
<a href="/docs/auth/aws-ec2.html">AWS EC2</a>
<li<%= sidebar_current("docs-auth-aws") %>>
<a href="/docs/auth/aws-ec2.html">AWS</a>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in my very first comment, I just didn't know how you wanted to handle it. Sounds like you're saying I should just rename it and you'll add the higher level redirect? Done, but in a separate commit so it should be easy to roll back.

@@ -71,7 +71,7 @@ func Commands(metaPtr *meta.Meta) map[string]cli.CommandFactory {
CredentialBackends: map[string]logical.Factory{
"approle": credAppRole.Factory,
"cert": credCert.Factory,
"aws-ec2": credAwsEc2.Factory,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current alias code in vault/auth.go will still allow scripts that do things like vault auth-enable aws-ec2 if that's what you're concerned about (and mount them at /auth/aws-ec2, maintaining backwards compatibility), which I think is what you're suggesting.

@@ -457,6 +461,9 @@ func (c *Core) teardownCredentials() error {
// newCredentialBackend is used to create and configure a new credential backend by name
func (c *Core) newCredentialBackend(
t string, sysView logical.SystemView, view logical.Storage, conf map[string]string) (logical.Backend, error) {
if alias, ok := credentialAliases[t]; ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelthompson It would be great if you can test a scenario. Manual testing should be good.
Testcase:

  1. Mount aws-ec2 backend with the binary compiled on master and login.
  2. Mount aws backend with the binary compiled with the changes in this PR.
  3. Check if mounting, role creation and login from an older mount are working fine.

b.roleMutex.RUnlock()
// now we need to ensure that we re-acquire the read lock so the above deferred RUnlock() doesn't
// cause a crash trying to unlock the already-unlocked mutex
defer b.roleMutex.RLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, we can delete this defer statement and the one at L204. That way, we explicitly release the read lock before acquiring a RW lock here and the defer for RW lock should be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I delete the defer on L204 and needUpgrade is false, then how will the read lock get released?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry, missed that case. I had thought about it and forgot to mention while typing up the comment. We need to explicitly release the locks and return. If needUpgrade is true we release the write lock and return, if not we release the read lock and return. Probably defer statements just before the return will also do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

@jefferai
Copy link
Member

I had a thought about defaulting to IAM or EC2. It would be nice to default to IAM but be backwards compatible. Right now a logical.Request has the MountPoint attached. I think it will be useful/necessary at some point to also fill in a MountType anyways. If we did that now, the default if it was mounted with a type of aws-ec2 could be EC2, and in all other cases default to IAM. This doesn't break existing setups but new users mounting it as the aws type will get the new default behavior.

Thoughts?

@vishalnayak
Copy link
Member

I think passing in MountType via logical.Request is a good idea.

When MountType is aws-ec2, default to ec2 auth_type for backwards
compatibility with legacy roles. Otherwise, default to iam.
Copy link
Contributor Author

@joelthompson joelthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I went ahead and took @jefferai's suggestion to include MountType in logical.Request and based the default on MountType. I also ran several manual tests:

  1. make dev from from rev 33d2f6f followed by a vault server
  2. vault init followed by vault unseal
  3. vault auth-enable aws-ec2
  4. vault write auth/aws-ec2/role/test-old-type-1 bound_ami_id=$(curl http://169.254.169.254/latest/meta-data/ami-id)
  5. vault write auth/aws-ec2/login role=test-old-type-1 pkcs7=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 | tr -d '\n') nonce=96670fe2-14ea-74e6-5cae-41ae930b2fd8 allows me to login
  6. Stopped the Vault server
  7. make dev from rev 0905746 followed by vault server and vault unseal
  8. vault auth -methods returns:
$ vault auth -methods
Path      Type     Default TTL  Max TTL  Replication Behavior  Description
aws-ec2/  aws-ec2  system       system   replicated            
token/    token    system       system   replicated            token based credentials

Now testing basic backwards compatibility:

  1. vault auth-enable -path=aws-ec2-alt aws-ec2 succeeded
  2. vault auth-enable aws succeeded
  3. vault auth-methods returns:
Path          Type     Default TTL  Max TTL  Replication Behavior  Description
aws-ec2-alt/  aws-ec2  system       system   replicated            
aws-ec2/      aws-ec2  system       system   replicated            
aws/          aws      system       system   replicated            
token/        token    system       system   replicated            token based credentials
  1. vault read auth/aws-ec2/role/test-old-type-1 returns:
Key                             Value
---                             -----
allow_instance_migration        false
auth_type                       ec2
bound_account_id                
bound_ami_id                    ami-22ce4934
bound_iam_instance_profile_arn  
bound_iam_principal_arn         
bound_iam_role_arn              
bound_region                    
bound_subnet_id                 
bound_vpc_id                    
disallow_reauthentication       false
inferred_aws_region             
inferred_entity_type            
max_ttl                         0
period                          0
policies                        [default]
role_tag                        
ttl                             0

So it looks like the basic backwards compatibility works, including the upgrading of the auth_type.

Creating some new roles:

  1. vault write auth/aws-ec2/role/test-old-type-2 bound_ami_id=$(curl http://169.254.169.254/latest/meta-data/ami-id) succeeded (the mount that I created from the version of vault on master)
  2. vault write auth/aws-ec2-alt/role/test-old-type-alt bound_ami_id=$(curl http://169.254.169.254/latest/meta-data/ami-id) succeeded (the mount I created using my code)
  3. vault write auth/aws/role/test-new bound_iam_principal_arn=arn:aws:iam::XXXXXXXXXXXX:role/VaultTestRole succeeded

Reading them out:

vault read auth/aws-ec2/role/test-old-type-2 returns:

Key                             Value
---                             -----
allow_instance_migration        false
auth_type                       ec2
bound_account_id                
bound_ami_id                    ami-22ce4934
bound_iam_instance_profile_arn  
bound_iam_principal_arn         
bound_iam_role_arn              
bound_region                    
bound_subnet_id                 
bound_vpc_id                    
disallow_reauthentication       false
inferred_aws_region             
inferred_entity_type            
max_ttl                         0
period                          0
policies                        [default]
role_tag                        
ttl                             0

so it had a default auth_type of ec2 as desired.

vault read auth/aws-ec2-alt/role/test-old-type-alt returns:

Key                             Value
---                             -----
allow_instance_migration        false
auth_type                       ec2
bound_account_id                
bound_ami_id                    ami-22ce4934
bound_iam_instance_profile_arn  
bound_iam_principal_arn         
bound_iam_role_arn              
bound_region                    
bound_subnet_id                 
bound_vpc_id                    
disallow_reauthentication       false
inferred_aws_region             
inferred_entity_type            
max_ttl                         0
period                          0
policies                        [default]
role_tag                        
ttl                             0

so it also got a default auth_type of ec2

vault read auth/aws/role/test-new returns:

Key                             Value
---                             -----
allow_instance_migration        false
auth_type                       iam
bound_account_id                
bound_ami_id                    
bound_iam_instance_profile_arn  
bound_iam_principal_arn         arn:aws:iam::XXXXXXXXXXXX:role/VaultTestRole
bound_iam_role_arn              
bound_region                    
bound_subnet_id                 
bound_vpc_id                    
disallow_reauthentication       false
inferred_aws_region             
inferred_entity_type            
max_ttl                         0
period                          0
policies                        [default]
role_tag                        
ttl                             0

so it got the default auth_type of iam, as expected.

Authenticating to them:

vault write auth/aws-ec2/login role=test-old-type-1 pkcs7=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 | tr -d '\n') nonce=96670fe2-14ea-74e6-5cae-41ae930b2fd8 returns:

Key                             Value
---                             -----
token                           478f5a00-5b90-7a9e-1944-e784c79b197a
token_accessor                  dea7065b-c293-7433-1c12-df09bdd5bc2d
token_duration                  768h0m0s
token_renewable                 true
token_policies                  [default]
token_meta_region               "us-east-1"
token_meta_role                 "test-old-type-1"
token_meta_role_tag_max_ttl     "0s"
token_meta_account_id           "XXXXXXXXXXXX"
token_meta_ami_id               "ami-22ce4934"
token_meta_instance_id          "i-XXXXXXXXXXXXXXXXX"
token_meta_nonce                "96670fe2-14ea-74e6-5cae-41ae930b2fd8"

vault write auth/aws-ec2/login role=test-old-type-2 pkcs7=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 | tr -d '\n') nonce=96670fe2-14ea-74e6-5cae-41ae930b2fd8 returns:

Key                             Value
---                             -----
token                           6a8eaf7b-c004-8cce-abd2-8b050d9ad623
token_accessor                  14efe5e3-306f-05f0-8751-80568c6f77fa
token_duration                  768h0m0s
token_renewable                 true
token_policies                  [default]
token_meta_account_id           "XXXXXXXXXXXX"
token_meta_ami_id               "ami-22ce4934"
token_meta_instance_id          "i-XXXXXXXXXXXXXXXXX"
token_meta_nonce                "96670fe2-14ea-74e6-5cae-41ae930b2fd8"
token_meta_region               "us-east-1"
token_meta_role                 "test-old-type-2"
token_meta_role_tag_max_ttl     "0s"

vault write auth/aws-ec2-alt/login role=test-old-type-alt pkcs7=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 | tr -d '\n') returns:

Key                             Value
---                             -----
token                           2748caed-bbd4-a2de-a158-20165cf1186a
token_accessor                  eec0415b-d163-e663-fef4-e9ec86abbd10
token_duration                  768h0m0s
token_renewable                 true
token_policies                  [default]
token_meta_instance_id          "i-XXXXXXXXXXXXXXXXX"
token_meta_nonce                "23e7a1e5-c25b-5602-2eaa-e93bb97d7782"
token_meta_region               "us-east-1"
token_meta_role                 "test-old-type-alt"
token_meta_role_tag_max_ttl     "0s"
token_meta_account_id           "XXXXXXXXXXXX"
token_meta_ami_id               "ami-22ce4934"

vault auth -method=aws role=test-new returns:

Successfully authenticated! You are now logged in.
The token below is already saved in the session. You do not
need to "vault auth" again with the token.
token: 553d4383-6b14-f963-8e7f-904fcfc157a2
token_duration: 2764799
token_policies: [default]

So authentication in the following scenarios worked:

  1. aws-ec2 backend mounted using master, role created using master, server running this code
  2. aws-ec2 backend mounted using master, role created with this code, server running this code
  3. aws-ec2 backend mounted using this code, role created with this code, server running this code
  4. aws backend

b.roleMutex.RUnlock()
// now we need to ensure that we re-acquire the read lock so the above deferred RUnlock() doesn't
// cause a crash trying to unlock the already-unlocked mutex
defer b.roleMutex.RLock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

Copy link
Member

@vishalnayak vishalnayak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job with this @joelthompson. Thank you for your patience, follow ups and testing. Other than Jeff's last comment, this PR LGTM!

@mfischer-zd
Copy link
Contributor

mfischer-zd commented Apr 20, 2017

Congratulations, everyone, for your persistent efforts. As a spectator, I found this process to be instructive as a model in how to lead a difficult effort with many challenges to a constructive outcome. As a user, I'm very excited to begin using this in the next release.

🍻 🥇 👏

@jefferai
Copy link
Member

That one minor change is all that's waiting on my end!

Previously the request router reset the MountPoint and MountType back to
the empty string before returning to the core. This ensures they get set
back to the correct values.
@joelthompson
Copy link
Contributor Author

Done!

@vishalnayak vishalnayak merged commit 5a934e6 into hashicorp:master Apr 24, 2017
@jefferai
Copy link
Member

Fantastic work @joelthompson !

@joelthompson
Copy link
Contributor Author

Thanks @jefferai ! :)

@joelthompson joelthompson deleted the feature/unified_aws_auth branch April 25, 2017 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants