Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for AWS temporary credentials #2573

Merged
merged 2 commits into from
Jan 28, 2022

Conversation

JacobHenner
Copy link
Contributor

@JacobHenner JacobHenner commented Jan 27, 2022

Add support for AWS temporary credentials by allowing session tokens to
be specified.

Assuming Secrets are kept up-to-date with valid session tokens, scalers
using temporary credentials will error once after token expiration. The
scaler cache for the corresponding ScaledObject will be cleared, the
scaler will be rebuilt using the updated temporary credentials, and the
scaler will resume operation.

Signed-off-by: Jacob Henner code@ventricle.us

Checklist

Fixes #2495

Add support for AWS temporary credentials by allowing session tokens to
be specified.

Assuming Secrets are kept up-to-date with valid session tokens, scalers
using temporary credentials will error once after token expiration. The
scaler cache for the corresponding ScaledObject will be cleared, the
scaler will be rebuilt using the updated temporary credentials, and the
scaler will resume operation.

Signed-off-by: Jacob Henner <code@ventricle.us>
@JacobHenner JacobHenner requested a review from a team as a code owner January 27, 2022 19:11
@JacobHenner
Copy link
Contributor Author

JacobHenner commented Jan 27, 2022

Assuming Secrets are kept up-to-date with valid session tokens, scalers using temporary credentials will error once after token expiration. The scaler cache for the corresponding ScaledObject will be cleared, the scaler will be rebuilt using the updated temporary credentials, and the scaler will resume operation.

I am looking for feedback on whether waiting for an error is an acceptable way to trigger a credential refresh (i.e. reread the contents of the k8s Secret, which has presumably been updated with an unexpired session token). This approach worked in my testing, but I do realize there are at least a few potentially unwanted effects:

  • Scaler cache needs to be invalidated/rebuilt (probably not a big deal, but worth mentioning)
  • An HPA Scaler Error will be recorded in the metrics each time a temporary credential expires (potentially a bigger deal, depending on the expectations of users consuming this metric)

if err != nil {
scalerError = true
logger.Error(err, "error getting metric for scaler", "scaledObject.Namespace", scaledObject.Namespace, "scaledObject.Name", scaledObject.Name, "scaler", scaler)
} else {
for _, metric := range metrics {
metricValue, _ := metric.Value.AsInt64()
metricsServer.RecordHPAScalerMetric(namespace, scaledObject.Name, scalerName, scalerIndex, metric.MetricName, metricValue)
}
matchingMetrics = append(matchingMetrics, metrics...)
}
metricsServer.RecordHPAScalerError(namespace, scaledObject.Name, scalerName, scalerIndex, info.Metric, err)

@zroubalik
Copy link
Member

* An HPA Scaler Error will be recorded in the metrics each time a temporary credential expires (potentially a bigger deal, depending on the expectations of users consuming this metric)

Are you able to distinguish the error caused by expired credentials? We might want to add a check in here to not include that in the metric or maybe better would be to introduce a new type of metrics for credentials expiration?

@JacobHenner
Copy link
Contributor Author

Are you able to distinguish the error caused by expired credentials? We might want to add a check in here to not include that in the metric or maybe better would be to introduce a new type of metrics for credentials expiration?

The logs reflect a 403 "ExpiredToken", but I haven't explored if the library that interacts with AWS parses the specifics of the error (or if it's just returning error text from the API), or how the specifics could be propagated back to KEDA.

Assuming the error can be propagated back, how would we prefer to handle it?

We could:

  • Create a new type of metric for creds expiration - but I think that'd be similarly confusing (one event at expiration time is expected, > 1 event at expiration time is bad and indicates actual scaler failure).
  • Only report a failure for credential expiration errors if there are >= 2 consecutive failures
    • We might want to consider a similar approach for static credentials too, as they can also be rotated (generally with a much lower frequency).
  • Build an independent credential refresh mechanism - e.g. Watch created HPA and TriggerAuthentication resources #511, TriggerAuthentication Secret is not reloaded when Secret changes #563
  • (something else?)

@zroubalik
Copy link
Member

Are you able to distinguish the error caused by expired credentials? We might want to add a check in here to not include that in the metric or maybe better would be to introduce a new type of metrics for credentials expiration?

The logs reflect a 403 "ExpiredToken", but I haven't explored if the library that interacts with AWS parses the specifics of the error (or if it's just returning error text from the API), or how the specifics could be propagated back to KEDA.

Assuming the error can be propagated back, how would we prefer to handle it?

We could:

* Create a new type of metric for creds expiration - but I think that'd be similarly confusing (one event at expiration time is expected, > 1 event at expiration time is bad and indicates actual scaler failure).

* Only report a failure for credential expiration errors if there are >= 2 consecutive failures
  
  * We might want to consider a similar approach for static credentials too, as they can also be rotated (generally with a much lower frequency).

* Build an independent credential refresh mechanism - e.g. [Watch created HPA and TriggerAuthentication resources #511](https://github.com/kedacore/keda/issues/511), [TriggerAuthentication Secret is not reloaded when Secret changes #563](https://github.com/kedacore/keda/issues/563)

* (something else?)

Those are all good questions, I think we can go ahead with this change as it is now and open an issue for the follow up?
I will do a proper review tomorrow, but could you please update the Changelog as well?
Thanks!

@JacobHenner
Copy link
Contributor Author

Those are all good questions, I think we can go ahead with this change as it is now and open an issue for the follow up?
I will do a proper review tomorrow, but could you please update the Changelog as well?
Thanks!

Yep, that's fine by me. I'll make the changelog addition in a bit. Thanks!

Signed-off-by: Jacob Henner <code@ventricle.us>
Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zroubalik
Copy link
Member

@JacobHenner could you please open follow up issue with the concerns we raised here?

@JacobHenner
Copy link
Contributor Author

@JacobHenner could you please open follow up issue with the concerns we raised here?

#2578

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for temporary AWS IAM credentials (using session tokens) from secrets
2 participants