Fix bug determining jwt validity due to incorrect computation of system timestamp and provide configuration option to allow for timely slack in token validity #10753
// (for another 1 seconds), forwards the request to Istio Ingressgateway and subsequently
// to some pod with an envoy sidecar. Meanwhile, 0.1 seconds have passed and when envoy checks
// the token it finds that it has expired.
const uint64_t unix_timestamp = absl::ToUnixSeconds(absl::Now()) - 5;
I am fine with the change to use absl::ToUnixSeconds(absl::Now()).
But I don't like the "- 5" part; it seems like a hack to solve your particular problem.
BTW, even if you pass this check, there is another check in verifyJwt, which calls this code:
https://github.com/google/jwt_verify_lib/blob/master/src/verify.cc#L163
The specific amount of slack is debatable. However, I don't think there is any doubt that there is a general need for some slack. Whatever part of an authn system (be it AWS ALB or https://istio.io/blog/2019/app-identity-and-access-adapter/) that refreshes access tokens has to check if the current token is still valid. Finally, istio-proxy validates the token again and the time between these two validation events is the time needed as slack.
Could we make the amount configurable?
Yes, I like this idea. Let us make it configurable from filter config. Please see my other comments.
@@ -100,6 +100,10 @@ class AuthenticatorImpl : public Logger::Loggable<Logger::Id::jwt>,
const bool is_allow_failed_;
const bool is_allow_missing_;
TimeSource& time_source_;

// allow 5 seconds of slack when determining token expery
const uint64_t jwt_exp_slack = 5;
It is better to configure this slack time from the filter config. If the default is 0, it will not change existing behavior.
We can add two fields:
exp_slack: // now <= exp + exp_slack
nbf_slack: // now >= nbf - nbf_slack
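The two conditions proposed above can be sketched as a small standalone check. This is only an illustration: `TimeCheck` and `checkTimeWithSlack` are hypothetical names, not part of Envoy or jwt_verify_lib, and the sketch assumes (as the code comments in this PR state) that an absent nbf or exp claim is represented as 0.

```cpp
#include <cstdint>

// Hypothetical result type for this sketch only.
enum class TimeCheck { Ok, NotYetValid, Expired };

// Hypothetical helper: applies the two slack conditions from the comment above.
// All values are Unix timestamps or durations in seconds.
TimeCheck checkTimeWithSlack(uint64_t now, uint64_t nbf, uint64_t exp,
                             uint64_t nbf_slack, uint64_t exp_slack) {
  // Valid from: now >= nbf - nbf_slack, written as now + nbf_slack >= nbf.
  if (now + nbf_slack < nbf) {
    return TimeCheck::NotYetValid;
  }
  // Valid to: now <= exp + exp_slack, checked only when an exp claim is present.
  if (exp > 0 && now > exp + exp_slack) {
    return TimeCheck::Expired;
  }
  return TimeCheck::Ok;
}
```

With both slack values at 0 the helper behaves like a strict check, which matches the point that a default of 0 would not change existing behavior.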
How do I use filter config?
Working on it. Think I need a proper build environment. Is there some documentation to set it up?
Can't find it in https://github.com/envoyproxy/envoy/
Ah. Good. Will do the docker thingy.
// allow 5 seconds of slack when determining token expery
const uint64_t jwt_exp_slack = 5;
uint64_t now;
No need to add anything to the class members.
@@ -239,7 +243,7 @@ void AuthenticatorImpl::onDestroy() {

// Verify with a specific public key.
void AuthenticatorImpl::verifyKey() {
const Status status = ::google::jwt_verify::verifyJwt(*jwt_, *jwks_data_->getJwksObj());
const Status status = ::google::jwt_verify::verifyJwt(*jwt_, *jwks_data_->getJwksObj(), now - jwt_exp_slack);
We don't need to double-check exp and nbf here. My suggestion is:
- change the code in https://github.com/google/jwt_verify_lib/blob/master/src/verify.cc
- add a new function called verifyJwtWithoutTimeChecking(); this function will not check nbf_ and exp_
Err. Are you sure? jwt_verify_lib isn't an Istio thing, is it?
…dity against them
// The two below fields define the amount of slack in seconds that will be used
// when determining if a JWT is valid or has expired. Validity is determined by
// the formula: VALID_FROM("iat") - nbf_slack_slack <= NOW <= VALID_TO("exp") + exp_slack.
int64 nbf_slack = 0;
We can use int32. Please check out this protobuf guide.
The default value for int32 is 0 in protobuf, so there is no need to specify it.
Actually, in Envoy we override Google style and prefer uint32
:)
Unless there are absolute high-performance needs (which we don't have here), you should always make values explicit. That way the value is clear to anyone without having to dig through documentation.
// Consider the following case:
// AWS ALB receives a request and determines that the existing access token is valid
// (for another 1 seconds), forwards the request to Istio Ingressgateway and subsequently
// to some pod with an envoy sidecar. Meanwhile, 0.1 seconds have passed and when envoy checks
// the token it finds that it has expired.
now = absl::ToUnixSeconds(absl::Now());
const uint64_t now = absl::ToUnixSeconds(absl::Now());
const int64_t nbf_slack = jwks_data_->getJwtProvider().nbf_slack();
jwks_data_ is only assigned in line 186, so you have to move this code after that.
// If the nbf claim does *not* appear in the JWT, then the nbf field is defaulted
// to 0.
if (jwt_->nbf_ > now) {
if (jwt_->nbf_ - nbf_slack > now) {
Be careful about the signed and unsigned comparison here. Normally, nbf_ is 0.
Changing it to
if (nbf_ > now + nbf_slack)
will avoid the signed integer issue.
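A minimal sketch of the concern raised above, assuming all three values are `uint64_t` (the function names are hypothetical and exist only for this illustration). When nbf is 0 and the slack is non-zero, the subtraction wraps around to a huge unsigned value, so the subtracting form rejects a token that carries no nbf claim at all; moving the slack to the other side of the comparison avoids the wraparound.

```cpp
#include <cstdint>

// Subtracting form, as in the diff: with nbf == 0 and nbf_slack == 5,
// nbf - nbf_slack wraps to 2^64 - 5, which is always greater than now.
bool notYetValidSubtract(uint64_t nbf, uint64_t now, uint64_t nbf_slack) {
  return nbf - nbf_slack > now;
}

// Adding form, as suggested in the review: no wraparound for realistic
// Unix timestamps and slack values.
bool notYetValidAdd(uint64_t nbf, uint64_t now, uint64_t nbf_slack) {
  return nbf > now + nbf_slack;
}
```

The two forms agree whenever nbf is at least as large as the slack; they differ only in the wraparound case, which is exactly the common case of a token without an nbf claim.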
doneWithStatus(Status::JwtNotYetValid);
return;
}
// If the exp claim does *not* appear in the JWT then the exp field is defaulted
// to 0.
if (jwt_->exp_ > 0 && jwt_->exp_ < now - jwt_exp_slack) {
if (jwt_->exp_ > 0 && jwt_->exp_ - exp_slack < now) {
Change it to .... && exp_ < now - exp_slack
which is easier to read.
ok
How do I get exp_slack and nbf_slack from the 'jwks_data_' that I have?
Thanks for the contribution. This change looks pretty gnarly from a security perspective, glad to have @qiwzhang doing first pass (appreciated). Can you also update to a more meaningful title?
// The two below fields define the amount of slack in seconds that will be used
// when determining if a JWT is valid or has expired. Validity is determined by
// the formula: VALID_FROM("iat") - nbf_slack_slack <= NOW <= VALID_TO("exp") + exp_slack.
Where does this formula come from? RFC?
This formula is about token validity and it is not mentioned in https://tools.ietf.org/html/rfc6749. But let me make my case:
///////////////////////////////////////////////////
Some load balancer performing OIDC authentication receives a request and determines that the existing access token is valid (for another 1 second). It forwards the request to Istio Ingressgateway and subsequently to some pod with an envoy sidecar. Meanwhile, 1 second has passed and when envoy checks the token it finds that it has expired.
///////////////////////////////////////////////////
About int32/uint32: I think we should use int32, since what we are trying to do here is provide a configuration option to users. If someone, for example, wants to use nbf_slack = -60 to ensure that tokens only become valid one minute after their time of issue, why not?
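If a signed slack were allowed as argued above, the comparison could be done in signed 64-bit arithmetic so that a negative slack and an absent nbf claim (nbf == 0) both behave sensibly. The sketch below is an assumption-laden illustration, not the PR's code: `notYetValidSigned` is a hypothetical name, and it assumes timestamps fit comfortably in `int64_t`, which holds for Unix seconds.

```cpp
#include <cstdint>

// Hypothetical helper: token is not yet valid while now < nbf - nbf_slack.
// A negative nbf_slack pushes the start of validity into the future.
bool notYetValidSigned(uint64_t nbf, uint64_t now, int32_t nbf_slack) {
  const int64_t valid_from = static_cast<int64_t>(nbf) - static_cast<int64_t>(nbf_slack);
  return static_cast<int64_t>(now) < valid_from;
}
```

For example, with nbf = 1000 and nbf_slack = -60, the token is treated as not yet valid until now reaches 1060.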
@@ -9,6 +9,8 @@
#include "jwt_verify_lib/check_audience.h"
#include "jwt_verify_lib/status.h"

#include "absl/time/clock.h"
Do we have any regression or unit/integration tests for this?
Tests? Jeeze. What roughnecked thinking. No, seriously. This is my first contribution here and I wasn't actually planning on going through by myself. Anyways, I don't mind at all doing it but need to get set up with a proper build environment to do real work. The coming weekend seems a good time to do so.
Ah heck, here I am working already.
This header is not needed any more. Please remove it.
// existing access token is valid (for another 1 second). It forwards the request to Istio Ingressgateway
// and subsequently to some pod with an envoy sidecar. Meanwhile, 1 second has passed and when envoy checks
// the token it finds that it has expired.
uint32 nbf_slack = 0;
In protobuf, the syntax is
type field_name = field_index;
field_index has to be unique for all fields in a message.
There is no way to specify a default value in protobuf. The default value is determined by the type.
// If the nbf claim does *not* appear in the JWT, then the nbf field is defaulted
// to 0.
if (jwt_->nbf_ - nbf_slack > now) {
if (now < jwt_->nbf_ + nbf_slack) {
Should it be
now < nbf_ - nbf_slack
You want the JWT to be ready ahead of the nbf time. Is that right?
Reflecting on that idea, I think that any use cases for this come down to machines running with incorrect system time and I don't see why we should account for that.
ok, then
@@ -16,7 +16,7 @@ envoy_cc_library(
":context_lib",
"//source/common/http:utility_lib",
"//source/common/protobuf",
"@com_google_cel_cpp//eval/public:builtin_func_registrar",
#"@com_google_cel_cpp//eval/public:builtin_func_registrar",
why this change?
never mind. was accidental. has been changed back.
@@ -30,7 +30,7 @@ envoy_cc_library(
"//source/common/common:minimal_logger_lib",
"//source/common/config:datasource_lib",
"//source/common/protobuf:utility_lib",
"@envoy_api//envoy/extensions/filters/http/jwt_authn/v3:pkg_cc_proto",
"@envoy_api//envoy/extensions/filters/http/jwt_authn/v3:pkg_cc_proto",
remove trailing space
@@ -237,7 +244,7 @@ void AuthenticatorImpl::onDestroy() {

// Verify with a specific public key.
void AuthenticatorImpl::verifyKey() {
const Status status = ::google::jwt_verify::verifyJwt(*jwt_, *jwks_data_->getJwksObj());
const Status status = ::google::jwt_verify::verifyJwtWithoutTimeChecking(*jwt_, *jwks_data_->getJwksObj());
You need to get the latest jwt_verify_lib here, by changing its SHA and sha256.
To get the sha256 sum, download that .tar.gz file and run sha256sum on the file.
How do I get the current .tar.gz file? As I understand it, the file
https://github.com/google/jwt_verify_lib/archive/40e2cc938f4bcd059a97dc6c73f59ecfa5a71bac.tar.gz
is outdated. But how do I get the new one?
the new sha is: b5b3b4ed8611b1eea8764845381e60becc7b0b43
the file is https://github.com/google/jwt_verify_lib/archive/b5b3b4ed8611b1eea8764845381e60becc7b0b43.tar.gz
sha256 is: 21b9fd9fb8714cb199a823a4c01cf6665bdd42b62137348707dee51714797dfc
please also update the date in the comment field
I am using ./ci/run_envoy_docker.sh 'ci/do_circle_ci.sh bazel.coverage' to build/test locally, but it takes ages. Is that correct? See log below.
bazel/repository_locations.bzl
Outdated
@@ -187,7 +187,7 @@ REPOSITORY_LOCATIONS = dict(
strip_prefix = "jwt_verify_lib-40e2cc938f4bcd059a97dc6c73f59ecfa5a71bac",
# 2020-02-11
urls = ["https://github.com/google/jwt_verify_lib/archive/40e2cc938f4bcd059a97dc6c73f59ecfa5a71bac.tar.gz"],
),
)
?
// existing access token is valid (for another 1 second). It forwards the request to Istio Ingressgateway
// and subsequently to some pod with an envoy sidecar. Meanwhile, 1 second has passed and when envoy checks
// the token it finds that it has expired.
const uint64_t now = absl::ToUnixSeconds(absl::Now());
Don't use absl::Now; use the time from timeSource().systemTime() instead. This will make testing easier, since a mock time can be provided.
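To illustrate why an injected time source helps testing, here is a self-contained sketch. `SimpleTimeSource`, `RealTimeSource`, `FixedTimeSource` and `unixSeconds` are hypothetical stand-ins written for this example; they are not Envoy's actual `TimeSource` classes, and the sketch assumes `std::chrono::system_clock` counts from the Unix epoch (guaranteed since C++20, and true in practice on common platforms before that).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for an injectable time source.
class SimpleTimeSource {
public:
  virtual ~SimpleTimeSource() = default;
  virtual std::chrono::system_clock::time_point systemTime() = 0;
};

// Production variant: reads the real wall clock.
class RealTimeSource : public SimpleTimeSource {
public:
  std::chrono::system_clock::time_point systemTime() override {
    return std::chrono::system_clock::now();
  }
};

// Test variant: always returns a fixed, caller-chosen time.
class FixedTimeSource : public SimpleTimeSource {
public:
  explicit FixedTimeSource(uint64_t unix_seconds)
      : now_(std::chrono::system_clock::time_point(std::chrono::seconds(unix_seconds))) {}
  std::chrono::system_clock::time_point systemTime() override { return now_; }

private:
  std::chrono::system_clock::time_point now_;
};

// Converts the injected time to whole seconds since the epoch.
uint64_t unixSeconds(SimpleTimeSource& time_source) {
  return static_cast<uint64_t>(std::chrono::duration_cast<std::chrono::seconds>(
                                   time_source.systemTime().time_since_epoch())
                                   .count());
}
```

A test can then construct a `FixedTimeSource` just before or just after a token's exp value and exercise the expiry check deterministically, which is not possible when the code calls the wall clock directly.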
The coverage CI is a bit flaky right now, so don't worry about it for now. Can you fix the format to see if the other CI checks are green?
// The two below fields define the amount of slack in seconds that will be used
// when determining if a JWT is valid or has expired. Validity is determined by
// the formula: VALID_FROM("iat") + nbf_slack <= NOW <= VALID_TO("exp") + exp_slack
s/iat/nbf
Still seeing this:
source/extensions/filters/http/jwt_authn/authenticator.cc:172:59: error: no member named 'nbf_slack' in 'envoy::extensions::filters::http::jwt_authn::v3::JwtProvider'
const uint32_t nbf_slack = jwks_data_->getJwtProvider().nbf_slack();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
source/extensions/filters/http/jwt_authn/authenticator.cc:173:59: error: no member named 'exp_slack' in 'envoy::extensions::filters::http::jwt_authn::v3::JwtProvider'
const uint32_t exp_slack = jwks_data_->getJwtProvider().exp_slack();
Hmm, make sure the modified proto files are compiled, by running
bazel build //api/...
For the docker run, you need to run the command "bazel.api".
I am seeing the error as output from
https://circleci.com/gh/envoyproxy/envoy/338165?utm_campaign=workflow-failed&utm_medium=email&utm_source=notification
Bump:
as output from https://circleci.com/gh/envoyproxy/envoy/338165?utm_campaign=workflow-failed&utm_medium=email&utm_source=notification
Please fix the format. The format check includes api_shadow generation; if api_shadow does not reflect your change, it results in the compilation error you see.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
How do I fix the format? Is there some guide that I can follow?
I think you can run "fix_format" command
Ok. After fix format coverage tests still fail. I think the most relevant part is:
FAILED: //test/coverage:coverage_tests (Summary)
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Bump. Come on guys. Let's get this through. This should really be fixed.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Please see the following issue in istio:
istio/istio#22750