Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Pace requests to token server for new auth tokens #55

Merged
merged 1 commit into from
Apr 30, 2020

Conversation

jorge-ibm
Copy link
Contributor

This PR updates the design for token refresh by decoupling the refresh logic from active requests to be consistent with the other cores.

ref: https://github.ibm.com/arf/planning-sdk-squad/issues/1550

}

var cp4dMuOne sync.Mutex
Copy link
Contributor Author

@jorge-ibm jorge-ibm Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to use locks instead of goroutines for achieving the synchronous design due to the simplicity. I created two mutexes because I couldn't find a straight forward answer on whether or not I can use the same lock in two places at once, though I don't think the two locks will ever be used at the same time the way the code is setup

return "", err
} else if authenticator.tokenData != nil && authenticator.needsRefresh() {
// If refresh needed, kick off a go routine in the background to get a new token
ch := make(chan error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a goroutine to achieve the "multi-threading" we want. The select case below it should ensure that no other routine sits around and waits for the newly spawned routine to finish

}

// If the refresh time isn't set, then set it
if authenticator.tokenRefreshTime == 0 && &authenticator.tokenData.ExpiresIn != nil && &authenticator.tokenData.Expiration != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkistler @padamstx - now that I'm looking at this again, should I always be setting the correct refresh time when we request a new token? Currently, it will only be set when you request the first token, and when you refresh the token where we push it forward 1 min.. I think this will result in more token requests than we'd like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ... you should be updating tokenRefreshTime whenever you get a new token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that as I created the PR 🙂

Copy link
Contributor

@mkistler mkistler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start but I think there are some issues to be worked out.

authenticator.tokenData, err = newCp4dTokenData(tokenResponse)
if err != nil {
return "", err
} else if authenticator.tokenData != nil && authenticator.needsRefresh() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to test authenticator.tokenData != nil here -- it should never be nil here because of the check at line 157.

if err != nil {
return "", err
}
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to update tokenRefreshTime here? Otherwise you'll continually refresh the token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think I do need to re-calculate the refresh time when we get a new token. Not sure why I kept that out originally.. I think I was following the java core too much

Copy link
Member

@padamstx padamstx Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In java, we don't pre-compute the refresh time... the Token object's "needsRefresh()" method will compute it on the fly if needed, and then determine whether or not the token needs to be refreshed.

return nil
}

err := authenticator.getTokenData()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is very dangerous to perform a network call while holding a mutex. We should find some other solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this further ... I suppose the other languages must do something similar. Hmmm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Java core, we do this inside the TokenRequestBasedAuthenticator's getToken() method:

  • if we have no token at all, or the current token has expired, then we invoke a blocking request (synchronized) to the token server
  • otherwise we have a valid token and we use that as-is, but if the token is within the refresh window (last 20% of lifetime I think), then we ALSO start a refresh in a separate thread.

Not sure what else we can do to improve on that

Copy link
Contributor Author

@jorge-ibm jorge-ibm Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkistler - I think the only other thing I can think of to use in place of the mutex is a go channel blocking go routine.. Though I remember reading that under hood it still uses a mutex. I'll try to find that text again

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is ... what happens if the one token request we make "gets lost" -- not unheard of for a network request. Does the client-side time out after a while and simply fail the request ... allowing another to try the request again? If that's the case then this is probably fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whatever happens in the Java core would also happen here, so yeah I guess a timeout would occur. If there is some other more fault-tolerant pattern we should be using in the Node, Java, Go cores (at a minimum) to avoid this, then we'll need to put our thinking caps on and come up with a better design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I was being overly cautious. The timeout will cover us so this should be fine.

@jorge-ibm jorge-ibm force-pushed the token-exchange-fix branch 4 times, most recently from 399b740 to fa2f137 Compare April 28, 2020 17:51
@jorge-ibm
Copy link
Contributor Author

@mkistler @padamstx - I've pushed an amended commit that adds the missing calculation of the refreshTime everytime we get a new token. I've also moved back the property refreshTime to be a part of the iamTokenData and cp4dTokenData structs, respectfully.

As far as the concerns over using locks for network calls, it looks like by default we set a client side timeout of 30 seconds. I've added two new tests to simulate a client side timeout, TestIamGetTokenTimeoutError and a test to simulate a 504 from the server, TestIamGetTokenServerError.

Note: the client timeout doesn't cover the time spent sending a request rather the time spent reading/receiving.

@jorge-ibm jorge-ibm requested a review from mkistler April 28, 2020 17:51
Copy link
Contributor

@mkistler mkistler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍

I left some comments/suggestions on stylistic changes that I trust you will handle as appropriate.

@@ -63,6 +64,9 @@ type CloudPakForDataAuthenticator struct {
tokenData *cp4dTokenData
}

var cp4dMuOne sync.Mutex
var cp4dMuTwo sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use meaningful names for these mutexes -- to be clear on what they lock/control.

I recommend:

  • cp4dMuOne -> cp4dRequestTokenMutex
  • cp4dMuTwo -> cp4dNeedsRefreshMutex

@@ -77,10 +78,12 @@ type IamAuthenticator struct {
tokenData *iamTokenData
}

var iamMuOne sync.Mutex
var iamMuTwo sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too ... please use meaningful names for these mutexes.

}
}

// return an error if the access token is not valid or was not fetched
if authenticator.tokenData == nil || authenticator.tokenData.AccessToken == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's curious that here you check authenticator.tokenData.AccessToken == "" but above you check !authenticator.tokenData.isTokenValid(). Why the difference? Can't we make them both the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an oversight on my end. I'll go ahead and add that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @mkistler I went ahead and tried to make this change, though it's important to noteisTokenValid also checks to see if the token is expired. I think it's a good extra check to have, though the issue is that for cp4d's unit tests, the expiration time is calculated using some jwt token, and not by a mock value from actual token response. The token used in those tests is expired, so all the tests using it would fail.. Getting the unit tests to work would require either a new token that would eventually also expire, changing the GetCurrentTime method to accept a mock clock, or changing how we set and calculate the expired time for a token. I just went ahead and reverted this change.

return err
}

return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the above sequence of 6 lines be expressed simply as:

    return authenticator.getTokenData()

Copy link
Member

@padamstx padamstx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed a few things with Jorge on slack. LGTM!

@jorge-ibm jorge-ibm merged commit 578399b into master Apr 30, 2020
ibm-devx-automation pushed a commit that referenced this pull request Apr 30, 2020
## [3.3.1](v3.3.0...v3.3.1) (2020-04-30)

### Bug Fixes

* Pace requests to token server for new auth tokens ([#55](#55)) ([578399b](578399b))
@ibm-devx-automation
Copy link

🎉 This PR is included in version 3.3.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@padamstx padamstx deleted the token-exchange-fix branch August 11, 2020 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants