
[core] Retrieve token from server in GCS client [4/n] #35014

Closed
wants to merge 4 commits into from

Conversation


@vitsai vitsai commented May 3, 2023

While connecting, the GCS client retrieves the token from the GCS server and attaches it to the metadata of subsequent requests.

Previous PR (GCS server): #36535
Next PR (auth): #36073

Why are these changes needed?

Related issue number

#34763

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


@fishbone fishbone left a comment


Thanks for cleaning up the code! I think there are two things here:

  • The token shouldn't be generated as the first step. We should connect to the store first and then try to fetch the token; if it's not there, we generate and store it.
  • The token should live in the RPC layer, not the application layer. That means we should add it to the client context in the client call, and whenever the gRPC server gets a call, it should check the token first: if it's valid, we proceed by calling the callback; otherwise, we return a failure.

I think this PR could just focus on the first one.

@vitsai vitsai force-pushed the gcs-version branch 2 times, most recently from e70a53c to 15c95c8 Compare May 8, 2023 21:46
@fishbone fishbone added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label May 8, 2023
@scv119 scv119 self-assigned this May 9, 2023
@rkooo567

+1 on just introducing the cluster ID.


jjyao commented May 10, 2023

+1 for cluster_id concept

@vitsai vitsai force-pushed the gcs-version branch 3 times, most recently from 5d3b1b7 to 385d4c8 Compare May 11, 2023 07:42

vitsai commented May 11, 2023

  1. I've changed it to a cluster-ID concept. As the change works now, each ClientCallManager could have an ID corresponding only to the set of services it subscribes to (e.g. ObjectManager and GCS could have different IDs that their clients use to authenticate with them). The name "cluster ID" implies something stronger: a single ClusterID generated by the GCS and propagated to all services as a shared token.

     This doesn't matter for now, since the change doesn't plumb ClusterID propagation through to other services, but in the long run it would probably be good for each service to have some form of verification/auth with its clients. This decision would also affect whether we need a shared_future for ClientCallManager.

  2. Right now, any client without an authentication token can call RegisterClient() on the GCS. Ideally this would be a handshake where the client provides some useful information about its identity, but I think that can be a follow-up.

@vitsai vitsai force-pushed the gcs-version branch 2 times, most recently from 596091c to a1abc5c Compare June 22, 2023 20:19
@vitsai vitsai removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jun 22, 2023
@vitsai vitsai changed the title [core] Generate and send GCS server token [4/n] [core] Retrieve token from server in GCS client [4/n] Jun 22, 2023
@vitsai vitsai force-pushed the gcs-version-server branch 3 times, most recently from b04fe8f to 246cbc5 Compare June 28, 2023 18:24
@vitsai vitsai force-pushed the gcs-version-server branch from b7bdbe1 to bff887d Compare June 29, 2023 23:31
vitsai added 2 commits June 29, 2023 21:32
[core] Add ClusterID token to GRPC server [1/n] (ray-project#36517)

First of a stack of changes to plumb through token exchange between the GCS client and server. This adds a ClusterID token that can be passed to a GRPC server, which then initializes each component GRPC service with the token by passing it to the ServerCallFactory objects when they are set up. When the factories create ServerCall objects for the GRPC service completion queue, this token is also passed to each ServerCall to check against inbound request metadata. The actual authentication check does not take place in this PR.

Note: This change also makes a minor cleanup in the GCS server (changing a string check to use an enum).

Next change (client-side analogue): ray-project#36526

[core] Generate GCS server token

Signed-off-by: vitsai <victoria@anyscale.com>

Add client-side logic for setting cluster ID.

Signed-off-by: vitsai <victoria@anyscale.com>

bug fixes

Signed-off-by: vitsai <victoria@anyscale.com>

comments

Signed-off-by: vitsai <victoria@anyscale.com>

bug workaround

Signed-off-by: vitsai <victoria@anyscale.com>

Fix windows build

Signed-off-by: vitsai <victoria@anyscale.com>

fix bug

Signed-off-by: vitsai <victoria@anyscale.com>

remove auth stuff from this pr

Signed-off-by: vitsai <victoria@anyscale.com>

fix mock build

Signed-off-by: vitsai <victoria@anyscale.com>

comments

Signed-off-by: vitsai <victoria@anyscale.com>

remove future

Signed-off-by: vitsai <victoria@anyscale.com>

Remove top-level changes

Signed-off-by: vitsai <victoria@anyscale.com>

comments

Signed-off-by: vitsai <victoria@anyscale.com>

Peel back everything that's not grpc-layer changes

Signed-off-by: vitsai <victoria@anyscale.com>

Change atomic to mutex

Signed-off-by: vitsai <victoria@anyscale.com>

Fix alignment of SafeClusterID

Signed-off-by: vitsai <victoria@anyscale.com>

comments

Signed-off-by: vitsai <victoria@anyscale.com>

Add back everything in GCS server except RPC definition

Signed-off-by: vitsai <victoria@anyscale.com>

fix bug

Signed-off-by: vitsai <victoria@anyscale.com>

comments

Signed-off-by: vitsai <victoria@anyscale.com>

comments

Signed-off-by: vitsai <victoria@anyscale.com>

Add client-side stuff

Signed-off-by: vitsai <victoria@anyscale.com>

hack workaround to simulate async direct dispatch

love when things hang

Signed-off-by: vitsai <victoria@anyscale.com>
Signed-off-by: vitsai <victoria@anyscale.com>
@vitsai vitsai marked this pull request as draft June 30, 2023 06:14
@vitsai vitsai marked this pull request as ready for review June 30, 2023 06:16
@fishbone fishbone deleted the branch ray-project:gcs-version-server June 30, 2023 18:02
fishbone pushed a commit that referenced this pull request Jun 30, 2023
Add support for a GetClusterId RPC call in the GCS server that clients can use to obtain the cluster ID. In particular, GCS server will retrieve the cluster id from the persistent store if it exists, or otherwise generate a new one and store it.

Previous PR (GRPC client): #36526
Next PR (GCS client): #35014

Part 3 of breaking down #35014 into more digestible parts.
@fishbone fishbone closed this Jun 30, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
This change adds a Cluster ID to GRPC client (client call, client call manager), and attaches it to the metadata of each ClientCall provided it is non-nil.

Previous PR (GRPC server): ray-project#36517
Next PR (GCS server): ray-project#36535

Part 2 of breaking down ray-project#35014 into more digestible parts.

Related issue number
ray-project#34763

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Add support for a GetClusterId RPC call in the GCS server that clients can use to obtain the cluster ID. In particular, GCS server will retrieve the cluster id from the persistent store if it exists, or otherwise generate a new one and store it.

Previous PR (GRPC client): ray-project#36526
Next PR (GCS client): ray-project#35014

Part 3 of breaking down ray-project#35014 into more digestible parts.

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>