Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

js-quic integration and Agent migration #525

Merged
merged 40 commits into from
Aug 1, 2023
Merged

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented May 23, 2023

Description

This PR Does 3 things.

  1. Migrates agent service to the agnostic RPC system.
  2. integrates @matrixai/js-quic into Polykey.
  3. Removes all old agent and network code. This includes GRPC and network domain

Issues Fixed

Tasks

  • 1. nodes domain needs changes
    • 1. NodeConnection needs to be gutted and replaced with RPCClient and QUICClient usage. Besides this, usage of the NodeConnection is mostly the same.
    • 2. Tests need to be updated
  • 2. Verification logic needs to be transplanted for use with quic.
    • 1. This needs to be tested.
  • 3. Ensure that the proper connection information is provided by the streams from the quic system.
    • 1. This needs to be tested.
  • 4. PolykeyAgent needs to be updated
    • 1. GRPC agent server needs to be replaced with a RPCServer and QUICServer combo
    • 2. Proxy needs to be removed.
    • 3. Configs need to be updated and fully propagated.
  • 5. Agent domain needs to be migrated to using the agnostic RPC code.
    • 1. Tests need to be migrated
  • 6. Old code needs to be removed
    • 1. network domain gutted
    • 2. GRPC domain gutted
    • 3. Remove protobuf? and other package dependencies that are not used anymore.
  • 7. tests!
    • 1. network domain tests need to be removed, any tests still needed should be transplanted.
    • 2. grpc domain tests need to be removed, any tests still needed should be transplanted.
  • [ ] 8. Update relevant handlers with pagination - Differed to stage 2 PR
  • [ ] 9. Update agent handlers to be timed cancellable, implement cancellation. - Differed to stage 2 PR

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes tegefaulkes self-assigned this May 23, 2023
@ghost
Copy link

ghost commented May 23, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@tegefaulkes
Copy link
Contributor Author

We now have two competing types for Host and Port. The one defined in Polykey and the one defined in @matrixai/quic. Ideally we'd only use one of these to avoid confusion, likely the quic type. But since these types don't actually require any structure until they're checked internally by QUICClient/QUICServer. I don't think they need to be an opaque type at all.

@CMCDragonkai
Copy link
Member

I guess making them an opaque type means you need explicit casts.

But this is ok for now. There is no common type for this stuff. Just do the explicit cast.

@tegefaulkes
Copy link
Contributor Author

I'm doing the cast, but the way it works right now is not clean. The two types are both called Host and Port. There's not a clean way to tell them apart. The types themselves are explicitly not compatible since they're both opaque. So the casting looks like port: port as unknown as QUICPort,.

It's not the end of the world but I think it would be much cleaner if the host an port provided for creating the client or server was just a number or string and the Host and Port types are only used internally after they have been validated.

@CMCDragonkai
Copy link
Member

No we definitely need those types.

In HS we would use a new type constructor for this and share between them. But for now just do it as explicit type cast.

@CMCDragonkai
Copy link
Member

You can import them as aliases.

@tegefaulkes
Copy link
Contributor Author

The NodeConnection used a destroyCallback to trigger cleaning up the NodeConnection in the NodeConnectionManager object map. I think we can replace this with an EventTarget or a promise that resolves once the NodeConnection destroys.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented May 25, 2023

Another thing to note. The connection timeout time is set by the maxIdleTimeout config parameter. This pulls double duty in the quic system, it acts as the time before timing out the connection while connecting AND the idle timeout after a connection has established.

If we want to use a timer to specify the connection timeout. It's possible if we take the time remaining and use that for the maxIdleTimeout. But this will set the idle timeout as well which is not ideal, We'd wan't these specified separately.

So I think we need to race connecting with a connection timeout timer. To do this properly the establishment of a QUICConnection needs to be cancellable.

@tegefaulkes
Copy link
Contributor Author

NodeConnectonManager.pingNode previously just established a proxy connection. But now there is no proxy this just means we establish a NodeConnection now. Do we bother keeping this now? Or is it useful to contextualize connection establishment as a boolean success?

I think i'll keep it but have it wrap the normal connection establish logic. I think it's used pretty extensively for node-graph connection checking.

@tegefaulkes
Copy link
Contributor Author

The QUICConfig type is exported from the config.ts file in quic. Shouldn't this be part of types.ts? Currently it's not exported in any form on the root index.ts so I can only import it with @matrixai/quic/dist/config.

@tegefaulkes tegefaulkes force-pushed the feature-agent_migration branch from ee546b2 to 06459e3 Compare May 25, 2023 03:57
@CMCDragonkai
Copy link
Member

The QUICConfig type is exported from the config.ts file in quic. Shouldn't this be part of types.ts? Currently it's not exported in any form on the root index.ts so I can only import it with @matrixai/quic/dist/config.

I might change this.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented May 25, 2023

QUICClient needs to be cancellable. I don't think it needs to be fully ContextTimed but we do need it to take a AbortSignal so we can abort a connection during a multi-connection or the ICE process.

Likely the same thing needs to be applied to QUICServer.initHolePunch

@CMCDragonkai
Copy link
Member

QUICClient needs to be cancellable. I don't think it needs to be fully ContextTimed but we do need it to take a AbortSignal so we can abort a connection during a multi-connection or the ICE process.

Likely the same thing needs to be applied to QUICServer.initHolePunch

I'm applying some changes to this: MatrixAI/js-quic#26

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented May 26, 2023

I think the client or connection needs an event for when the remote certificates are available. We can check for them after receiving packets for a connection. It's going to be needed to know when TLS has been established and we can work out what the remote NodeId is.

I hadn't implemented something for this yet because I didn't know a clean way of doing it on the quic level. But from the higher level usage of polykey it seems more clear.

In polykey for agent-agent communication we will always be providing certs and using our own validation logic. So the remote certs should always be available and we can wait on that event after creating the client.

Along with this, I think we still need to implement our custom verification logic. I don't think we currently have a method for setting this.

@tegefaulkes
Copy link
Contributor Author

We need to support a function called establishMultiConnection. It takes an array of NodeIds and addresses and attempts to connect to every address looking for an any node in the list. It would return a list of established connections made to the nodes.

Previously this worked on the proxy level. So the object map didn't need to support taking multiple nodeIds when establishing a connection. This will make the locking a little finicky.

SO what I need to do is update NodeConnectionManger to support taking multiple possible nodeIds. It then needs to establish a connection.

I think I need two forms of getConnection now. 1st for doing discovery for a single node and a 2nd for connecting to a single address looking for multiple nodes. I can't supply multiple nodeIds and no address, It doesn't really work for that, what node do we use for discovery? Do we attempt discovery on every node? Let's split the work here, we can share common functionality.

So we need the NodeConnectionManager.getConnection() which will try to establish a connection with a single nodeId. This will be used if we don't know the address of the node we want. It will preform a Kademlia search for it but it can also act like a ping if we provide the Id and address to verify if that address is that node.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented May 26, 2023

I think the client or connection needs an event for when the remote certificates are available. We can check for them after receiving packets for a connection. It's going to be needed to know when TLS has been established and we can work out what the remote NodeId is.

I hadn't implemented something for this yet because I didn't know a clean way of doing it on the quic level. But from the higher level usage of polykey it seems more clear.

In polykey for agent-agent communication we will always be providing certs and using our own validation logic. So the remote certs should always be available and we can wait on that event after creating the client.

Along with this, I think we still need to implement our custom verification logic. I don't think we currently have a method for setting this.

I've updated the code for TLS in MatrixAI/js-quic#26

No custom verification function yet, but that should be part of it too.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented May 26, 2023

It would be usefull if the QUICServer had a stream event that wraps the QUICConnection while handling registering and de-registering handlers for each connection. Just to act as a central point to register a new stream handler.

Otherwise we need to set up a handler for connection events that sets up a handler for a new stream event and all the logic for cleaning that up once the connection has ended. It's better to have this logic internal to QUICServer.

@CMCDragonkai
Copy link
Member

It would be usefull if the QUICServer had a stream event that wraps the QUICConnection while handling registering and de-registering handlers for each connection. Just to act as a central point to register a new stream handler.

Otherwise we need to set up a handler for connection events that sets up a handler for a new stream event and all the logic for cleaning that up once the connection has ended. It's better to have this logic internal to QUICServer.

Technically a stream is specific to a QUICConnection. If we were to also emit this event on QUICServer, we would need to emit it with the connection information as well. Both should be part of it. [QUICConnection, QUICStream].

I think it's possible to listen for the stream events on the QUICConnection and remit the event on the QUICServer too.

This could apply to QUICClient and so on.

@tegefaulkes
Copy link
Contributor Author

That's what I was thinking, yeah.

@CMCDragonkai
Copy link
Member

I think we will need this https://napi.rs/docs/concepts/threadsafe-function inside js-quic, and the function will need to be synchronous, so that rust will just call the JS function when it is doing the work. What was the configuration parameter we need?

@tegefaulkes
Copy link
Contributor Author

Here is the first blush high level diagram for what needs to be done.

image

@tegefaulkes
Copy link
Contributor Author

I've cherry-picked the swc/jest changes to test it in this PR. I'll also re-base on staging.

@CMCDragonkai
Copy link
Member

I forgot that we had actually entirely removed jose. So that jose hack can be deleted from jest.config.js.

@tegefaulkes
Copy link
Contributor Author

re-based.

@tegefaulkes
Copy link
Contributor Author

Cherry-picking the #455 fix to this PR.

@tegefaulkes
Copy link
Contributor Author

QUICSocket isn't properly handling destroyed connections if they were destroyed by the application. It seems to be calling conn.recv after the conn has been destroyed by the application. The conn should be waiting for the connection to fully finish before conn.destroy resolves. So i'm not sure why this is happening.

In any case, the current work on js-quic should fix this. but it's something to keep an eye on.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jul 31, 2023

I'm not going to have time to be able to review the entire nodes domain as you've integrated it. So please look into the remaining comments that are unresolved and track them in new issues or in phase 2 or if they are quickfix right now, make sure you hit the load more comments, it sometimes hides it. @tegefaulkes

image

My plan is to get towards the 6th (testing the new QUIC, RPC and MDNS on testnet) and 7th (Polykey Enterprise private networks) testnet deployments. That means merging this, and then helping you on the CLI extraction in particular in terms of building/packaging tooling and ESM migration. You'll then need to address all the dangling issues in phase 2.

The remaining work for me on PKE is just the overall reactive architecture, and that will be hand off to @amydevs to focus on.


Don't forget to squash some of your recent fix commits too.

Side note, 7th deployment will require the RPC factored out to js-rpc so it can be re-used in PKE.

@tegefaulkes
Copy link
Contributor Author

All the review comments have been addressed. I'm moving on to cleaning and merging now.

@tegefaulkes tegefaulkes force-pushed the feature-agent_migration branch from ca23729 to 703ec5d Compare August 1, 2023 05:24
@tegefaulkes tegefaulkes merged commit fc82a3d into staging Aug 1, 2023
@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Aug 1, 2023

Merged!

I'm moving on to CLI migration now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices
2 participants