Extracting Node Connection Management out of `NodeManager` to `NodeConnectionManager` #310

joshuakarp · 2021-12-22T01:59:49Z

Description

The refactoring effort to make our nodes domain more robust.

                              ┌──────────┐
                              │ Sigchain │
                              └────▲─────┘
                                   │
                                   │
                                   │
                            ┌──────┴──────┐
                            │ NodeManager ├──────────────────┐
                            └──────┬──────┘                  │
                                   │                         │
                                   │                         │
                                   │                         │
                       ┌───────────▼───────────┐             │
┌──────────────┐       │                       │             │
│ ForwardProxy ◄───────┤ NodeConnectionManager │             │
└──────────────┘       │                       │       ┌─────▼─────┐
                       │   ┌──────────────┐    ├───────► NodeGraph │
┌──────────────┐       │   │NodeConnection│    │       └───────────┘
│ ReverseProxy ◄───────┤   └──────────────┘    │
└──────────────┘       │                       │
                       └─────▲─────▲────▲──────┘
                             │     │    │
                             │     │    │
                  ┌──────────┘     │    └─────────────┐
                  │                │                  │
             ┌────┴────┐    ┌──────┴─────┐  ┌─────────┴──────────┐
             │Discovery│    │VaultManager│  │NotificationsManager│
             └─────────┘    └────────────┘  └────────────────────┘

Issues Fixed

Tasks

NodeConnectionManager:

draft API plan:

class NodeConnectionManager {
   // creates/retrieves a connection, and uses to perform passed f
   public withConnection<T>(targetNodeId, f: (conn: NodeConnection => Promise<T>)): Promise<T>;
   // simply creates a connection, but doesn't use it for anything yet
   public createConnection(targetNodeId): Promise<void>;
   public destroyConnection(targetNodeId): Promise<void>;
   // Migrated from NodeManager:
   // helper function to create and set a NodeConnection (called twice in createConnection)
   protected establishNodeConnection(targetNodeId, lock): Promise<void>;
   public findNode(targetNodeId): Promise<NodeAddress>;
   public openConnection(egressHost, egressPort, timer): Promise<void>;
   // Migrated from NodeGraph:
   public getClosestLocalNodes(targetNodeId): Promise<Array<NodeData>>;
   public getClosestGlobalNodes(targetNodeId): Promise<NodeAddress | undefined>;
}

NodeConnection:
- draft API plan:
```
class NodeConnection {
  // publicly exposed client, such that NodeConnectionManager.withConnection can do calls
  // such as conn.client.doSomething()
  public client: GRPCClientAgent;
  // does everything that NodeConnection.start currently does
  public create/start();
  // removes itself from the reference it has to NodeConnectionManager's connection map
  public destroy()/stop();
  // Retained from current NodeConnection:
  public getRootCertChain();
  public getExpectedPublicKey();
}
```
- - Determine most appropriate async-init pattern Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
- - Move all internal function implementations to use withConnection in relevant domain (and remove from NodeConnection after moving):
  - - getClosestNodes: to NodeConnectionManager.getClosestGlobalNodes
  - - sendHolePunchMessage: to NodeConnectionManager?
  - - sendNotification: to NotificationsManager.sendNotification
  - - getChainData: to NodeManager.requestChainData
  - - claimNode: to NodeManager.claimNode
  - - scanVaults: to NodeManager/VaultManager? Will need rebase on Introducing vault sharing and the restrictions of pulling and cloning with vault permissions #266 when merged to bring this in
- Node connection needs to properly close in all cases where the client connection fails. if the client still exists when the connection should be dead then we have a resource leak.
NodeGraph:
- draft API plan:
```
class NodeGraph {
  // no changes to current API, except the function removals as listed below
}
```
- - Determine most appropriate async-init pattern Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
- - Remove the following:
  - - getClosestLocalNodes
  - - getClosestGlobalNodes
  - - syncNodeGraph? This currently utilises connections, so will potentially need to be moved to NodeConnectionManager.

NodeManager:

draft API plan:

class NodeManager {
  // Retaining the following existing functions:
  public getNodeId();
  public knowsNode(targetNodeId);
  public pingNode(targetNodeId);
  // or could even move this to KeyManager now - gets the target node's public key
  public getPublicKey(targetNodeId); 
  // queries the encapsulated sigchain
  public getChainData();
  // queries a target node's sigchain
  public requestChainData(targetNodeId);
  public claimNode(targetNodeId);
}

- Determine most appropriate async-init pattern Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
- Investigate removing the forwarding calls to NodeGraph Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)

confirm issues have been addressed
things to double check.
- - check that @ready errors are correct in NodeGraph, NodeManager and NodeConnectionManager
- - review NodeManager. NodeGraph, NodeConnectionManager and NodeConnection tests and update where needed.
- - All node errors have appropriate sysexit numbers and descriptions.
- ~~review based on this comment. Review js-id usage for NodeId, VaultId, NotificationId, PermId, ClaimId, Gestalts and GenericIdTypes.ts #299 (comment)~~ addressed by Change NodeId from encoded string to an opaque alias of Id which is a typed array from js-id #318.
- - Update js-id version to 3.0.0. Introducing vault sharing and the restrictions of pulling and cloning with vault permissions #266 (comment)
- - @joshuakarp Review all nodes tests for any missing test cases
- @joshuakarp Review connection state handling + diagramming Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
- - @tegefaulkes Remove any dependency cycles from the nodes domain. Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment). This may apply to other domains, but we decided to leave that for the scope of other PRs. Related: Change NodeId from encoded string to an opaque alias of Id which is a typed array from js-id #318 (comment)

Final checklist

Once this is merged, it's time to do Testnet Deployment! #194

joshuakarp · 2021-12-22T02:46:45Z

From yesterday's discussions, the plan for the NodeConnectionManager locking is as follows:

class NodeConnectionManager {
  public withConnection() {
    // if conn exists, lock = its read lock
    // otherwise, lock = a new write lock
    createConnection(lock)
    // release lock
  }

  public createConnection(lock?) {
    if (lock == null) newLock = new write lock
    // do connection creation
    if (lock == null) release newLock
  }
}

As a result of this, the expectation is that we'd only really pass in our own lock when you know what you're doing, and that you've already acquired the lock that you're passing in. That seems like somewhat ambiguous behaviour to me. I feel like all of this locking behaviour shouldn't be publicly exposed at all. I know that it's an optional parameter, but we don't want other domains to need to think about this lock parameter at all.

Alternatively, we could do the following:

class NodeConnectionManager {
  public withConnection() {
    createConnection()
    // now we know a connection must exist
    // so simply get the connAndLock, acquire its read lock and do the operation
  }

  public createConnection() {
    // this should basically be identical behaviour to what getConnectionToNode currently is
    // however, it shouldn't return the NodeConnection: 
    get connAndLock
    if connAndLock != null
      if conn != null: return
      acquire write lock
      if conn != null after acquiring: return
      otherwise create connection
      release write lock
    if connAndLock == null:
      create lock and acquire write
      set lock in connections map
      create conn
      set conn in connections map
      release write lock
  }
}

joshuakarp · 2021-12-22T05:43:54Z

With the current committed usage of RWLock, I've been able to verify concurrent connection behaviour:

  public async withConnection<T>(
    targetNodeId,
    f: (conn: NodeConnection) => Promise<T>,
  ): Promise<T> {
    // Ensure a connection exists
    await this.createConnection(targetNodeId);
    const connAndLock = this.connections.get(targetNodeId);
    // Unexpected behvaiour: between call to create connection and retrieval, 
    // the conn or conn entry was deleted
    // if (connAndLock == null || connAndLock.connection == null) {
    //   throw new Error();
    // }
    // The above is commented out such that we don't need to create a NodeConnection - just testing the locking mechanisms
    // Acquire the read lock (allowing concurrent connection usage)
    return await connAndLock!.lock.read<T>(async () => {
      return await f(connAndLock!.connection!);
    });
  };

  // these are used by `withConnection`
  // reusing terms from network domain:
  // connConnectTime - timeout for opening the connection
  // connTimeoutTime  - TTL for the connection itself
  public async createConnection(targetNodeId: NodeId): Promise<void> {
    let connection: NodeConnection | undefined;
    let lock: utils.RWLock;
    let connAndLock = this.connections.get(targetNodeId);
    if (connAndLock != null) {
      ({ connection, lock } = connAndLock);
      // Connection already exists, so return
      if (connection != null) return;
      // Acquire the write (creation) lock
      await lock.write(async () => {
        // Once lock is released, check again if the conn now exists
        connAndLock = this.connections.get(targetNodeId);
        if (connAndLock != null && connAndLock.connection != null) return;
        // TODO: create the connection and set in map
        console.log('existing lock: created connection');
      });
    } else {
      lock = new utils.RWLock();
      connAndLock = { lock };
      this.connections.set(targetNodeId, connAndLock);
      await lock.write(async () => {
        // TODO: create the connection and set in map
        console.log('no existing entry: created connection');
      });
    }
  }

With the following test.ts:

import { NodeConnectionManager } from './src/nodes';
import type { NodeId } from './src/nodes/types';
import { sleep } from './src/utils';

async function main() {
  const connManager = new NodeConnectionManager();
  await connManager.createConnection('a' as NodeId);
  await Promise.race([
    connManager.withConnection('a' as NodeId, async () => {
      console.log('1.0');
      await sleep(1000);
      console.log('1.1');
      await sleep(1000);
      console.log('1.2');
      await sleep(1000);
      console.log('1.3');
      await sleep(1000);
      console.log('1.4');
    }),
    connManager.withConnection('a' as NodeId, async () => {
      console.log('2.0');
      await sleep(1000);
      console.log('2.1');
      await sleep(1000);
      console.log('2.2');
      await sleep(1000);
      console.log('2.3');
      await sleep(1000);
      console.log('2.4');
    }),
  ]);
}

main();

We get:

[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm run ts-node -- test-node.ts 

> @matrixai/polykey@1.0.0 ts-node /home/josh/Documents/MatrixAI/js-polykey
> ts-node --require tsconfig-paths/register "test-node.ts"

no existing entry: created connection
existing lock: created connection
existing lock: created connection
1.0
2.0
1.1
2.1
1.2
2.2
1.3
2.3
1.4
2.4

Changing the return await connAndLock!.lock.read<T> to a write lock, return await connAndLock!.lock.write<T> in withConnection, we get:

[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm run ts-node -- test-node.ts 

> @matrixai/polykey@1.0.0 ts-node /home/josh/Documents/MatrixAI/js-polykey
> ts-node --require tsconfig-paths/register "test-node.ts"

no existing entry: created connection
existing lock: created connection
existing lock: created connection
1.0
1.1
1.2
1.3
1.4
2.0
2.1
2.2
2.3
2.4

This is expected, as the write lock should block concurrent usage.

joshuakarp · 2021-12-23T04:27:16Z

There needs to be some further discussion concerning the async-init patterns we use for the nodes domain.

Original comment from @CMCDragonkai about this here #225 (comment):

NodeGraph - CDSS - because this is now shared between NodeConnectionManager and NodeManager, this has to be created outside of NodeManager as a singleton and shared - created first

NodeConnectionManager - SS or CDSS or CD - created second

NodeManager - SS or CDSS or CD - created third

NodeGraph (currently CDSS):

should be CDSS, as it has persistent data storage that should be initialised on create/start, and completely wiped on destroy
ultimately just a data structure now, akin to ACL/GestaltGraph (which also use CDSS)

NodeConnection (currently CDSS):

potentially this only needs to be a CD pattern - this would also follow the usage of the CD pattern with GRPCClientAgent
I don't foresee any problems with migrating the current start into createNodeConnection instead. start currently:
- creates the GRPCClientAgent
- sends hole punching message to the target node (using the seed connections)
this could also relate to the choices made in Integrate async-init pattern to network/Connection.ts #293

NodeConnectionManager:

most likely dependent on what NodeConnection is - if it remains as CDSS, then this will also likely be CDSS (stops the NodeConnections on stop, and destroys on destroy), otherwise this could also be CD

NodeManager (currently CDSS):

given that NodeGraph will no longer be an encapsulated dependency on NodeManager, there's also some flexibility with what this will need to be. Previously, we had to propagate the NodeGraph.destroy into the NodeManager.destroy, but this will no longer be the case

Also worthwhile to recall this comment from me regarding a problem with ordering on PolykeyAgent startup #289 (comment) and the discussion of using the start-stop pattern as a way to mitigate this #225 (comment).

joshuakarp · 2021-12-23T04:54:31Z

We'll also need to slightly rework the usage of seed node connections.

At the moment, we have a function NodeManager.getConnectionsToSeedNodes that establishes/retrieves the connections to each of our seed nodes. This is used in a few places:

PolykeyAgent.start: for initially establishing the connections. This is currently separated from the regular NodeManager.start as a result of some ordering issues discussed here Supporting trusted seed nodes (nodes domain refactoring, static sources for seed nodes) #289 (comment). Ultimately, this problem should be aimed to be resolved in this nodes refactoring effort.
NodeConnection.start: the seed node connections are passed in every createNodeConnection call. This is currently done to allow hole punching messages to be sent to the seed nodes, and then relayed to the target node to allow the connection to be established. Calls NodeConnection.sendHolePunchMessage.
NodeGraph.syncNodeGraph: for initially populating a new NodeGraph with some initial nodes closest to it. We ask the seed nodes for the closest nodes to our newly created node. Calls NodeConnection.getClosestNodes.

This shouldn't require too much of a rework. Initially thinking about this, we can instead statically store the seed node IDs in NodeConnectionManager, and just call withConnection on each of these IDs with the appropriate functionality we require.

joshuakarp · 2021-12-23T05:05:58Z

There's also the problem of code repetition when using withConnection. For example, we currently have a getClosestNodes function in NodeConnection. This is called in 2 places:

NodeGraph.syncNodeGraph
NodeGraph.getClosestGlobalNodes

Both places connect to a remote node and retrieve a set of closest nodes (i.e. the same functionality).

We'll no longer be wrapping this common functionality within a NodeConnection function. Instead, we could wrap it as a common function in some other class like so:

class NodeConnectionManager {
  public requestClosestNodes(targetNodeId) {
    return this.withConnection(targetNodeId, async () => {
      // functionality to get the closest nodes
    });
  }
  
  public syncNodeGraph() {
    // ...
    this.requestClosestNodes(this.keyManager.getNodeId());
  }

  public getClosestGlobalNodes(targetNodeId) {
    // ...
    this.requestClosestNodes(targetNodeId);
  }
}

Although I am wary that this starts to lean towards the issues that we're currently trying to solve (where NodeConnection has multiple functions that simply construct GRPC messages and relay calls to the GRPC client). Here, we seem to just be pulling this problem out of NodeConnection and into some other domain.

joshuakarp · 2021-12-23T05:21:16Z

Given that NodeGraph will no longer be treated as an encapsulated dependency of NodeManager, we also have the opportunity to remove the forwarding calls in NodeManager that simply query the NodeGraph. For example:

  public async getNode(targetNodeId: NodeId): Promise<NodeAddress | undefined> {
    return await this.nodeGraph.getNode(targetNodeId);
  }

We have similar functions for setting a node, getting all the node buckets, getting the closest global nodes, and refreshing the buckets.

This would mean that we would instead inject NodeGraph into the domains that it would need to be used.

So 2 alternatives here:

We retain the forwarding functions in NodeManager, and keep it as is where we inject NodeManager everywhere it's necessary to query/add to the NodeGraph
We remove the forwarding functions, and instead inject NodeGraph everywhere we need it. This greatly simplifies the API of NodeManager, but means we'd need to expose NodeGraph in lots of different places

CMCDragonkai · 2021-12-28T01:20:26Z

Regarding with contexts and general resource management patterns.

In OOP we have RAII, and this is being done through the js-async-init. But it's fairly a manual process with createX, destroy, start, stop being used. In our overall app architecture this is put together as matroska dolls and nesting the create, start, stop, destroy.

In imperative programming, we have the try finally pattern that interacts with RAII. Basically create the resource initially, then in finally destroy the resource. This works well with RAII. And it's also really flexible, since it can be used pretty much anywhere.

In functional programming we have additional pattern: with contexts or known as the bracketing pattern. This is a higher order function that takes a callback and injects it the resource, and when the callback is done, it will release/destroy the resource. We have used this in a number of places as it's a nicer interface for interacting with resources and it automates the destruction as it scopes the resource lifecycle to the function callstack/scope. We have used this in a number of places and intend to use it for things like withConn, withVault, withLock... etc.

One issue with using brackets, is that JS has 2 kinds of async functions (assume that such resource management always takes place asynchronously), ones that return a promise, and ones that return an async generator. And this would require 2 different variants of the bracket function like withConnF and withConnG. Additionally nesting brakets/with contexts can lead to a callback mountain which isn't as nice, in which case a generalised with combinator or promise-based chaining system should be used.

For example a general with combinator would look like this:

// similar to python's with Context() as a, Context() as b:
withF([resource, resource], (a, b) => {

});
// need a async generator variant
withG([resource, resource], async function* () {

});

To do the above, the resources must all have a similar expected interface. Perhaps something like:

// the T may be optional, in that it may not actually be available
// locking resource doesn't give any lock resource to the callback
type Resource<T> = () => Promise<[T, () => Promise<void>]>

This would allow the with combinator to always work any resource such as those exposed via imperative APIs.

let count = 0;
withF([
  async () => {
    ++count;
    return [count, async () => { --count; }]
  }
], (c) => {
  console.log(c);
})

The with combinator signature might be complicated, since you have to accumulate the types of resources that will be acquired, and expect a function to be passed in that takes those types. This would be similar to the signature for Promise.all since it also has to take an array of arbitrary types and then spread it into the then callback.

A promise chaining system would be similar to how promises can chain stuff:

resourceA().with(resourceB).with(resourceC).then((a, b, c) => {
});

But I can't really see this being much better than what is discussed above.

@tegefaulkes @joshuakarp relevant to our discussions about the right API for nodes and vaults.

CMCDragonkai · 2021-12-29T07:00:19Z

If withF and withG are generalised utility functions. Then we wouldn't expect each class/domain to expose it. Instead each domain would need to expose a "resource API" as per the example given:

type Resource<T> = () => Promise<[T, () => Promise<void>]>

So imagine for ACL, instead of transaction, you expose the ability to lock the ACL.

class ACL {
  public function acquireLock() {
    await this.lock.acquire();
    return [undefined, () => {
      release();
    }];
  }
}

Then external users could then use withF([acl.acquireLock()], ...);.

tegefaulkes · 2022-01-06T05:20:33Z

Right now NodeConnection requires a Map<NodeId, NodeConnection>. this doesn't work with the withConnection()method of using the connections. I could usewithConnection()inside of theNodeConnectionbut having theNodeConnectionkeep a reference back to theNodeConnectionManager` doesn't feel quite right.

CMCDragonkai · 2022-01-06T05:25:00Z

It's totally fine for node connection to keep a reference back to the manager. We do this already in the networking proxies.

CMCDragonkai · 2022-01-06T05:26:50Z

We would normally want the collection to be weakly referenced. But it's not possible in JS even with weak map. So you'll have to use Map with a reverse reference for garbage collecting it's reference.

CMCDragonkai · 2022-01-06T06:30:18Z

The RWLock should be changed over to write-preferring to prevent write-starvation. Example of this is here: https://gist.github.com/CMCDragonkai/4de5c1526fc58dac259e321db8cf5331#file-rwlock-write-ts

I've already used this in js-async-init to great effect. The current RWLock in js-polykey is read-preferring is not the best way to do it.

CMCDragonkai · 2022-01-06T06:36:19Z

Priorities:

NodeConnectionManager - this is the centerpiece
NodeManager - moving functionality out of this into NodeConnectionManager and also using NodeConnectionManager when interacting with nodes
NodeGraph - changing this to being a data structure and move out some functionality into NodeConnectionManager
Then address the error handling, first in GRPCClient and watching the connectivity state, then in NodeConnection, look into ready method wrapping as well when other users end up calling on a "broken"/"dead" node connection
Finally look at all the other domains that need to call other keynode's agent service like VaultManager.. etc and how they should be changed accordingly to use withConn or our generic with combinator

tegefaulkes · 2022-01-07T00:06:18Z

I've created an interface that exposes only the needed functionality from NodeConnectionManager for NodeConnection. This includes a personalised callback to handle destroying and removing itself from the connection map.

CMCDragonkai · 2022-01-09T07:35:45Z

Rebase please!

CMCDragonkai · 2022-01-09T07:48:51Z

@tegefaulkes can you make sure this PR removes this:

[nix-shell:~/Projects/js-polykey/tests]$ ag 'failedConnectionTimeout'
global.d.ts
11:declare var failedConnectionTimeout: number;

bin/nodes/ping.test.ts
114:    global.failedConnectionTimeout * 2,
136:    global.failedConnectionTimeout * 2,

bin/nodes/find.test.ts
195:    global.failedConnectionTimeout * 2,

bin/nodes/add.test.ts
98:    global.failedConnectionTimeout,
117:    global.failedConnectionTimeout,

client/rpcNodes.test.ts
230:    global.failedConnectionTimeout * 2,
275:    global.failedConnectionTimeout * 2,
301:    global.failedConnectionTimeout * 2,

nodes/NodeManager.test.ts
265:      global.failedConnectionTimeout * 2,
315:    global.failedConnectionTimeout * 2,
391:    global.failedConnectionTimeout * 2,

This should no longer be necessary once you have timeouts dependency injected, or timer mocking. Once done, remove the variable from jest.config.js.

CMCDragonkai · 2022-01-09T08:02:36Z

I've added #274, #216, #254 to this PR @tegefaulkes since these are all related together, it's best to refactor these together.

tegefaulkes · 2022-01-10T00:11:43Z

Re-based on master.

CMCDragonkai · 2022-01-10T01:14:17Z

There's some confusion between all these methods that we should make clearer.

withConnection - with wrapper, should have promise and async generator variant
createConnection - public methods to create a connection but doesn't return the connection
destroyConnection - public method to destroy a connection
establishNodeConnection - wrapped function that actually creates the connection
openConnection - hole punch for reverse proxy
fwdOpenConnection - hole punch for forward proxy

These are all quite confusing, we should be renaming these accordingly.

Investigate whether createConnection should really be public or not, and this has a relationship with the seed nodes synchronisation, it may not be necessary anymore
Maybe createConnection should actually be getConnection, in this case the connection object is actually returned. But this becomes a manual usage.
The establishNodeConnection which is a protected function should be renamed so that it's clearer that it is a wrapper worker function. This is based on the object map pattern which was originally called createObject, and if 2 can be addressed, then establishNodeConnection can be createConnection
The hole punching functions shouldn't be called openConnection... etc, they can be called holePunchForward or holePunchReverse

@tegefaulkes I think it's a good idea for you to review in detail the design of 2 things:

The kademlia node graph so you have an idea how the contact database is being synchronised
How the hole punching ICE logic is supposed to work

tegefaulkes · 2022-01-10T06:22:58Z

When moving some functionality about I keep getting to the question "Do I put it in the NodeManager or the NodeConnectionManager?". It seems to me that the NodeConnectionManager should solely handle the NodeConnections and their life cycle. So maybe all functionality such as getClosestLocalNodes, sendHolePunchMessage or findNode should be in the NodeManager?

Also to note, there are two functions for getClosestLocalNodes. One that calls the GRPC and one that is called by the service. Like above I think they should both be in the NodeManager. However I need a good way to tell them apart by name.

CMCDragonkai · 2022-01-10T06:29:53Z

Connection-related work is indeed in the NodeConnectionManager. So if there are functions relating to managing connections, doing hole punching.

Note in the diagram #225 (comment) NodeManager requires NodeConnectionManager but not the other way around. So this has some implications for how sending notifications needs to occur. It's possible that NodeConnectionManager needs NotificationsManager to do it. But there's a recursive dependency here too.

This mutually recursive dependency has to be resolved by further abstraction, or some sort of normalisation. Prototype how it can be done with sending notifications.

For getClosestLocalNodes... you have the sender and receiver. One is to ask another node what their closest local nodes are, and another is to give back what the closest local nodes are. You could centralise it by giving it a parameter, where if the nodeId is undefined, then it assumes it means itself. Or separate the 2 functions entirely by different names.

tegefaulkes · 2022-01-10T06:54:38Z

NodeManager has a getNodeId() method that just wraps the KeyManager.getNodeId() maybe we should get this directly from the KeyManager?

tegefaulkes · 2022-02-11T08:20:36Z

I've done almost everything now. only things left are

refactoring the destroyedCallback stuff is mostly complete, just have to fix a small problem with it. I've made a comment about it. but it's getting too late to finish now.
the NCM.getInterface() method can be removed and replaced with casting to the interface. But we will have to extract the destroyCallback from it and create that separately.

src/nodes/NodeConnectionManager.ts

CMCDragonkai · 2022-02-14T01:02:40Z

Regarding the NodeConnectionManagerInterface, once you have removed the destroyCallback from the interface, you can do:

Remove the getInterface method entirely. You can just use structural type to satisfy the interface.
Move the NodeConnectionManagerInterface into NodeConnection.ts, it is the receiver that cares about this type, not the sender in this case.
The NodeConnectionManagerInterface can just be private type to NodeConnection.ts, no need to export it. It's equivalent to an embedded type in the function signature to NodeConnection.createNodeConnection.

This means the destroyCallback is passed separately.

src/nodes/NodeConnectionManager.ts

CMCDragonkai · 2022-02-14T02:09:53Z

Some more issues I'm working on:

The placement of the watchConnectivityState should be in the GRPCClient.constructor because it needs to have access the instance to be able to call this.destroy. - DONE
Whether to do wrapping of destroyCallback in NC or not. I figured that no, because each class can handle their own responsibility here.
How should we be checking whether to call the destroyCallback or not, either by checking the status 'destroying' and also when it is already destroyed. - Now also checking && !client[asyncInit.destroyed].
The errors that can come into checkState. According to grpc-js, the only possible errors are: new Error('Deadline passed without connectivity state change') and possibly an exception when the client is stopped. See grpc-node/packages/grpc-js/src/channel.ts. What is the exception when it is stopped? - Many exception checks were removed, because no exceptions could be thrown there. Some are undefined behaviour because the error should never be thrown.
Should be able to remove the usage of setImmediate from watchConnectivityState it should be able to register the handler synchronously. - Now we just call checkState immediately, but we may require a eslint comment to allow it to pass the linting.
Changed to use connectivityState enum, rather than the numbers.

CMCDragonkai · 2022-02-14T02:26:13Z

Still need to add a test that tests the READY to IDLE transition.

joshuakarp · 2022-02-14T02:53:48Z

Still need to add a test that tests the READY to IDLE transition.

See this comment #224 (comment)

Any transition from READY to IDLE is going to kill the connection with the current implementation. At this point, I believe there's only 2 cases where this transition is observable:

when the server closes the connection (already have a test for this)
when the underlying GRPC TTL times out (no test for this yet)

At the moment, the GRPC TTL is internally something like 6 minutes according to @tegefaulkes. I can look into how to override this, but note that with the current implementation, it's just going to close the connection.

If we want to support the transition cycle between IDLE and READY (when the GRPC TTL expires), then some extra logic is necessary (I could chat to @tegefaulkes about this).

CMCDragonkai · 2022-02-14T03:10:47Z

Pushed up some changes to the NodeConnection hierarchy. @tegefaulkes please adjust the changes in the NodeConnectionManager and update test for it as well. There was a lot of un-used parameters for NodeConnection.

joshuakarp · 2022-02-14T03:23:33Z

There's a GRPC_ARG_CLIENT_IDLE_TIMEOUT_MS https://grpc.github.io/grpc/core/group__grpc__arg__keys.html#ga51ab062269cd81298f5adb6fd9a45e99

Timeout after the last RPC finishes on the client channel at which the channel goes back into IDLE state.

Int valued, milliseconds. INT_MAX means unlimited. The default value is 30 minutes and the min value is 1 second.

But there doesn't seem to be a field for this in ChannelOptions.

CMCDragonkai · 2022-02-14T03:24:00Z

Still need to add a test that tests the READY to IDLE transition.

See this comment #224 (comment)

Any transition from READY to IDLE is going to kill the connection with the current implementation. At this point, I believe there's only 2 cases where this transition is observable:

when the server closes the connection (already have a test for this)

when the underlying GRPC TTL times out (no test for this yet)

At the moment, the GRPC TTL is internally something like 6 minutes according to @tegefaulkes. I can look into how to override this, but note that with the current implementation, it's just going to close the connection.

If we want to support the transition cycle between IDLE and READY (when the GRPC TTL expires), then some extra logic is necessary (I could chat to @tegefaulkes about this).

Yes we want a test for 2., see this: https://grpc.github.io/grpc/core/group__grpc__arg__keys.html. According to the docs (https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md) a IDLE_TIMEOUT should exist. But I cannot find this parameter in the GRPC js codebase.

Maybe @tegefaulkes just run a test and leave it running for 10-15 minutes... etc, and see?

CMCDragonkai · 2022-02-14T03:24:18Z

Once we find out how to trigger it, we can always use timer mocking to make it to grpc that the time has passed alot!

CMCDragonkai · 2022-02-14T03:24:50Z

@tegefaulkes at any case, if we cannot find a quick fix to that, we can add that as a later issue. And just fix up the final code based on my changes, and then we can lint/squash merge.

CMCDragonkai · 2022-02-14T03:29:04Z

Will also create a new issue for resolving the cycle in NodeConnection when it has to create the node connection instance before it creates the client instance. Obviously we would want to create the client instance first. But creating the client instance requires the destroyCallback, and the destroy callback requires a reference to the node instance.

A similar situation occurs in GRPCClient, which was resolved by calling this.watchConnectivityState inside the constructor, so it has access to the instance as well.

But in this case, the async creation of the derived GRPCClient classes requires the destroyCallback, because they are constructing the instance.

It seems like the proper solution is to change to StartStop, or CreateDestroyStartStop for these connection classes and make it similar to network domain's ConnectionForward and ConnectionReverse. The CreateDestroy pattern here is just a bit inflexible.

New issue started here #333.

joshuakarp · 2022-02-14T03:35:40Z

There's a GRPC_ARG_CLIENT_IDLE_TIMEOUT_MS https://grpc.github.io/grpc/core/group__grpc__arg__keys.html#ga51ab062269cd81298f5adb6fd9a45e99

Timeout after the last RPC finishes on the client channel at which the channel goes back into IDLE state.
Int valued, milliseconds. INT_MAX means unlimited. The default value is 30 minutes and the min value is 1 second.

But there doesn't seem to be a field for this in ChannelOptions.

There's a few mentions of this in the C-based GRPC: https://github.com/grpc/grpc/search?q=GRPC_ARG_CLIENT_IDLE_TIMEOUT_MS&type=code

CMCDragonkai · 2022-02-14T03:40:15Z

There's a GRPC_ARG_CLIENT_IDLE_TIMEOUT_MS https://grpc.github.io/grpc/core/group__grpc__arg__keys.html#ga51ab062269cd81298f5adb6fd9a45e99

Timeout after the last RPC finishes on the client channel at which the channel goes back into IDLE state.
Int valued, milliseconds. INT_MAX means unlimited. The default value is 30 minutes and the min value is 1 second.

But there doesn't seem to be a field for this in ChannelOptions.

There's a few mentions of this in the C-based GRPC: https://github.com/grpc/grpc/search?q=GRPC_ARG_CLIENT_IDLE_TIMEOUT_MS&type=code

Can you make a question on grpc-js about this parameter.

joshuakarp · 2022-02-14T04:42:17Z

I've posted a question on their discussion channel https://groups.google.com/g/grpc-io/c/yq4pmBGaXOQ

tegefaulkes · 2022-02-14T04:52:16Z

Everything is fixed now.
I've run all the tests and only 3 tests files are failing.

tests/bin/identities/identities.test.ts - should be fixed in test splitting CLI and Client & Agent Service test splitting #311
tests/client/service/notificationsSend.test.ts - ditto
tests/vaults/VaultManager.test.ts - should be fixed in Introducing vault sharing and the restrictions of pulling and cloning with vault permissions #266

I'm going to start the squash and merge process now.

CMCDragonkai · 2022-02-14T04:58:39Z

Ok I'll add create a new issue regarding the idle timeout. #332.

…ctionManager: - fixes and updating inline documentation - Removing connection establishment before adding node to NodeGraph - Made `createConnection` private - Removed transaction and locking from `NodeManager` - removed `clearDB` from `NodeGraph` - Added `nodeConnectionManagerConfig` to `PolykeyAgent` - added `withConnF`, `withConnG` and `acquireConnection` to `NodeConnectionManager` - moved some functionality from `NodeManager` and `NodeGraph` to `NodeConnectionManager` - Implementing `NodeConnectionManager.ts` - Extracted `NodeGraph` from `nodeManager` - Updated `nodes/errors.ts` exit codes - Updated `@matrixai/id` to `^3.0.0` - Fixed random failures in NodeGraph.test.ts - Expanded tests for `NodeConnectionManager` and `NodeConnection` to check connection failure conditions cleanup triggered in response - NodeConnection connection failure handling - Updated asyn-init pattern for `NodeGraph`, `NodeManager`, `NodeConnection` and `NodeConnectionManager` - Shuffled methods to appropriate classes - Switched to selective imports to remove cycles - Fixed GRPC logging - Refactored `NodeGraph` distance calulation and fixed tests - NodeConnection only attemps hole punching for non seed nodes - Removed `getNodeId` wrappers from nodes classes - Added curly rule to eslint to prevent braceless for single statement

joshuakarp self-assigned this Dec 22, 2021

tegefaulkes mentioned this pull request Dec 29, 2021

Introducing vault sharing and the restrictions of pulling and cloning with vault permissions #266

Merged

15 tasks

CMCDragonkai mentioned this pull request Dec 31, 2021

Adding QA jobs and expanding test jobs for CI/CD. #292

Merged

14 tasks

tegefaulkes self-assigned this Jan 6, 2022

tegefaulkes force-pushed the nodeconnectionmanager branch from ca666a0 to 8b1f76d Compare January 10, 2022 00:11

CMCDragonkai mentioned this pull request Jan 10, 2022

Testnet Node Deployment (testnet.polykey.io) #194

Closed

4 tasks

CMCDragonkai closed this Jan 10, 2022

CMCDragonkai reopened this Jan 10, 2022

joshuakarp commented Feb 13, 2022

View reviewed changes

src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved

joshuakarp commented Feb 13, 2022

View reviewed changes

src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved

CMCDragonkai reviewed Feb 14, 2022

View reviewed changes

src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved

CMCDragonkai reviewed Feb 14, 2022

View reviewed changes

src/nodes/NodeConnectionManager.ts Show resolved Hide resolved

CMCDragonkai reviewed Feb 14, 2022

View reviewed changes

src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved

CMCDragonkai mentioned this pull request Feb 14, 2022

Verify GRPC channel connectivity state READY to IDLE transition #332

Closed

tegefaulkes force-pushed the nodeconnectionmanager branch from d7c67a6 to b09e000 Compare February 14, 2022 05:23

tegefaulkes marked this pull request as ready for review February 14, 2022 05:24

CMCDragonkai merged commit 4aacfb8 into master Feb 14, 2022

CMCDragonkai deleted the nodeconnectionmanager branch February 14, 2022 05:25

CMCDragonkai mentioned this pull request Feb 14, 2022

Resolve cyclic dependencies by aligning GRPCClient derived classes and NodeConnection to StartStop pattern #333

Closed

tegefaulkes mentioned this pull request Mar 24, 2022

Seed node not adding details of connecting node to its NodeGraph #344

Closed

6 tasks

tegefaulkes mentioned this pull request Jun 1, 2022

Preparing for Testnet Deployment #378

Merged

29 tasks

Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310

Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310

Conversation

joshuakarp commented Dec 22, 2021 • edited by tegefaulkes Loading

Description

Issues Fixed

Tasks

Final checklist

joshuakarp commented Dec 22, 2021

joshuakarp commented Dec 22, 2021 • edited Loading

joshuakarp commented Dec 23, 2021

joshuakarp commented Dec 23, 2021

joshuakarp commented Dec 23, 2021

joshuakarp commented Dec 23, 2021

CMCDragonkai commented Dec 28, 2021 • edited Loading

CMCDragonkai commented Dec 29, 2021

tegefaulkes commented Jan 6, 2022

CMCDragonkai commented Jan 6, 2022

CMCDragonkai commented Jan 6, 2022

CMCDragonkai commented Jan 6, 2022

CMCDragonkai commented Jan 6, 2022 • edited Loading

tegefaulkes commented Jan 7, 2022

CMCDragonkai commented Jan 9, 2022

CMCDragonkai commented Jan 9, 2022

CMCDragonkai commented Jan 9, 2022

tegefaulkes commented Jan 10, 2022

CMCDragonkai commented Jan 10, 2022

tegefaulkes commented Jan 10, 2022

CMCDragonkai commented Jan 10, 2022

tegefaulkes commented Jan 10, 2022

tegefaulkes commented Feb 11, 2022

CMCDragonkai commented Feb 14, 2022

CMCDragonkai commented Feb 14, 2022 • edited Loading

CMCDragonkai commented Feb 14, 2022

joshuakarp commented Feb 14, 2022 • edited Loading

CMCDragonkai commented Feb 14, 2022

joshuakarp commented Feb 14, 2022

CMCDragonkai commented Feb 14, 2022 • edited Loading

CMCDragonkai commented Feb 14, 2022

CMCDragonkai commented Feb 14, 2022

CMCDragonkai commented Feb 14, 2022 • edited Loading

joshuakarp commented Feb 14, 2022

CMCDragonkai commented Feb 14, 2022

joshuakarp commented Feb 14, 2022

tegefaulkes commented Feb 14, 2022

CMCDragonkai commented Feb 14, 2022 • edited Loading

Extracting Node Connection Management out of `NodeManager` to `NodeConnectionManager` #310

Extracting Node Connection Management out of `NodeManager` to `NodeConnectionManager` #310

joshuakarp commented Dec 22, 2021 •

edited by tegefaulkes

Loading

joshuakarp commented Dec 22, 2021 •

edited

Loading

CMCDragonkai commented Dec 28, 2021 •

edited

Loading

CMCDragonkai commented Jan 6, 2022 •

edited

Loading

CMCDragonkai commented Feb 14, 2022 •

edited

Loading

joshuakarp commented Feb 14, 2022 •

edited

Loading

CMCDragonkai commented Feb 14, 2022 •

edited

Loading

CMCDragonkai commented Feb 14, 2022 •

edited

Loading

CMCDragonkai commented Feb 14, 2022 •

edited

Loading