Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310
From yesterday's discussions, the plan was:

class NodeConnectionManager {
  public withConnection() {
    // if conn exists, lock = its read lock
    // otherwise, lock = a new write lock
    createConnection(lock)
    // release lock
  }
  public createConnection(lock?) {
    if (lock == null) newLock = new write lock
    // do connection creation
    if (lock == null) release newLock
  }
}

As a result of this, the expectation is that you would only pass in your own lock when you know what you're doing, and that you have already acquired the lock you're passing in. That seems like somewhat ambiguous behaviour to me. I feel that none of this locking behaviour should be publicly exposed. I know that it's an optional parameter, but we don't want other domains to need to think about this lock parameter at all.

Alternatively, we could do the following:

class NodeConnectionManager {
  public withConnection() {
    createConnection()
    // now we know a connection must exist
    // so simply get the connAndLock, acquire its read lock and do the operation
  }
  public createConnection() {
    // this should basically be identical behaviour to what getConnectionToNode currently is
    // however, it shouldn't return the NodeConnection:
    get connAndLock
    if connAndLock != null
      if conn != null: return
      acquire write lock
      if conn != null after acquiring: return
      otherwise create connection
      release write lock
    if connAndLock == null:
      create lock and acquire write
      set lock in connections map
      create conn
      set conn in connections map
      release write lock
  }
} |
With the current committed usage of

public async withConnection<T>(
  targetNodeId,
  f: (conn: NodeConnection) => Promise<T>,
): Promise<T> {
  // Ensure a connection exists
  await this.createConnection(targetNodeId);
  const connAndLock = this.connections.get(targetNodeId);
  // Unexpected behaviour: between call to create connection and retrieval,
  // the conn or conn entry was deleted
  // if (connAndLock == null || connAndLock.connection == null) {
  //   throw new Error();
  // }
  // The above is commented out such that we don't need to create a NodeConnection - just testing the locking mechanisms
  // Acquire the read lock (allowing concurrent connection usage)
  return await connAndLock!.lock.read<T>(async () => {
    return await f(connAndLock!.connection!);
  });
}
// these are used by `withConnection`
// reusing terms from network domain:
// connConnectTime - timeout for opening the connection
// connTimeoutTime - TTL for the connection itself
public async createConnection(targetNodeId: NodeId): Promise<void> {
  let connection: NodeConnection | undefined;
  let lock: utils.RWLock;
  let connAndLock = this.connections.get(targetNodeId);
  if (connAndLock != null) {
    ({ connection, lock } = connAndLock);
    // Connection already exists, so return
    if (connection != null) return;
    // Acquire the write (creation) lock
    await lock.write(async () => {
      // Once lock is released, check again if the conn now exists
      connAndLock = this.connections.get(targetNodeId);
      if (connAndLock != null && connAndLock.connection != null) return;
      // TODO: create the connection and set in map
      console.log('existing lock: created connection');
    });
  } else {
    lock = new utils.RWLock();
    connAndLock = { lock };
    this.connections.set(targetNodeId, connAndLock);
    await lock.write(async () => {
      // TODO: create the connection and set in map
      console.log('no existing entry: created connection');
    });
  }
}

With the following test.ts:

import { NodeConnectionManager } from './src/nodes';
import type { NodeId } from './src/nodes/types';
import { sleep } from './src/utils';
async function main() {
  const connManager = new NodeConnectionManager();
  await connManager.createConnection('a' as NodeId);
  await Promise.race([
    connManager.withConnection('a' as NodeId, async () => {
      console.log('1.0');
      await sleep(1000);
      console.log('1.1');
      await sleep(1000);
      console.log('1.2');
      await sleep(1000);
      console.log('1.3');
      await sleep(1000);
      console.log('1.4');
    }),
    connManager.withConnection('a' as NodeId, async () => {
      console.log('2.0');
      await sleep(1000);
      console.log('2.1');
      await sleep(1000);
      console.log('2.2');
      await sleep(1000);
      console.log('2.3');
      await sleep(1000);
      console.log('2.4');
    }),
  ]);
}
main();

We get:
Changing the
This is expected, as the write lock should block concurrent usage. |
There needs to be some further discussion concerning the async-init patterns we use for the nodes domain. Original comment from @CMCDragonkai about this here #225 (comment):
Also worthwhile to recall this comment from me regarding a problem with ordering on |
We'll also need to slightly rework the usage of seed node connections. At the moment, we have a function
This shouldn't require too much of a rework. Initially thinking about this, we can instead statically store the seed node IDs in |
There's also the problem of code repetition when using
Both places connect to a remote node and retrieve a set of closest nodes (i.e. the same functionality). We'll no longer be wrapping this common functionality within a

class NodeConnectionManager {
  public requestClosestNodes(targetNodeId) {
    return this.withConnection(targetNodeId, async () => {
      // functionality to get the closest nodes
    });
  }
  public syncNodeGraph() {
    // ...
    this.requestClosestNodes(this.keyManager.getNodeId());
  }
  public getClosestGlobalNodes(targetNodeId) {
    // ...
    this.requestClosestNodes(targetNodeId);
  }
}

Although I am wary that this starts to lean towards the issues that we're currently trying to solve (where |
Given that

public async getNode(targetNodeId: NodeId): Promise<NodeAddress | undefined> {
  return await this.nodeGraph.getNode(targetNodeId);
}

We have similar functions for setting a node, getting all the node buckets, getting the closest global nodes, and refreshing the buckets. This would mean that we would instead inject

So 2 alternatives here:
|
Regarding with-contexts and general resource management patterns.

In OOP we have RAII, and this is being done through js-async-init. But it's a fairly manual process with

In imperative programming, we have the

In functional programming we have an additional pattern: with-contexts, also known as the bracketing pattern. This is a higher-order function that takes a callback and injects the resource into it, and when the callback is done, it releases/destroys the resource. We have used this in a number of places, as it's a nicer interface for interacting with resources and it automates destruction by scoping the resource lifecycle to the function call stack/scope. We intend to use it for things like

One issue with using brackets is that JS has 2 kinds of async functions (assume that such resource management always takes place asynchronously): ones that return a promise, and ones that return an async generator. This would require 2 different variants of the bracket function like

For example a general

// similar to python's with Context() as a, Context() as b:
withF([resource, resource], (a, b) => {
});
// need an async generator variant
withG([resource, resource], async function* () {
});

To do the above, the resources must all have a similar expected interface. Perhaps something like:

// the T may be optional, in that it may not actually be available
// locking resource doesn't give any lock resource to the callback
type Resource<T> = () => Promise<[T, () => Promise<void>]>

This would allow the with combinator to always work with any resource, such as those exposed via imperative APIs.

let count = 0;
withF([
  async () => {
    ++count;
    return [count, async () => { --count; }]
  }
], (c) => {
  console.log(c);
})

The with combinator signature might be complicated, since you have to accumulate the types of resources that will be acquired, and expect a function to be passed in that takes those types. This would be similar to the signature for

A promise chaining system would be similar to how promises can chain stuff:

resourceA().with(resourceB).with(resourceC).then((a, b, c) => {
});

But I can't really see this being much better than what is discussed above. @tegefaulkes @joshuakarp relevant to our discussions about the right API for nodes and vaults. |
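As a rough illustration of the with combinator discussed in the comment above, here is a minimal TypeScript sketch. It reuses the `Resource<T>` shape from that comment. The `Acquired` helper type and the `example` function are hypothetical names introduced only for this sketch. Two choices here are assumptions rather than anything decided in the discussion: the callback receives the acquired values as a single tuple (instead of separate `(a, b)` arguments) to keep the types simple, and resources are released in reverse order of acquisition.

```typescript
// A resource is an async function returning the acquired value and its releaser
// (same shape as the Resource<T> type proposed in the discussion).
type Resource<T> = () => Promise<[T, () => Promise<void>]>;

// Hypothetical helper: maps a tuple of resources to the tuple of their values.
type Acquired<Rs extends Resource<unknown>[]> = {
  [K in keyof Rs]: Rs[K] extends Resource<infer T> ? T : never;
};

async function withF<Rs extends Resource<unknown>[], R>(
  resources: [...Rs],
  f: (values: Acquired<Rs>) => Promise<R>,
): Promise<R> {
  const releases: Array<() => Promise<void>> = [];
  const values: unknown[] = [];
  try {
    // Acquire in order, remembering each releaser as soon as it exists
    for (const acquire of resources) {
      const [value, release] = await acquire();
      values.push(value);
      releases.push(release);
    }
    return await f(values as unknown as Acquired<Rs>);
  } finally {
    // Release whatever was acquired, in reverse order, even if f threw
    for (const release of releases.reverse()) {
      await release();
    }
  }
}

// Usage: the counter resource from the discussion, acquired twice.
async function example(): Promise<[number, number]> {
  let count = 0;
  const counter: Resource<number> = async () => {
    ++count;
    return [count, async () => { --count; }];
  };
  const sum = await withF([counter, counter], async ([a, b]) => a + b);
  // By this point both releasers have run
  return [sum, count];
}
```

An async generator variant (`withG`) would follow the same acquire/try/finally structure, using `yield*` on the callback's generator in place of `await f(...)`.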
If
So imagine for

class ACL {
  public async acquireLock() {
    // assumes this.lock.acquire() resolves to a release function
    const release = await this.lock.acquire();
    return [undefined, async () => {
      release();
    }];
  }
}

Then external users could use |
Right now NodeConnection requires a |
It's totally fine for node connection to keep a reference back to the manager. We do this already in the networking proxies. |
We would normally want the collection to be weakly referenced. But that's not possible in JS, even with WeakMap. So you'll have to use a Map with a reverse reference for garbage collecting its reference. |
The RWLock should be changed over to write-preferring to prevent write-starvation. An example of this is here: https://gist.github.com/CMCDragonkai/4de5c1526fc58dac259e321db8cf5331#file-rwlock-write-ts I've already used this in js-async-init to great effect. The current RWLock in js-polykey is read-preferring, which is not the best way to do it. |
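To make the write-preferring idea concrete, here is a small self-contained sketch. It is not the implementation from the linked gist; `WritePreferringRWLock` and `demo` are hypothetical names, and the wake-everyone-and-recheck strategy is chosen for brevity rather than efficiency. The key point is that a new reader waits whenever a writer is active or queued, so a steady stream of readers cannot starve a writer.

```typescript
class WritePreferringRWLock {
  private readers = 0;
  private writerActive = false;
  private writersWaiting = 0;
  private waiters: Array<() => void> = [];

  // Park the caller until the next release wakes everyone up
  private wait(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Wake all parked callers; each one re-checks its own condition
  private wake(): void {
    const waiters = this.waiters;
    this.waiters = [];
    for (const resolve of waiters) {
      resolve();
    }
  }

  public async read<T>(f: () => Promise<T>): Promise<T> {
    // Readers yield to any active or queued writer (the write preference)
    while (this.writerActive || this.writersWaiting > 0) {
      await this.wait();
    }
    this.readers++;
    try {
      return await f();
    } finally {
      this.readers--;
      this.wake();
    }
  }

  public async write<T>(f: () => Promise<T>): Promise<T> {
    this.writersWaiting++;
    // Writers need exclusive access: no active writer and no active readers
    while (this.writerActive || this.readers > 0) {
      await this.wait();
    }
    this.writersWaiting--;
    this.writerActive = true;
    try {
      return await f();
    } finally {
      this.writerActive = false;
      this.wake();
    }
  }
}

// Demo: a second reader arriving after a queued writer must wait for the writer.
async function demo(): Promise<string[]> {
  const lock = new WritePreferringRWLock();
  const events: string[] = [];
  let finishRead: () => void = () => {};
  const gate = new Promise<void>((resolve) => {
    finishRead = resolve;
  });
  const r1 = lock.read(async () => {
    events.push('r1 start');
    await gate;
    events.push('r1 end');
  });
  const w = lock.write(async () => {
    events.push('w');
  });
  const r2 = lock.read(async () => {
    events.push('r2');
  });
  finishRead();
  await Promise.all([r1, w, r2]);
  return events;
}
```

With a read-preferring lock, `r2` would be admitted alongside `r1` while the writer keeps waiting; here it queues behind the writer instead.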
Priorities:
|
I've created an interface that exposes only the needed functionality from NodeConnectionManager for NodeConnection. This includes a personalised callback to handle destroying and removing itself from the connection map. |
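A minimal sketch of this pattern, assuming nothing about the actual classes in the PR: every name below (`ConnectionOwner`, `Connection`, `ConnectionManager`, `NodeIdString`) is hypothetical. The connection sees only a narrow interface of its manager and uses it as a reverse reference to remove itself from the manager's `Map` when destroyed, which is what allows it to be garbage collected.

```typescript
// All names here are hypothetical and only illustrate the pattern.
type NodeIdString = string;

// The narrow view of the manager that a connection is allowed to see
interface ConnectionOwner {
  removeConnection(nodeId: NodeIdString): void;
}

class Connection {
  public readonly nodeId: NodeIdString;
  private owner: ConnectionOwner;
  private destroyed = false;

  constructor(nodeId: NodeIdString, owner: ConnectionOwner) {
    this.nodeId = nodeId;
    this.owner = owner;
  }

  public destroy(): void {
    // Idempotent: destroying twice is a no-op
    if (this.destroyed) {
      return;
    }
    this.destroyed = true;
    // Reverse reference: ask the owner to drop this connection from its map
    this.owner.removeConnection(this.nodeId);
  }
}

class ConnectionManager implements ConnectionOwner {
  private connections: Map<NodeIdString, Connection> = new Map();

  public openConnection(nodeId: NodeIdString): Connection {
    let conn = this.connections.get(nodeId);
    if (conn == null) {
      conn = new Connection(nodeId, this);
      this.connections.set(nodeId, conn);
    }
    return conn;
  }

  public removeConnection(nodeId: NodeIdString): void {
    this.connections.delete(nodeId);
  }

  public hasConnection(nodeId: NodeIdString): boolean {
    return this.connections.has(nodeId);
  }
}
```

Passing the interface rather than the concrete manager class keeps the dependency one-directional at the type level, even though a runtime reference back to the manager exists.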
Rebase please! |
@tegefaulkes can you make sure this PR removes this:
This should no longer be necessary once you have timeouts dependency injected, or timer mocking. Once done, remove the variable from |
I've added #274, #216, #254 to this PR @tegefaulkes since these are all related together, it's best to refactor these together. |
ca666a0 to 8b1f76d
Re-based on master. |
There's some confusion between all these methods that we should make clearer.
These are all quite confusing, we should be renaming these accordingly.
@tegefaulkes I think it's a good idea for you to review in detail the design of 2 things:
|
When moving some functionality about I keep getting to the question "Do I put it in the Also to note, there are two functions for |
Connection-related work is indeed in the Note in the diagram #225 (comment) This mutually recursive dependency has to be resolved by further abstraction, or some sort of normalisation. Prototype how it can be done with sending notifications. For |
NodeManager has a |
I've done almost everything now. The only things left are
|
Regarding the
This means the |
Some more issues I'm working on:
|
Still need to add a test that tests the READY to IDLE transition. |
See this comment #224 (comment) Any transition from
At the moment, the GRPC TTL is internally something like 6 minutes according to @tegefaulkes. I can look into how to override this, but note that with the current implementation, it's just going to close the connection. If we want to support the transition cycle between |
Pushed up some changes to the |
There's a
But there doesn't seem to be a field for this in |
Yes we want a test for 2., see this: https://grpc.github.io/grpc/core/group__grpc__arg__keys.html. According to the docs (https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md) a Maybe @tegefaulkes just run a test and leave it running for 10-15 minutes... etc, and see? |
Once we find out how to trigger it, we can always use timer mocking to make it appear to grpc that a lot of time has passed! |
@tegefaulkes in any case, if we cannot find a quick fix for that, we can add it as a later issue. Then just fix up the final code based on my changes, and we can lint/squash merge. |
Will also create a new issue for resolving the cycle in A similar situation occurs in But in this case, the async creation of the derived It seems like the proper solution is to change to New issue started here #333. |
There are a few mentions of this in the C-based GRPC: https://github.com/grpc/grpc/search?q=GRPC_ARG_CLIENT_IDLE_TIMEOUT_MS&type=code |
Can you post a question on grpc-js about this parameter? |
I've posted a question on their discussion channel https://groups.google.com/g/grpc-io/c/yq4pmBGaXOQ |
Everything is fixed now.
I'm going to start the squash and merge process now. |
OK, I'll create a new issue regarding the idle timeout: #332. |
…ctionManager:
- fixes and updating inline documentation
- Removing connection establishment before adding node to NodeGraph
- Made `createConnection` private
- Removed transaction and locking from `NodeManager`
- removed `clearDB` from `NodeGraph`
- Added `nodeConnectionManagerConfig` to `PolykeyAgent`
- added `withConnF`, `withConnG` and `acquireConnection` to `NodeConnectionManager`
- moved some functionality from `NodeManager` and `NodeGraph` to `NodeConnectionManager`
- Implementing `NodeConnectionManager.ts`
- Extracted `NodeGraph` from `nodeManager`
- Updated `nodes/errors.ts` exit codes
- Updated `@matrixai/id` to `^3.0.0`
- Fixed random failures in NodeGraph.test.ts
- Expanded tests for `NodeConnectionManager` and `NodeConnection` to check connection failure conditions and the cleanup triggered in response
- NodeConnection connection failure handling
- Updated async-init pattern for `NodeGraph`, `NodeManager`, `NodeConnection` and `NodeConnectionManager`
- Shuffled methods to appropriate classes
- Switched to selective imports to remove cycles
- Fixed GRPC logging
- Refactored `NodeGraph` distance calculation and fixed tests
- NodeConnection only attempts hole punching for non seed nodes
- Removed `getNodeId` wrappers from nodes classes
- Added curly rule to eslint to prevent braceless for single statement
d7c67a6 to b09e000
Description
The refactoring effort to make our nodes domain more robust.
Issues Fixed
- NodeId as a proxy/singleton instance #254
- NodeGraph test failures #274

Tasks
- NodeConnectionManager:
  - Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - RWLock: withConnection, createConnection, destroyConnection
  - NodeManager: openConnection, findNode
  - NodeGraph: getClosestLocalNodes, getClosestGlobalNodes
  - NodeManager) Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - NodeConnectionManager. this is triggered by the NodeConnection itself.
- NodeConnection:
  - Determine most appropriate async-init pattern Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - Move all internal function implementations to use withConnection in relevant domain (and remove from NodeConnection after moving):
    - getClosestNodes: to NodeConnectionManager.getClosestGlobalNodes
    - sendHolePunchMessage: to NodeConnectionManager?
    - sendNotification: to NotificationsManager.sendNotification
    - getChainData: to NodeManager.requestChainData
    - claimNode: to NodeManager.claimNode
    - scanVaults: to NodeManager/VaultManager? Will need rebase on Introducing vault sharing and the restrictions of pulling and cloning with vault permissions #266 when merged to bring this in
  - Node connection needs to properly close in all cases where the client connection fails. If the client still exists when the connection should be dead then we have a resource leak.
- NodeGraph:
  - Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - getClosestLocalNodes
  - getClosestGlobalNodes
  - syncNodeGraph? This currently utilises connections, so will potentially need to be moved to NodeConnectionManager.
- NodeManager:
  - Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - NodeGraph Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
- confirm issues have been addressed
  - Refactor NodeId as a proxy/singleton instance #254: being handled in Change NodeId from encoded string to an opaque alias of Id which is a typed array from js-id #318
  - NodeGraph test failures #274
- things to double check.
  - @ready errors are correct in NodeGraph, NodeManager and NodeConnectionManager
  - NodeManager.
  - NodeGraph, NodeConnectionManager and NodeConnection tests and update where needed.
  - review based on this comment. Review js-id usage for NodeId, VaultId, NotificationId, PermId, ClaimId, Gestalts and GenericIdTypes.ts #299 (comment): addressed by Change NodeId from encoded string to an opaque alias of Id which is a typed array from js-id #318.
  - Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment)
  - nodes domain. Extracting Node Connection Management out of NodeManager to NodeConnectionManager #310 (comment). This may apply to other domains, but we decided to leave that for the scope of other PRs. Related: Change NodeId from encoded string to an opaque alias of Id which is a typed array from js-id #318 (comment)

Final checklist
Once this is merged, it's time to do Testnet Deployment! #194