Colossus: Enable communication between nodes via TCP / WebSocket transport #5196

Open
Lezek123 opened this issue Jan 9, 2025 · 0 comments
Lezek123 commented Jan 9, 2025

The issue

We're currently handling p2p communication between Colossus nodes via the public HTTP(S) API, which has several issues:

  • Communication is slow and limited, because connections are not persistent and there is substantial protocol overhead
  • The overhead will increase even more once we introduce authentication / authorization rules that require verifying, for example, whether a given node is allowed to request a given object (see: Infrastructure authentication #4414)

I'll use a cleanup service as an example.

Currently, for each data object that is about to be removed from local storage, we make HTTP(S) requests to all nodes that are supposed to store this object, to ensure the replication threshold will still be met after removal:

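      // One HEAD request per storage bucket assigned to the object's bag, issued sequentially
      for (const { storageBucket } of movedDataObject.storageBag.storageBuckets) {
        const url = urljoin(bucketOperatorUrlById.get(storageBucket.id), 'api/v1/files', movedDataObject.id)
        // The counter is incremented only if the request resolves (superagent rejects on timeout or an error status)
        await superagent.head(url).timeout(timeoutMs).set('X-COLOSSUS-HOST-ID', hostId)
        dataObjectReplicationCount++
      }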

This takes a very long time and causes connectivity issues if the number of objects is very large.

Although this particular issue can be resolved by introducing some sort of batch request to check the status of multiple objects at once (which I think is a good idea in general), I think there are many scenarios that would benefit from a faster way of exchanging information between Colossus nodes.
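To illustrate the batch idea, here is a rough sketch of how object IDs could be grouped so that each peer receives one request per batch instead of one request per object. No batch endpoint exists yet, so `BatchStatusFn`, `chunk` and `checkObjectsInBatches` are hypothetical names and the transport behind `checkBatch` is left open.

```typescript
// Hypothetical shape of a batch status call: takes object IDs and resolves
// with one boolean per ID, in the same order. The transport is an assumption.
type BatchStatusFn = (objectIds: string[]) => Promise<boolean[]>

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer')
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Ask a single peer about many objects using one request per batch.
async function checkObjectsInBatches(
  objectIds: readonly string[],
  batchSize: number,
  checkBatch: BatchStatusFn
): Promise<Map<string, boolean>> {
  const stored = new Map<string, boolean>()
  for (const batch of chunk(objectIds, batchSize)) {
    const statuses = await checkBatch(batch)
    if (statuses.length !== batch.length) {
      throw new Error('peer returned a status list of unexpected length')
    }
    batch.forEach((id, i) => stored.set(id, statuses[i] === true))
  }
  return stored
}
```

With a batch size of, say, 1000, checking 100,000 objects against one peer would take 100 requests rather than 100,000. The batch size itself is an arbitrary example and would need tuning.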

Another example is the synchronization process. It could be much faster (and safer) if the peers to sync from were chosen based on their current load and proximity instead of being selected randomly. But to exchange this kind of information efficiently, we need a means of communication other than the HTTP API.
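As a sketch of what load- and proximity-aware peer selection might look like once nodes can exchange this information: the `PeerStatus` fields, the use of round-trip latency as a stand-in for proximity, and the weighting formula are all assumptions made for illustration, not an agreed design.

```typescript
// Hypothetical status a peer could report over a persistent connection.
interface PeerStatus {
  nodeId: string
  load: number // assumed to be normalized to the range 0..1
  latencyMs: number // measured round-trip time, used here as a proxy for proximity
}

// Order peers from most to least preferable. Lower score is better.
// `loadWeight` (0..1) controls how much load matters relative to latency.
function rankPeers(peers: readonly PeerStatus[], loadWeight = 0.5): PeerStatus[] {
  if (peers.length === 0) {
    return []
  }
  // Normalize latency against the slowest peer so both terms are in 0..1.
  const maxLatency = Math.max(1, ...peers.map((p) => p.latencyMs))
  const score = (p: PeerStatus): number =>
    loadWeight * p.load + (1 - loadWeight) * (p.latencyMs / maxLatency)
  // Copy before sorting so the caller's array is left untouched.
  return [...peers].sort((a, b) => score(a) - score(b))
}
```

The sync process could then take the first few entries of the ranked list rather than a random sample.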

Potential solutions

Add support for TCP / WebSocket communication between Colossus nodes.
The initial implementation can target optimizing the cleanup task by introducing persistent connections between Colossus nodes, which they can use to exchange information about which data objects they are storing, i.e.:

  • Node A sends request to Node B asking if it has objects [101, 102, 105]
  • Node B responds with a simple boolean status for each object (for example: b010, meaning: no, yes, no)
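The exchange above could be sketched roughly as follows. The wire format is not specified in this issue, so packing one bit per object (most significant bit first) is only one possible encoding, and `answerQuery`, `encodeStatuses` and `decodeStatuses` are hypothetical names.

```typescript
// Node B side: answer which of the requested objects are stored locally.
function answerQuery(storedIds: ReadonlySet<string>, requestedIds: readonly string[]): boolean[] {
  return requestedIds.map((id) => storedIds.has(id))
}

// Pack one bit per object, most significant bit first, so that
// [false, true, false] becomes the single byte 0b01000000.
function encodeStatuses(statuses: readonly boolean[]): Uint8Array {
  const bytes = new Uint8Array(Math.ceil(statuses.length / 8))
  statuses.forEach((stored, i) => {
    if (stored) {
      bytes[i >> 3] = (bytes[i >> 3] ?? 0) | (0x80 >> (i & 7))
    }
  })
  return bytes
}

// Node A side: unpack the first `count` bits back into booleans.
// `count` is the number of IDs Node A originally asked about.
function decodeStatuses(bytes: Uint8Array, count: number): boolean[] {
  if (count > bytes.length * 8) {
    throw new RangeError('response is shorter than the number of requested objects')
  }
  const statuses: boolean[] = []
  for (let i = 0; i < count; i++) {
    statuses.push(((bytes[i >> 3] ?? 0) & (0x80 >> (i & 7))) !== 0)
  }
  return statuses
}
```

With this encoding a response covering 8,000 objects fits in 1,000 bytes, and the order of the bits matches the order of the IDs in the request, so the IDs do not need to be repeated in the response.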

To enable greater flexibility in the future, I recommend exploring a libp2p implementation.

Status: Todo