
feat(api): add support for websockets #48

Open

rixcian opened this issue Sep 11, 2023 · 19 comments

@rixcian
Contributor

rixcian commented Sep 11, 2023

Summary

Create a WebSocket server for ar.io gateways to enable users/developers to subscribe to events on Arweave.

Motivation

There is currently no service in the Arweave ecosystem that enables event subscription via WebSockets.

Use cases for devs

  • Wallets can subscribe to an event when a certain transaction gets mined
  • Other indexers or backends can subscribe to transactions with specific tags
  • Other use cases follow from the fact that you can subscribe to essentially any event, e.g. a new block being mined, a new tx mined that matches a specific ANS, ...

Other motivations

  • The big advantage of WebSockets is that after the handshake, the per-message overhead is low, making them well suited to sending a high volume of messages.
  • Currently, there is no real-world use for indexed data in the database; this feature would finally give indexing some meaning
  • As mentioned above, there's no WebSocket solution in the space right now, so it would be a good selling point for ar.io
  • Note: This feature will be entirely optional, enabled via the WS_ENABLED variable in .env

Specification

Because WebSockets don't impose any strict rules or message protocol of their own, we need to define our own communication protocol.

This is a type specification of each message in this new protocol:

{
  "method": "<specify the type of action; e.g. subscribe, unsubscribe, error states, etc.>",
  "event": "<specify type of event you want to sub; only used in 'ar_subscribe method'>",
  "params": "<specify parameters of the event; for every event/method it's different>",
  "result": "<specify the result data; used only by server for returning data>"
}
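
For clarity, the same envelope expressed as TypeScript types (a sketch only; the field semantics follow the descriptions above, and the method/event names are the ones proposed in this issue):

// Sketch of the proposed message envelope. Names come from this proposal;
// the exact shapes are still open for discussion.
type SubscriptionMethod =
  | 'ar_subscribe'
  | 'ar_subscribe_accepted'
  | 'ar_subscribe_denied'
  | 'ar_subscribe_msg'
  | 'ar_unsubscribe';

type SubscriptionEvent = 'new_tx' | 'new_block';

interface ProtocolMessage {
  method: SubscriptionMethod;
  // Only used with 'ar_subscribe'; names the event being subscribed to.
  event?: SubscriptionEvent;
  // Event/method-specific parameters (e.g. tag filters or the subscription ID).
  params?: Record<string, unknown> | null;
  // Result data; only set by the server (e.g. the subscription ID or matched txs).
  result?: unknown;
}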

Example of communication

It works by subscribing to particular events. The gateway will return a subscription ID. For each event that matches the subscription, a notification with relevant data is sent together with the subscription ID.

  1. Create subscription (this example: subscribe to all new txs):
{
  "method": "ar_subscribe",
  "event": "new_tx",
  "params": null,
}
  2. Response from the ar.io gateway (returns a subscription ID - a UUID v4):
{
  "method": "ar_subscribe_accepted",
  // Subscription ID
  "result": "bd52673d-832f-4b34-a79d-79058e7f6989"
}
  3. After the creation of the subscription, the client will receive notifications in this format:
{
  "method": "ar_subscribe_msg",
  "params": {
    "subscription": "bd52673d-832f-4b34-a79d-79058e7f6989"
  },
  "result": [
    { "id": "FPDHr4gKD...yG2I", "owner": "sb7fS07r5...AVm0", "tags": [...], ... },
    { "id": "4d0G6Nk4z...hm44", "owner": "pnqyui51Q...jLvQ", "tags": [...], ... },
    ...
  ]
}
  4. If the client wants to cancel the subscription:
{
  "method": "ar_unsubscribe",
  "params": {
    "subscription": "bd52673d-832f-4b34-a79d-79058e7f6989"
  }
}
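
Putting the four steps together, a minimal client sketch (assumptions: the gateway listens on the proposed default port 3333 and accepts connections at the server root; this is illustrative, not a finalized client API):

import WebSocket from 'ws'; // the 'ws' npm package (in a browser you'd use the native WebSocket API instead)

// Connect to the gateway's WebSocket server (the proposed default WS_PORT 3333 is assumed).
const ws = new WebSocket('ws://localhost:3333');

let subscriptionId: string | undefined;

ws.on('open', () => {
  // Step 1: subscribe to all new transactions.
  ws.send(JSON.stringify({ method: 'ar_subscribe', event: 'new_tx', params: null }));
});

ws.on('message', (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.method === 'ar_subscribe_accepted') {
    // Step 2: remember the subscription ID returned by the gateway.
    subscriptionId = msg.result;
  } else if (msg.method === 'ar_subscribe_msg') {
    // Step 3: handle notifications for matching transactions.
    console.log('new txs:', msg.result);
  }
});

// Step 4: cancel the subscription later, e.g. on shutdown.
export function unsubscribe() {
  if (subscriptionId) {
    ws.send(JSON.stringify({ method: 'ar_unsubscribe', params: { subscription: subscriptionId } }));
  }
}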

Event methods

  • ar_subscribe
  • ar_subscribe_accepted
  • ar_subscribe_denied
  • ar_subscribe_msg
  • ar_unsubscribe

Event types

  • new_tx
    • params:
      • owner: string
      • target: string
      • tags: { name: string, value: string }[]
    • result: Transaction[]
  • new_block
    • params: none
    • result: Block
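
To make the new_tx params above concrete, a server-side filter could work roughly like this (a sketch; the transaction shape and the all-provided-filters-must-match semantics are assumptions, not a settled spec):

// Hypothetical transaction shape, for illustration only.
interface Tx {
  id: string;
  owner: string;
  target: string;
  tags: { name: string; value: string }[];
}

interface NewTxParams {
  owner?: string;
  target?: string;
  tags?: { name: string; value: string }[];
}

// A tx matches a subscription if every provided filter field matches.
function matchesNewTx(tx: Tx, params: NewTxParams | null): boolean {
  if (!params) return true; // no filters => match all new txs
  if (params.owner && tx.owner !== params.owner) return false;
  if (params.target && tx.target !== params.target) return false;
  if (params.tags) {
    for (const t of params.tags) {
      if (!tx.tags.some((x) => x.name === t.name && x.value === t.value)) {
        return false;
      }
    }
  }
  return true;
}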

Rate-limiting

  • The gateway operator should be able to rate-limit the WS server in some way

Whether this is needed at all probably warrants some discussion.

ENVs

Variable   | Type    | Default | Description
WS_ENABLED | boolean | false   | Start the WebSocket server
WS_PORT    | number  | 3333    | Port of the WebSocket server
WS_CORS    | string  | *       | Allow access from a specific origin

Implementation details

  • Use of the ws module in Node.js should be sufficient (a rough server sketch follows below)
  • Use Redis (or DragonflyDB) as an in-memory store for subscription IDs and params, or roll our own in-memory structure?
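
Here is that rough sketch of a ws-based server wired to the proposed protocol and env vars (WS_ENABLED, WS_PORT). It is a single-node, in-memory illustration only; error handling, ar_subscribe_denied, param matching, and WS_CORS are omitted:

import { WebSocketServer, WebSocket } from 'ws';
import { randomUUID } from 'node:crypto';

// subscriptionId -> { socket, event, params } (single-node, in-memory bookkeeping)
const subscriptions = new Map<string, { socket: WebSocket; event: string; params: unknown }>();

export function startWsServer() {
  // The feature is opt-in via WS_ENABLED, as proposed above.
  if (process.env.WS_ENABLED !== 'true') return;

  const wss = new WebSocketServer({ port: Number(process.env.WS_PORT ?? 3333) });

  wss.on('connection', (socket) => {
    socket.on('message', (raw) => {
      const msg = JSON.parse(raw.toString());
      if (msg.method === 'ar_subscribe') {
        const id = randomUUID();
        subscriptions.set(id, { socket, event: msg.event, params: msg.params });
        socket.send(JSON.stringify({ method: 'ar_subscribe_accepted', result: id }));
      } else if (msg.method === 'ar_unsubscribe') {
        subscriptions.delete(msg.params?.subscription);
      }
    });

    // Subscriptions are coupled to the connection (see Considerations below).
    socket.on('close', () => {
      for (const [id, sub] of subscriptions) {
        if (sub.socket === socket) subscriptions.delete(id);
      }
    });
  });
}

// Called from the indexing pipeline whenever new transactions are seen.
export function notifyNewTxs(txs: unknown[]) {
  for (const [id, sub] of subscriptions) {
    if (sub.event === 'new_tx') {
      sub.socket.send(
        JSON.stringify({ method: 'ar_subscribe_msg', params: { subscription: id }, result: txs }),
      );
    }
  }
}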

Considerations

  • Notifications are sent for current events and not for past events
  • Subscriptions are coupled to a connection. If the connection is closed, all subscriptions created over that connection are removed

It's the first blueprint, so there will likely be some blind spots that I've missed. Feedback is highly appreciated. :)

@djwhitt
Collaborator

djwhitt commented Sep 12, 2023

@rixcian Hey, just wanted to say thanks for this well written ticket! The team is in general agreement that there's a need for something like this, and the details seem reasonable at least as a starting point. We have a lot of other work in flight as a team and probably can't take this on ourselves at the moment, but could provide support if you're interested. Is this something you have capacity to work on yourself? If so, we can spend more time reviewing the details.

@rixcian
Contributor Author

rixcian commented Sep 13, 2023

@djwhitt Hey, thanks for the kind words! Yeah, if you're okay with the specs, I'm ready to start working on this right away.

@djwhitt
Collaborator

djwhitt commented Sep 13, 2023

@rixcian Great! We'll spend a bit more time digging in and get back to you with some design questions.

@djwhitt
Collaborator

djwhitt commented Sep 15, 2023

Alright, first off, minor quibble. 😆

Currently, there is no real-world use for indexed data in the database; this feature would finally give indexing some meaning

You do know that the indexed data is queryable using GraphQL, right? 🙂

@djwhitt
Collaborator

djwhitt commented Sep 15, 2023

{
"method": "<specify the type of action; e.g. subscribe, unsubscribe, error states, etc.>",
"event": "<specify type of event you want to sub; only used in 'ar_subscribe method'>",
"params": "<specify parameters of the event; for every event/method it's different>",
"result": "<specify the result data; used only by server for returning data>"
}

Can you expand a bit on why this format in particular was chosen? It seems like having a type and perhaps a unique id, and then letting the rest of the message format be determined by those, might be more flexible.
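
For comparison, a sketch of what that alternative shape could look like: a discriminating type plus a client-chosen id, with the rest of the message determined by the type. All names here are illustrative only, not a proposal:

// Illustrative only: the message shape is selected by 'type', and 'id' lets the
// client correlate responses with requests.
type ClientMessage =
  | { type: 'subscribe'; id: string; event: 'new_tx'; filter?: { owner?: string; target?: string; tags?: { name: string; value: string }[] } }
  | { type: 'subscribe'; id: string; event: 'new_block' }
  | { type: 'unsubscribe'; id: string; subscription: string };

type ServerMessage =
  | { type: 'subscribed'; id: string; subscription: string }
  | { type: 'denied'; id: string; reason: string }
  | { type: 'event'; subscription: string; data: unknown };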

@djwhitt
Collaborator

djwhitt commented Sep 15, 2023

  • Use Redis (or DragonflyDB) as an in-memory store for subscription IDs and params, or roll our own in-memory structure?

For single node, in memory data structures would be fine (I don't think Dragonfly DB is needed). Redis sounds good for a multi-node setup. Code should be written to an interface that can be implemented on both. It would also be very helpful to get a diagram of how you envision subscription state and message streams being managed in a multi-node setup.
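
A rough sketch of what such an interface could look like, with the single-node in-memory variant filled in (method names and shapes are assumptions; a Redis-backed class would implement the same interface for a multi-node setup):

// Hypothetical interface that both an in-memory store and a Redis-backed store could implement.
interface SubscriptionStore {
  add(connectionId: string, subscriptionId: string, event: string, params: unknown): Promise<void>;
  remove(subscriptionId: string): Promise<void>;
  removeAllForConnection(connectionId: string): Promise<void>;
  // Returns all subscriptions for a given event type so the server can match incoming data.
  listByEvent(event: string): Promise<{ subscriptionId: string; connectionId: string; params: unknown }[]>;
}

// Single-node variant backed by a plain Map.
class InMemorySubscriptionStore implements SubscriptionStore {
  private subs = new Map<string, { connectionId: string; event: string; params: unknown }>();

  async add(connectionId: string, subscriptionId: string, event: string, params: unknown) {
    this.subs.set(subscriptionId, { connectionId, event, params });
  }

  async remove(subscriptionId: string) {
    this.subs.delete(subscriptionId);
  }

  async removeAllForConnection(connectionId: string) {
    for (const [id, sub] of this.subs) {
      if (sub.connectionId === connectionId) this.subs.delete(id);
    }
  }

  async listByEvent(event: string) {
    return [...this.subs.entries()]
      .filter(([, sub]) => sub.event === event)
      .map(([subscriptionId, sub]) => ({ subscriptionId, connectionId: sub.connectionId, params: sub.params }));
  }
}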

@djwhitt
Collaborator

djwhitt commented Sep 15, 2023

I'm also curious if you've considered the pros and cons of implementing our own websocket protocol vs using something like GraphQL subscriptions.

@angrymouse

Hey @rixcian! Thanks for the issue - I was about to create basically the same issue myself.
I think subscribing to changes in GraphQL results would be cool; however, it might slow the node down a lot if there are many subscriptions.

Writing a special protocol for subscriptions would make it more performant, but would also require many more hours of work.
One thing that is clear is that having to poll GraphQL is not the best way to go for a lot of use cases.

@djwhitt LMK if you want a PR with an implementation of this functionality.

@angrymouse

// Our possible use case - not important for the overall discussion, but it might explain why this is a good idea.

A good use case for this would be some sort of system with a lot of updates that aren't linked to each other (i.e. a lot of SmartWeave contracts).

In RareWeave, we have a lot of NFT contracts (hundreds), and we need to check for interactions on each of them. Currently this is done by chunking the work into multiple queries, each responsible for a specified range of contracts, then rinsing and repeating to sync new interactions over time.
The main drawback of this approach is, of course, speed: it takes around 1 minute to fully traverse all contracts, and this time will grow as new contracts are added.
A delay is also needed to avoid being rate-limited by the node, so our smart contract engine gets data with a ~3 minute delay, which hurts UX a lot.
Subscriptions would make it much easier for these kinds of applications to function.

@djwhitt
Collaborator

djwhitt commented Sep 18, 2023

// Our possible use case - not important for the overall discussion, but it might explain why this is a good idea.

Definitely important to hear use cases!

LMK if you want a PR with an implementation of this functionality.

I don't think we're quite ready for a PR on this, but if you wanted to take a stab at an architecture diagram for a multi-node setup as well, that could be helpful.

@angrymouse

@djwhitt Multi-node as in multiple ar.io nodes? Or something else?

@djwhitt
Collaborator

djwhitt commented Sep 20, 2023

@angrymouse Yes. Assume that we have N ar.io nodes and the actual WebSocket connection can be made to any of them. What's the architecture for streaming updates and maintaining subscription state? (Note: there are probably some standard ways to do this with Apollo, but I haven't investigated them yet.)

@t8

t8 commented Oct 11, 2023

Supportive of this. Would serve several use cases we'd be interested in pursuing with ArConnect.

@djwhitt
Collaborator

djwhitt commented Oct 13, 2023

@t8, I'm interested in hearing more details about your use cases if you can share.

@angrymouse

@djwhitt I think there shouldn't be any kind of special protocol for keeping sockets updated. Each node just streams its data once it has received it (i.e. a loaded bundle or block).

@djwhitt
Collaborator

djwhitt commented Oct 13, 2023

@angrymouse Do you mean unbundling, tx processing, etc. would only be streamed to the clients connected to the writer in a multi-node setup?

@djwhitt
Collaborator

djwhitt commented Oct 13, 2023

To add a bit more context - the multi-node scenario we need to support initially is 1 DB writer with multiple readers, but in the future we may have more complex topologies with unbundling handled by different nodes than L1 TX indexing (that's how arweave.net works today). We don't have to have all that sorted out initially, but we do need to have a design that accommodates growth in that direction.

@angrymouse

@djwhitt If I understood correctly, we can still just do 1 writer > N readers. It's just that readers can be subscribed to multiple writers in case some of them don't process the needed chunk of data.

@djwhitt
Collaborator

djwhitt commented Oct 17, 2023

@angrymouse I think we might be talking about different kinds of readers and writers. The scenario I'm thinking about is when you have 2 ar.io gateway services running. One of them reads and writes to a DB. The other one only reads. Both accept connections from users. We need a way for the users connected to the reader to receive the same messages as the users connected to the writer.
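
One common pattern for that scenario (a sketch under assumptions, not a committed design): the writer publishes indexing events to a shared Redis pub/sub channel, and every gateway node - writer and readers alike - subscribes to that channel and fans messages out to its own locally connected WebSocket clients, so subscription state stays local to each node. Using the ioredis package for illustration:

import Redis from 'ioredis';

// Hypothetical hook into each node's local subscription bookkeeping
// (see the single-node server sketch earlier in this issue).
declare function fanOutToLocalSubscribers(event: 'new_tx' | 'new_block', data: unknown): void;

const redisUrl = process.env.REDIS_URL ?? 'redis://localhost:6379'; // assumed env var

// Writer side: publish new data to a shared channel instead of notifying
// only its own WebSocket clients.
const publisher = new Redis(redisUrl);

export async function publishNewTxs(txs: unknown[]) {
  await publisher.publish('ar-io:events:new_tx', JSON.stringify(txs));
}

// Every node (writer and readers alike): subscribe to the channel and fan the
// payload out to locally connected clients.
const subscriber = new Redis(redisUrl); // a dedicated connection is needed for subscriber mode

subscriber.subscribe('ar-io:events:new_tx');
subscriber.on('message', (_channel, payload) => {
  fanOutToLocalSubscribers('new_tx', JSON.parse(payload));
});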
