Basic ref-counted Anchor implementation #108
Read the description and looked over the changes, makes sense to me. I do have some questions about this capability, and zooming out and looking more broadly at channels and CSP. My immediate thought was that this seems very similar to PubSub, generally speaking (I haven't actually looked into the PubSub implementation in ww).
I read this to help me understand how CSP is different to PubSub. In the article, it says the main difference is
you can't put to the channel if there is no one to take and you can't take unless there is someone to put
So channels are somewhat ephemeral whereas a PubSub bus is usually a long-term thing?
Question 1: Is that a reasonable way to think about this or am I missing something more critical? Is there anything else you would suggest I read?
Onto anchors. So an anchor is like a cluster-scope filepath that can be used by an application to expose a channel, making it available for other applications (or other instances of the same application?) to write to the channel and read from the channel.
Question 2: In your pseudocode example, the peer id is hardcoded, but applications shouldn't necessarily "care" about which peer the channel of interest is on, right? So once an anchor is created, how would interested parties find out about its existence and know where to find it? I guess this is more of an orchestration question: in Kubernetes you submit your application to the cluster and the cluster chooses where those containers run. Is wetware fundamentally different? Is it up to the user to control what runs on each peer?
@ayazabbas Thanks for the review, and for the questions 😃
CSP
It's not wrong, but it feels a bit narrow, so let me see if I can reframe. Reading between the lines, it seems like you're trying to answer the question "CSP vs what?" That is, we are picking CSP over some alternative; what is that alternative? The answer is: CSP is an alternative to shared memory & mutexes. The entire point is to enable concurrency without locking. From there, it's pretty intuitive. To have CSP, you need two things:
You are already familiar with CSP: Unix processes and pipes are CSP, goroutines and I'm hoping that with this starting point, the rest kinda falls into place, but please follow up if not 😃
PubSub is arguably related to channels insofar as it's IPC that doesn't involve locking. If you squint at it just right, you could rightly call a pubsub topic a "multicast channel". I think this is a fine way to interpret it. In fact, it gets at the heart of why we have both PubSub and channels: they do roughly the same job, but make different trade-offs:
The decision to use one vs the other pretty much boils down to the messaging patterns required by the application.
This is true of synchronous (unbuffered) channels. Just like Go, Wetware allows you to construct buffered channels that can hold up to n items. A send to a buffered channel only blocks if the buffer is full, else it enqueues the value and moves on.
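To make the buffered/unbuffered distinction concrete, here is a minimal sketch using plain Go channels (not the Wetware channel API, which may differ in its details). The helper `trySend` is a hypothetical name introduced for illustration; it performs a non-blocking send so that the example can report when a send would have blocked.

```go
package main

import "fmt"

// trySend attempts a non-blocking send on ch and reports whether the
// value was accepted. A false result means a plain `ch <- v` would block.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int)  // synchronous: needs a waiting receiver
	buffered := make(chan int, 2) // holds up to 2 items

	fmt.Println(trySend(unbuffered, 1)) // false: nobody is receiving
	fmt.Println(trySend(buffered, 1))   // true: buffer has room
	fmt.Println(trySend(buffered, 2))   // true: buffer is now full
	fmt.Println(trySend(buffered, 3))   // false: buffer is full
}
```

The unbuffered case is the "can't put unless someone takes" behaviour quoted from the article; the buffered case relaxes it until the buffer fills up.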
I don't think I would agree with this characterization. A channel will usually last as long as the process(es) that are using it. Here I think it's helpful to think of goroutines. You can have a pair of long-running goroutines that communicate on one or more channels. These channels will stick around until both goroutines exit. The same principle applies here. Here's a simple example: imagine a worker that services some kind of request. The worker has two parts: a process that does some work, and a send-only channel that others can use to enqueue work. The channel can be located e.g. at
I really wouldn't overthink CSP, but if you're curious about some of the CS theory, I would recommend Tony Hoare's paper. Note that we don't support the full set of classic CSP operations (in particular, we don't have an equivalent to Go's Beyond that, I would just keep in mind a couple of properties that make CSP such an appealing choice for distributed computing:
Anchors
Yes on all counts. Nit-picking a little bit, it's not technically a file path because this is totally independent of the filesystem. I think a better comparison is a URL path component. It's an abstract path; you neither know nor care about the underlying implementation. You just know that it identifies a particular resource on a particular host. But still, yes.
Good question. There is a method on each anchor called To find the names of all the hosts in a cluster, you would do the following in pseudocode:
for host in range root.ls():
    print(host) # prints e.g. "/f19caa21dda41c43"
From there, you can do Note also:
Correct. Wetware is low-level cluster middleware. In K8s and other PaaS systems, you tend to write declarative configurations and hand these off to some sort of orchestrator/scheduler service. In Wetware, you write programs that can react to changes in the distributed environment. For instance: you can react to a host joining/leaving the cluster, or to an event that has been sent to a channel or published to a topic. You can react to a process terminating by spawning a new one, etc. The trade-off is obviously that Wetware is a bit of a sharp tool, a la C. It's easy to write incorrect programs, because there are few guard-rails. In particular: Wetware is a PA/EL system. However, it hopefully makes it easier for people who know what they're doing to write correct distributed systems, because you're handed convenient primitives and cluster-level services. On a final note, the anchor API is
Merged, but happy to continue the discussion, whether here or on Matrix.
This PR implements the basic functionality for the Anchor capability. It addresses https://github.com/wetware/ww/issues/57.
Anchors are a key feature of Wetware's public API, so I am going to use this PR as an opportunity to get our newest contributors up-to-speed on their purpose and implementation. I invite everybody to review this PR, and ask any questions they may have about this important capability.
cc @mikelsr @ayazabbas @evan-schott
A Short Primer on Anchors
What are Anchors?
Anchors are a model for distributed shared-memory. They allow Wetware servers to export a Chan or Proc, making it available to peers over the network. At the highest possible level, the Anchor API gives us a way of fetching a Chan or Proc along a particular path, in a particular server.
Example
Note the following properties:
Together, these properties allow us to express colocation. A Chan<- located at /f19caa21dda41c43/mailbox lives on the same physical host as the process located at /f19caa21dda41c43/actor!
Why do we need Anchors?
Why do we need distributed shared-memory? Aren't we supposed to use communicating sequential processes (CSP)?
Yes, applications should use CSP (i.e. channels & processes) as much as possible because it is safer and easier to reason about in a distributed environment. However, applications still need a mechanism for sharing channels with each other! To do so, the application code (think: a WASM process) creates a channel, and then assigns it to a particular path.
In pseudocode:
A completely separate peer can now do the following (pseudocode):
Again, CSP is the preferred way of passing data between processes and peers, and anchors are used as an application-defined rendezvous point where a channel can be shared or obtained. Ideally, applications should only use anchors to propagate channels. In practice, there are cases where it is both safe and convenient to share processes as well, so we allow that too.
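As a rough illustration of the rendezvous idea, the sketch below uses a toy, in-process registry in place of a real anchor tree. Everything here is an assumption made for illustration: `registry`, `bind` and `lookup` are hypothetical names, not part of the Wetware API, and a real anchor would hand out the channel over the network rather than from a local map.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a toy stand-in for an anchor tree: it maps abstract
// paths to channels. It is NOT the Wetware Anchor API.
type registry struct {
	mu    sync.Mutex
	chans map[string]chan string
}

// bind assigns a channel to a path (the "owner" side of the rendezvous).
func (r *registry) bind(path string, ch chan string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.chans == nil {
		r.chans = make(map[string]chan string)
	}
	r.chans[path] = ch
}

// lookup returns a send-only view of the channel at path, if any.
func (r *registry) lookup(path string) (chan<- string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch, ok := r.chans[path]
	return ch, ok
}

func main() {
	var r registry

	// The owner creates a channel and assigns it to a path.
	mailbox := make(chan string, 1)
	r.bind("/f19caa21dda41c43/mailbox", mailbox)

	// A separate party obtains the channel by path and sends on it.
	if tx, ok := r.lookup("/f19caa21dda41c43/mailbox"); ok {
		tx <- "hello"
	}

	fmt.Println(<-mailbox) // prints "hello"
}
```

Note that the registry is only touched once, to obtain the channel. All subsequent data flows through the channel itself, which matches the recommendation to use anchors only for propagating channels.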
The Anchor Capability
Anchors are represented by the Anchor capability. The schema is located in api/anchor.capnp (note that it is currently incomplete).
As we've already seen, Anchors form a tree. The operation a.Walk("/x/y/z") returns the anchor along the relative path under a. The operation a.Walk("/") (or, equivalently, a.Walk("")) returns a. Nodes are created dynamically when a path is walked, and are automatically removed from the tree when:
Importantly, there is no operation to walk up the tree, so there is no way for an Anchor to access its parent. This is intentional. It allows applications or subroutines to be confined to a subtree, providing a cooperative mechanism for isolating application state.
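The Walk semantics described above can be sketched in a few lines of Go. This is an illustrative, in-memory model only: the `Anchor` struct below is a hypothetical stand-in, not the capability generated from api/anchor.capnp, and it omits the automatic removal of unused nodes.

```go
package main

import (
	"fmt"
	"strings"
)

// Anchor is a toy tree node. It deliberately has no parent field,
// so code holding an *Anchor cannot reach anything above it.
type Anchor struct {
	name     string
	children map[string]*Anchor
}

// Walk returns the anchor at the relative path under a, creating
// intermediate nodes on demand. Walk("/") and Walk("") return a itself.
func (a *Anchor) Walk(path string) *Anchor {
	cur := a
	for _, part := range strings.Split(path, "/") {
		if part == "" {
			continue // skip leading, trailing and repeated slashes
		}
		if cur.children == nil {
			cur.children = make(map[string]*Anchor)
		}
		next, ok := cur.children[part]
		if !ok {
			next = &Anchor{name: part}
			cur.children[part] = next
		}
		cur = next
	}
	return cur
}

func main() {
	root := &Anchor{}

	fmt.Println(root.Walk("/") == root) // true
	fmt.Println(root.Walk("") == root)  // true

	// Confine a component to a subtree by handing it only the sub-anchor.
	app := root.Walk("/f19caa21dda41c43/my-app")
	state := app.Walk("state")
	fmt.Println(state == root.Walk("/f19caa21dda41c43/my-app/state")) // true
}
```

Because the node type carries no reference to its parent, a component that receives only `app` has no way to name anything outside `/f19caa21dda41c43/my-app`, which is the confinement property the text relies on.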
From a security standpoint, the cooperative nature of this isolation makes anchors somewhat sensitive. Untrusted code generally should not have access to the root anchor, or even to a host anchor. It should instead be confined to an application-specific sub-anchor, e.g. /f19caa21dda41c43/my-app. This is best achieved through a "nursery pattern", where a privileged process is responsible for spawning non-privileged processes, and providing them with the appropriate sub-anchor. Note that I have plans for more fine-grained access control (e.g. read-only anchors, anchors without a Walk method, etc.), but these are beyond the scope of this overview.
Implementation
Now that we have a general understanding of the purpose and public API for anchors, let us briefly discuss implementation details. This section should be helpful for understanding the present PR.
The Anchor capability is implemented by pkg/anchor.Node. Each node has a counter that tracks references (the "refcount"), which is initially set to zero. Nodes remove themselves from their parent's children map[string]*Node when the refcount decrements from 1 to 0.
There are exactly two operations that increment the refcount:
n.Child(name) increments the refcount iff a child under the relative path name does not exist.
n.Anchor() increments the refcount iff there is no capnp server running for that node.
There are exactly two operations that decrement the refcount:
pkg/anchor.server.Shutdown() always decrements the refcount of server's underlying Node. It is automatically called by the capnp library when the final Anchor client for the server has been released.
Next Steps
host.Anchor, which models physical hosts (i.e. /<server-id>)
examples/ directory in project root to demonstrate how proc/chan/anchor compose
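As a possible starting point for such an example, the refcount rules from the Implementation section can be modelled without capnp at all. The sketch below is not the code in pkg/anchor. It assumes that creating a new child takes one reference on the parent, and that a node which drops to zero releases that reference when it removes itself. The `serving` flag stands in for "a capnp server is running", and the unexported `release` helper is a hypothetical name.

```go
package main

import "fmt"

// Node is a simplified, single-threaded model of a ref-counted tree node.
type Node struct {
	name     string
	parent   *Node
	refs     int  // the "refcount", initially zero
	serving  bool // stands in for a running capnp server
	children map[string]*Node
}

// Child returns the named child, creating it if it does not exist.
// Assumption: only the creation of a new child increments n's refcount.
func (n *Node) Child(name string) *Node {
	if c, ok := n.children[name]; ok {
		return c
	}
	if n.children == nil {
		n.children = make(map[string]*Node)
	}
	c := &Node{name: name, parent: n}
	n.children[name] = c
	n.refs++
	return c
}

// Anchor increments the refcount iff no server is running for n.
func (n *Node) Anchor() {
	if !n.serving {
		n.serving = true
		n.refs++
	}
}

// Shutdown always decrements the refcount of the underlying node.
func (n *Node) Shutdown() {
	n.serving = false
	n.release()
}

// release decrements the refcount. On reaching zero, the node removes
// itself from its parent's children map and releases the parent in turn.
func (n *Node) release() {
	n.refs--
	if n.refs == 0 && n.parent != nil {
		delete(n.parent.children, n.name)
		n.parent.release()
	}
}

func main() {
	root := &Node{}
	leaf := root.Child("f19caa21dda41c43").Child("my-app")
	leaf.Anchor()
	fmt.Println(len(root.children)) // 1: the host node is kept alive

	leaf.Shutdown()
	fmt.Println(len(root.children)) // 0: the whole unused path is pruned
}
```

A real implementation would also need to guard the counter and the children map against concurrent access, which this sketch leaves out.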