
Bacalhau project report 20220830

lukemarsden edited this page Aug 30, 2022 · 3 revisions

Nearly at the end of Master Plan - Part 1!

Filecoin Support - Read Side

The feedback we've had from Storage Providers is that they often keep unsealed copies of data locally for fast retrieval. If those SPs also participate in the Bacalhau network, it would be great (and a competitive advantage for them!) if they could enable compute over the unsealed copies they have local access to. Today we're enabling exactly that in Bacalhau, with new capabilities!

The filecoin unsealed driver allows Storage Providers to resolve a local filepath using a Go template that is passed the storage spec as input. This can point to a local directory, an NFS share, or any other strategy that exposes a directory path as its interface; for example, /var/lib/data/{{.Cid}} could be used as the template.

The combo driver allows multiple storage drivers to compete to serve a single job. This means we can have a single storage provider that gives preference to the filecoin unsealed provider if the CID is local and falls back to the IPFS provider if not.

There is a new CLI option, bacalhau serve --filecoin-unsealed-path /var/lib/data/{{.Cid}}, which allows the path template for the filecoin unsealed provider to be configured.

This means that storage providers can now bid on jobs they hold the unsealed data for.

Deterministic Verification!

We now have a --verifier=deterministic flag that can be passed to Bacalhau jobs. This is a step towards -- and a necessary component of -- the verification protocol that we plan to implement in the future.

  • The following CLI flags affect verification:
    • --verifier deterministic = use the deterministic verifier for this job
    • --concurrency 5 = each shard must be run by 5 different nodes
    • --confidence 4 = each shard must have at least 4 entries that agree to be verified
  • The verification proposal is a hash of the results directory
    • We re-use the same code as go mod to calculate the hash of the results directory (so it's as fast as it can be). We can't just use the CID because we don't want to assume we're using a CID-generating publisher.
  • The libp2p transport now has Encrypt and Decrypt functions
    • The Encrypt function requires the public key of the requester node (which is not this node)
    • The Decrypt function uses the private key of the transport (which IS this node)
  • The public key of the message sender is now included in job events
    • Using this, the public key of the requester node is now added to the Job data
  • The verifier accepts encrypter and decrypter functions that call the respective transport methods
    • The verification proposal is encrypted using the public key of the requester node
  • When the requester node decides it has enough verification proposals to perform the verification, it:
    • Decrypts all of the verification proposals using the requester node private key (via the transport Decrypt method)
    • Groups the shard results by a) shard index b) proposed hash
    • Works through each shard and picks the "winning group", which is defined as the largest group with the same hash
  • Exceptions:
    • There must be >1 concurrency - i.e. each group must have more than 1 entry
    • There cannot be a draw - i.e. if two groups tie for the largest size, neither can be verified
    • The winning group must have at least confidence entries
    • The winning group must not be for the hash that is "empty string"

With the deterministic verifier, once a job has completed, the compute node calculates a hash of the results folder locally. The job spec now has a field that holds the requester's public key; the compute node encrypts the hash using the requester's public key, and the encrypted hash is submitted as the verification proposal. The requester node decrypts all of the hashes, and the largest group of equal hashes is marked as "verified".

So, now users can run a job across several nodes and get a level of confidence that they are getting valid results! NOTE: this assumes the user's jobs are actually deterministic, which the WASM executor can help with, but we are not enforcing that all jobs must use the WASM executor in order to attempt deterministic verification.

Spreading Algorithm

As part of the future work to prevent Sybil attacks, we will need a way for verification jobs to be checked against a randomly sampled, i.e. "evenly spread", subset of the network. To do this, we are adding a new field to the JobSpec, controlled by a --min-bids parameter. This parameter requires the requester node to wait until at least this many bids have come in before issuing bid acceptances to concurrency-many of the bidders. This is an important step towards being able to dissuade malicious actors on the network.

Security Work

We have identified two threat models so far: a malicious client spamming the public API of the nodes, which we'll solve with rate limiting at the REST level, and a malicious requester node (or compute node) spamming the libp2p transport, which we'll solve with rate limiting at the libp2p level.

What's Next

  • Filecoin+ integration
  • Lotus publisher (for filecoin writes)
  • Estuary publisher (for filecoin writes via the Estuary gateway)

And then we'll be done with this stage of the project, and moving on to the Master Plan - Part 2 🎉
