Q&A
lukemarsden edited this page Jan 28, 2022 · 3 revisions
Latest Demo:
Current Status:
- Investigate IPFS FUSE mounts as a way to efficiently mount a dataset inside a Firecracker VM that runs the Docker container with the job in it (avoiding any copying)
Questions:
-
Where should we discuss this?
- https://discuss.ipfs.io/
- https://github.com/ipfs
- Please bring those discussions back to this wiki to enable an ongoing store of bubbled-up knowledge.
-
What are you working on right now?
- We have a project board here: https://github.com/filecoin-project/bacalhau/projects/1
-
Is Bacalhau intended to run on top of IPFS storage or Filecoin storage?
- Background:
- To be clear, IPFS doesn't have storage; it is a protocol to transfer and index data.
- Filecoin is the storage (and permanence) story.
- A very simple way to think of it: IPFS -> Distributed HTTP; Filecoin -> Decentralized S3
- There are a multitude of efforts to ensure that IPFS can index and interact with Filecoin-stored data
- We are building decentralized Lambda (or Fargate, etc.) to operate on decentralized S3
- Bacalhau should focus on interacting with Filecoin, not IPFS
-
Can you actually “ignore IPFS completely” for Bacalhau?
- IPFS is composed of libp2p+IPLD+IPFS itself
- Bacalhau will use libp2p to connect with Filecoin nodes, and libp2p is the largest chunk of the IPFS codebase
- IPLD (https://ipld.io/) is what IPFS files and Filecoin CAR files are made of, and you can use it to describe versions of datasets that were changed, or even the computation itself.
- Unless Bacalhau designs a side protocol for distributing the computations to be run, it will need to interact with both IPFS and Filecoin
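To make the IPLD point concrete, here is a minimal sketch of content-addressed linking between dataset versions and a job description. It is a toy model only: the `content_id`, `put`, and `history` helpers, the node fields, and the image name are all hypothetical, and a plain SHA-256 hex digest of canonical JSON stands in for a real CID (actual IPLD uses multihash/CID encoding and codecs that are not reproduced here).

```python
import hashlib
import json


def content_id(node: dict) -> str:
    """Return a hex SHA-256 digest of the node's canonical JSON encoding.

    Stand-in for a real CID; this is NOT the IPLD/CID wire format.
    """
    encoded = json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def put(store: dict, node: dict) -> str:
    """Store a node under its content ID and return that ID."""
    cid = content_id(node)
    store[cid] = node
    return cid


def history(store: dict, cid: str) -> list:
    """Follow 'parent' links from a dataset version back to the first one."""
    chain = []
    while cid is not None:
        chain.append(cid)
        cid = store[cid]["parent"]
    return chain


store = {}
# Two versions of a dataset; the second links to the first by content ID.
v1 = put(store, {"rows": [1, 2, 3], "parent": None})
v2 = put(store, {"rows": [1, 2, 3, 4], "parent": v1})
# A hypothetical job description that links to its input the same way.
job = put(store, {"image": "example/job:latest", "input": v2})
```

Because every link is a hash of the content it points to, the job description pins the exact dataset version it ran against, and identical content always gets the same ID.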
-
Has IPFS thought about Compute Over Data before?
- Yes, many times
- https://www.youtube.com/watch?v=Pjyo2uILcOs&feature=youtu.be
- IPFS-FAN - distributed serverless - https://research.protocol.ai/publications/ipfs-fan-a-function-addressable-computation-network/delarocha2021a.pdf
- IPLS : A Framework for Decentralized Federated Learning- https://arxiv.org/pdf/2101.01901v1.pdf
- Interplanetary Distributed Computing (2018) - https://github.com/yenkuanlee/IPDC
- IPTF - IPFS + TensorFlow (2018) - https://github.com/tesserai/iptf
-
In the context of this statement "You will use libp2p to connect with Filecoin nodes, which is the largest chunk of IPFS codebase", how will Bacalhau know which node to execute compute on?
- Specifically:
- Which blocks have most (or all) of the data?
- How will it get at the plaintext (i.e. unsealed) versions of those files?
- OPEN Could we allow nodes to hear about all jobs and then ignore the ones they don't want to take on (i.e. self-selection)?
- OPEN Do we envision using libp2p to fetch the contents of the files a job needs, or just to ask which nodes have those files?
- OPEN How does a Bacalhau node know whether it has the data locally, and what API will it use to find out?
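One way to picture the self-selection idea from the open questions above is the sketch below. It is not a Bacalhau design: the `Job` and `Node` classes, the `coverage` measure, and the `min_coverage` threshold are all hypothetical, and the set of locally held content IDs is simply given, since how a node would actually learn that is itself one of the open questions.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    input_cids: frozenset  # content IDs of the data the job needs


@dataclass
class Node:
    name: str
    local_cids: set = field(default_factory=set)  # assumed to be known locally

    def coverage(self, job: Job) -> float:
        """Fraction of the job's inputs that this node already holds."""
        if not job.input_cids:
            return 1.0
        return len(job.input_cids & self.local_cids) / len(job.input_cids)

    def wants(self, job: Job, min_coverage: float = 1.0) -> bool:
        """Self-select: accept only jobs whose data is sufficiently local."""
        return self.coverage(job) >= min_coverage


def broadcast(job: Job, nodes: list, min_coverage: float = 1.0) -> list:
    """Every node hears about the job; return the names of those that opt in."""
    return [n.name for n in nodes if n.wants(job, min_coverage)]


nodes = [
    Node("a", {"c1", "c2"}),
    Node("b", {"c1"}),
    Node("c"),
]
job = Job("job-1", frozenset({"c1", "c2"}))
volunteers = broadcast(job, nodes)
```

Here only node `a` volunteers, because it holds every input. Lowering `min_coverage` would let nodes with partial data opt in as well, at the cost of fetching the rest over the network.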
-
What actually runs a Bacalhau process?
- Current assumption: Bacalhau runs on the same machine as Lotus, so we can get at plaintext versions of the files on disk without having to copy them across the network
-
Should we interact with FUSE?
- FUSE has historically been a PITA, but there is a "working" integration
- Here is a demo of mounting IPFS MFS (just a virtual filesystem inside an IPFS node) with fuse https://www.youtube.com/watch?v=jXkTEBdM6aA
- Some relevant threads on GitHub:
- Additionally, Will Scott has done work integrating IPFS and NFS: https://github.com/willscott/go-nfs
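As a rough illustration of how a FUSE mount could feed a job without copying, the sketch below builds a `docker run` command line that bind-mounts a dataset read-only into the container. It rests on several assumptions: that the IPFS node exposes content under a mount root such as `/ipfs/<cid>`, that Docker on that host can see the FUSE mount, and that the job reads its input from `/data`. The `docker_run_args` helper, the CID, and the image name are hypothetical, and the Firecracker layer is not covered.

```python
from pathlib import PurePosixPath


def docker_run_args(cid: str, image: str,
                    mount_root: str = "/ipfs",
                    container_path: str = "/data") -> list:
    """Build argv for running `image` with the dataset bind-mounted read-only.

    The host path is <mount_root>/<cid>; both defaults are assumptions.
    """
    host_path = PurePosixPath(mount_root) / cid
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:{container_path}:ro",  # read-only bind mount
        image,
    ]


# Hypothetical CID and image, for illustration only.
args = docker_run_args("QmExampleCid", "example/job:latest")
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a host where both the mount and Docker are available.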
-
How do you run jobs in a "disconnected" mode, where you're separated from the main chain?
- https://cluster.ipfs.io, which is designed not for computation but for data storage across multiple nodes, has a bunch of primitives you can reuse (RPC, Raft consensus over libp2p, and other things)