
Bacalhau project report 20220610

lukemarsden edited this page Jun 11, 2022 · 3 revisions

Steady progress

A week of organizational work and steady progress, with quite a few things in flight but not much fully finished by Friday.

Fixes to the production network

Early this week, I spent some time fixing issues on the production network: upgrading production to REST, configuring IPFS to persist data to the persistent disk that survives upgrades, and fixing various deployment-related issues.

The team is growing!

With Guy, Kai, Luke, Dave and Vedant all pushing code to Bacalhau, the team is getting bigger! With that in mind, we are starting to adopt more process than was needed when it was just Kai and me hacking together. To that end...

Kanban board

Management of the project has moved out of Google Docs and into the GitHub Kanban board, which we now use to track execution against the Master Plan - Part 1.

If you are working on the project, please ensure what you’re working on is tracked in the kanban. We plan to keep the “Milestones from Plan” column clean with just the milestones from the plan in it - but feel free to add any other issues to To Do / In Progress.

This means that we can see at a glance who's working on what, and ensure that everyone has well formed work items with acceptance criteria.

CLI improvements

Dave is working on CLI improvements for things that became obviously necessary once the production network was deployed. This led to sharing our approach to testing more widely within the team. One day we should start doing tech talks!

Otel and contexts

Guy is making excellent progress towards getting end-to-end OpenTelemetry in place. This required redoing the way we handle contexts in Go, which caused a regression that was rapidly dealt with. We are now very close to having end-to-end telemetry across CLI and server! We'll then be able to get better observability into the production nodes that we are running.

Logging improvements

We have a much improved logging library that allows any combination of text, JSON and job event logs to be emitted.

Allow access to all data in IPFS

We are working on enabling backends to be configured to download from IPFS rather than self-selecting only data they have locally. This will enable folks to run against arbitrary IPFS-accessible datasets, which will dramatically improve the utility of the public network.

The selfSelectionRules feature (which allows a compute node to describe which jobs it wants to take on and will form the basis for external hooks to make these decisions) is 90% plugged in.
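As a rough sketch of the idea (not the actual Bacalhau implementation), the decision a compute node makes could look like the following. The `Job`, `SelectionRules` and `ShouldBid` names and fields are hypothetical and chosen only for illustration; the real job spec and the real shape of selfSelectionRules are not shown in this report.

```go
package main

import "fmt"

// Job is a hypothetical, simplified job description.
type Job struct {
	ID        string
	InputCIDs []string // IPFS content IDs the job reads
}

// SelectionRules is a hypothetical stand-in for a node's
// self-selection policy.
type SelectionRules struct {
	// RequireLocalData: accept only jobs whose inputs are already
	// held locally. When false, the node is willing to download
	// inputs from IPFS.
	RequireLocalData bool
}

// ShouldBid reports whether a node holding localCIDs would take the job.
func ShouldBid(rules SelectionRules, localCIDs map[string]bool, job Job) bool {
	if !rules.RequireLocalData {
		return true // inputs can be fetched from IPFS on demand
	}
	for _, cid := range job.InputCIDs {
		if !localCIDs[cid] {
			return false // at least one input is not held locally
		}
	}
	return true
}

func main() {
	local := map[string]bool{"cid-a": true}
	job := Job{ID: "job-1", InputCIDs: []string{"cid-a", "cid-b"}}

	fmt.Println(ShouldBid(SelectionRules{RequireLocalData: true}, local, job))  // false
	fmt.Println(ShouldBid(SelectionRules{RequireLocalData: false}, local, job)) // true
}
```

Keeping the decision in a single function like this is one way the same hook could later be replaced by an external call, as the report anticipates.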

Vedant

Vedant Padwal has started as an intern, and is researching Bacalhau use cases. I've worked with Vedant before on other projects and I'm happy he's helping out!

Prioritizing Filecoin integration

Based on feedback that the SPs are interested in running Bacalhau, we have moved the Filecoin Integration milestones up to just after the first Performance phase in the kanban: https://github.com/filecoin-project/bacalhau/projects/7

What's next

  • Finish everything that's in-flight!
  • Basic monitoring of CPU/mem/disk usage on the production nodes
  • Uptime alerting on our production endpoints
  • Python FaaS WASM beta
  • More detailed planning around hitting our various milestones in time for October!