IPFS Repo Spec

Author(s):


Abstract

This spec defines an IPFS Repo, its contents, and its interface. It does not specify how the repo data is actually stored, as that is done via swappable implementations.

Table of Contents

TODO

Definition

A repo is the storage repository of an IPFS node. It is the subsystem that actually stores the data IPFS nodes use. All IPFS objects are stored in a repo (similar to git).

There are many possible repo implementations, depending on the storage media used. Most commonly, IPFS nodes use an fs-repo.

Repo Implementations:

  • fs-repo - stored in the OS filesystem
  • mem-repo - stored in process memory
  • s3-repo - stored in Amazon S3

Repo Contents

The Repo stores a collection of IPLD objects that represent:

  • config - node configuration and settings
  • datastore - content stored locally, and indexing data
  • keystore - cryptographic keys, including node's identity
  • hooks - scripts to run at predefined times (not yet implemented)

Note that the IPLD objects a repo stores are divided into:

  • state (system, control plane) used for the node's internal state
  • content (userland, data plane) which represents the user's cached and pinned data.

Additionally, the repo state must determine the following. These need not be IPLD objects, though it is of course encouraged:

  • version - the repo version, required for safe migrations
  • locks - process semaphores for correct concurrent access
  • datastore_spec - array of mounting points and their properties

Finally, the repo also stores the blocks, which hold blobs of binary data.

version

Repo implementations may change over time, so they MUST include a version identifier that can be read the same way across versions. That is, a tool MUST be able to read the version of any repo of a given type, whatever that version is.

For example, the fs-repo simply includes a version file with the version number. This way, the repo contents can evolve over time but the version remains readable the same way across versions.
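
As an illustration of the fs-repo approach, the following is a minimal Go sketch of a tool reading the version before touching anything else in the repo. The file name `version`, the plain-text integer format, and the function names are assumptions made for this example, not part of the spec.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// versionFileName is an assumed name for the fs-repo version file.
const versionFileName = "version"

// ParseRepoVersion parses the raw contents of a version file.
// It assumes the file holds a single decimal integer, optionally
// surrounded by whitespace.
func ParseRepoVersion(raw []byte) (int, error) {
	v, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return 0, fmt.Errorf("parsing repo version: %w", err)
	}
	return v, nil
}

// ReadRepoVersion reads and parses the version file inside repoPath.
func ReadRepoVersion(repoPath string) (int, error) {
	raw, err := os.ReadFile(filepath.Join(repoPath, versionFileName))
	if err != nil {
		return 0, fmt.Errorf("reading repo version: %w", err)
	}
	return ParseRepoVersion(raw)
}

func main() {
	// Build a throwaway repo directory that contains only a version file.
	dir, err := os.MkdirTemp("", "repo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, versionFileName)
	if err := os.WriteFile(path, []byte("7\n"), 0o644); err != nil {
		panic(err)
	}

	v, err := ReadRepoVersion(dir)
	if err != nil {
		panic(err)
	}
	fmt.Println("repo version:", v)
}
```

Because the reader depends only on the version file, it keeps working even when the rest of the repo layout changes between versions.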

datastore

IPFS nodes store some IPLD objects locally. These are either (a) state objects required for local operation -- such as the config and keys -- or (b) content objects used to represent data locally available. Content objects are either pinned (stored until they are unpinned) or cached (stored until the next repo garbage collection).
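
The difference between pinned and cached content can be sketched in Go as follows. The `Store` type and its methods are hypothetical names for this example, and pinning is simplified to a flat set of keys. A real node would also have to keep everything reachable from a pinned object.

```go
package main

import "fmt"

// Store is a hypothetical in-memory content store. Keys stand in for
// content identifiers; values are raw blocks.
type Store struct {
	blocks map[string][]byte
	pinned map[string]bool
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{
		blocks: make(map[string][]byte),
		pinned: make(map[string]bool),
	}
}

// Put stores a block. Until it is pinned, the block is only cached.
func (s *Store) Put(key string, data []byte) { s.blocks[key] = data }

// Pin marks a block as pinned so that garbage collection keeps it.
func (s *Store) Pin(key string) { s.pinned[key] = true }

// Unpin turns a pinned block back into a cached one.
func (s *Store) Unpin(key string) { delete(s.pinned, key) }

// Has reports whether the block is still stored locally.
func (s *Store) Has(key string) bool {
	_, ok := s.blocks[key]
	return ok
}

// GC removes every block that is not pinned and returns how many
// blocks were removed.
func (s *Store) GC() int {
	removed := 0
	for key := range s.blocks {
		if !s.pinned[key] {
			delete(s.blocks, key)
			removed++
		}
	}
	return removed
}

func main() {
	s := NewStore()
	s.Put("a", []byte("pinned data"))
	s.Put("b", []byte("cached data"))
	s.Pin("a")

	// Only the cached block "b" is collected.
	fmt.Println(s.GC(), s.Has("a"), s.Has("b")) // prints "1 true false"
}
```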

The name "datastore" comes from go-datastore, a library for swappable key-value stores. Like its namesake, some repo implementations feature swappable datastores, for example:

  • an fs-repo with a leveldb datastore
  • an fs-repo with a boltdb datastore
  • an fs-repo with a union fs and leveldb datastore
  • an fs-repo with an s3 datastore
  • an s3-repo with a cached fs and s3 datastore

This makes it easy to change properties or performance characteristics of a repo without an entirely new implementation.
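
The idea of a swappable datastore can be sketched in Go as a small interface that the repo depends on. This is a simplified illustration: the `Datastore` interface, `MemDatastore`, `Repo`, and the `/blocks/` key prefix are assumptions for this example and do not reproduce the actual go-datastore API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when a key is absent from the datastore.
var ErrNotFound = errors.New("datastore: key not found")

// Datastore is a hypothetical, minimal key-value interface. Any
// backend (leveldb, boltdb, s3, ...) could sit behind it.
type Datastore interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
	Delete(key string) error
}

// MemDatastore is an in-memory backend used here for illustration.
type MemDatastore struct {
	data map[string][]byte
}

// NewMemDatastore returns an empty in-memory datastore.
func NewMemDatastore() *MemDatastore {
	return &MemDatastore{data: make(map[string][]byte)}
}

// Put stores a copy of value so later caller mutations do not leak in.
func (m *MemDatastore) Put(key string, value []byte) error {
	m.data[key] = append([]byte(nil), value...)
	return nil
}

// Get returns the stored value or ErrNotFound.
func (m *MemDatastore) Get(key string) ([]byte, error) {
	v, ok := m.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

// Delete removes the key; deleting a missing key is not an error.
func (m *MemDatastore) Delete(key string) error {
	delete(m.data, key)
	return nil
}

// Repo depends only on the interface, so the backend can be swapped
// without changing the repo code.
type Repo struct {
	ds Datastore
}

// NewRepo builds a repo on top of the given datastore.
func NewRepo(ds Datastore) *Repo { return &Repo{ds: ds} }

// StoreBlock writes a block under an assumed "/blocks/" prefix.
func (r *Repo) StoreBlock(key string, data []byte) error {
	return r.ds.Put("/blocks/"+key, data)
}

// LoadBlock reads a block back from the datastore.
func (r *Repo) LoadBlock(key string) ([]byte, error) {
	return r.ds.Get("/blocks/" + key)
}

func main() {
	repo := NewRepo(NewMemDatastore())
	if err := repo.StoreBlock("example", []byte("hello")); err != nil {
		panic(err)
	}
	data, err := repo.LoadBlock("example")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

Swapping the backend then amounts to passing a different `Datastore` value to `NewRepo`.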

keystore

A Repo typically holds the keys a node has access to, for signing and for encryption.

Details on operation and storage of the keystore can be found in REPO_FS.md and KEYSTORE.md.

config (state)

The node's config (configuration) is a tree of variables, used to configure various aspects of operation. For example:

  • the set of bootstrap peers IPFS uses to connect to the network
  • the Swarm, API, and Gateway network listen addresses
  • the Datastore configuration regarding the construction and operation of the on-disk storage system.

The following properties are mandatory for a repo to be usable: Addresses, Discovery, Bootstrap, Identity, Datastore, and Keychain.
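
A check for the mandatory properties might look like the Go sketch below. It assumes the config is serialized as a JSON object with these properties as top-level keys. The function name and the sample config are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mandatoryKeys lists the properties the spec names as mandatory.
var mandatoryKeys = []string{
	"Addresses", "Discovery", "Bootstrap", "Identity", "Datastore", "Keychain",
}

// MissingConfigKeys returns the mandatory top-level keys that are
// absent from a JSON-encoded config (assumed serialization format).
func MissingConfigKeys(raw []byte) ([]string, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(raw, &top); err != nil {
		return nil, fmt.Errorf("parsing config: %w", err)
	}
	var missing []string
	for _, key := range mandatoryKeys {
		if _, ok := top[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing, nil
}

func main() {
	// A deliberately incomplete config used only for illustration.
	raw := []byte(`{"Addresses": {}, "Bootstrap": [], "Identity": {}, "Datastore": {}}`)
	missing, err := MissingConfigKeys(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // prints "[Discovery Keychain]"
}
```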

It is recommended that config files avoid identifying information, so that they may be re-shared across multiple nodes.

CHANGES: today, implementations like js-ipfs and go-ipfs store the peer-id and private key directly in the config. These will be moved out of the config.

locks

IPFS implementations differ in how they handle concurrent access: some use multiple processes, some disallow multiple processes from using the same repo simultaneously, and others disallow sharing a repo but allow datastores to be shared simultaneously. This synchronization is accomplished via locks.

All repos contain the following standard locks:

  • repo.lock - prevents concurrent access to the repo. Must be held to read or write.
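
One simple way to picture `repo.lock` for an fs-repo is exclusive file creation, sketched below in Go. This is an assumption-laden illustration rather than how any particular implementation does it: the function name is hypothetical, and a lock file created this way is left behind if the process crashes, which is why real implementations tend to rely on OS-level file locks instead.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// lockFileName is the standard lock named in the spec.
const lockFileName = "repo.lock"

// ErrRepoLocked is returned when another holder already has the lock.
var ErrRepoLocked = errors.New("repo is locked by another process")

// AcquireRepoLock creates repo.lock exclusively inside repoPath and
// returns a function that releases the lock again.
func AcquireRepoLock(repoPath string) (func() error, error) {
	lockPath := filepath.Join(repoPath, lockFileName)

	// O_EXCL makes creation fail if the file already exists.
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
	if err != nil {
		if errors.Is(err, os.ErrExist) {
			return nil, ErrRepoLocked
		}
		return nil, fmt.Errorf("creating %s: %w", lockFileName, err)
	}

	// Record the holder's PID to help a human diagnose stale locks.
	if _, err := fmt.Fprintf(f, "%d\n", os.Getpid()); err != nil {
		f.Close()
		os.Remove(lockPath)
		return nil, err
	}
	if err := f.Close(); err != nil {
		os.Remove(lockPath)
		return nil, err
	}

	release := func() error { return os.Remove(lockPath) }
	return release, nil
}

func main() {
	dir, err := os.MkdirTemp("", "repo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	release, err := AcquireRepoLock(dir)
	if err != nil {
		panic(err)
	}

	// A second attempt fails while the lock is held.
	_, err = AcquireRepoLock(dir)
	fmt.Println("second attempt:", err)

	if err := release(); err != nil {
		panic(err)
	}
}
```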

datastore_spec

This file is created according to the Datastore configuration specified in the config file. It contains an array with all the mounting points that the repo is using, as well as their properties. Consequently, the datastore_spec file must have the same mounting points as defined in the Datastore configuration.

Note that the Datastore section of the config must have a Spec property, which defines the structure of the IPFS datastore. It is a composable structure, where each datastore is represented by a JSON object.
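
The requirement that both places list the same mounting points can be checked with a Go sketch like the one below. The JSON shape used here (a `mounts` array whose entries carry a `mountpoint` and a `type`) is an assumption for illustration, as are the type and function names. The spec itself only says that each datastore is a JSON object and that the mounting points are kept in an array.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Mount is an assumed shape for one mounting point entry.
type Mount struct {
	Mountpoint string `json:"mountpoint"`
	Type       string `json:"type"`
}

// Spec is an assumed shape for both the config's Datastore.Spec and
// the datastore_spec file.
type Spec struct {
	Type   string  `json:"type"`
	Mounts []Mount `json:"mounts"`
}

// mountpoints extracts the sorted list of mounting points from a spec.
func mountpoints(raw []byte) ([]string, error) {
	var s Spec
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("parsing spec: %w", err)
	}
	points := make([]string, 0, len(s.Mounts))
	for _, m := range s.Mounts {
		points = append(points, m.Mountpoint)
	}
	sort.Strings(points)
	return points, nil
}

// MountpointsMatch reports whether both specs declare the same set of
// mounting points, regardless of order.
func MountpointsMatch(configSpec, datastoreSpec []byte) (bool, error) {
	a, err := mountpoints(configSpec)
	if err != nil {
		return false, err
	}
	b, err := mountpoints(datastoreSpec)
	if err != nil {
		return false, err
	}
	if len(a) != len(b) {
		return false, nil
	}
	for i := range a {
		if a[i] != b[i] {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	configSpec := []byte(`{"type":"mount","mounts":[
		{"mountpoint":"/blocks","type":"flatfs"},
		{"mountpoint":"/","type":"levelds"}]}`)
	datastoreSpec := []byte(`{"type":"mount","mounts":[
		{"mountpoint":"/","type":"levelds"},
		{"mountpoint":"/blocks","type":"flatfs"}]}`)

	ok, err := MountpointsMatch(configSpec, datastoreSpec)
	if err != nil {
		panic(err)
	}
	fmt.Println("mounting points match:", ok)
}
```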

hooks (TODO)

Like git, IPFS nodes will allow hooks, a set of user configurable scripts to run at predefined moments in IPFS operations. This makes it easy to customize the behavior of IPFS nodes without changing the implementations themselves.

Notes

A Repo uniquely identifies an IPFS Node

A repository uniquely identifies a node. Running two different IPFS programs with identical repositories -- and thus identical identities -- WILL cause problems.

Datastores MAY be shared -- with proper synchronization -- though note that sharing datastore access MAY erode privacy.

Repo implementation changes MUST include migrations

DO NOT BREAK USERS' DATA. This is critical. Thus, any changes to a repo's implementation MUST be accompanied by a SAFE migration tool.

See ipfs/kubo#537 and jbenet/random-ideas#33

Repo Versioning

A repo version is a single incrementing integer. All versions are considered mutually incompatible. Repos of different versions MUST be run through the appropriate migration tools before use.
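
Because versions are consecutive integers, an upgrade can be pictured as a chain of single-step migrations. The Go sketch below is a hypothetical runner written for this example: the `Migration` type, the `Migrate` function, and the map keyed by source version are all assumptions, and downgrades are not covered.

```go
package main

import "fmt"

// Migration upgrades a repo by exactly one version.
type Migration func() error

// Migrate applies the registered steps one by one until the repo
// reaches the target version. steps[v] is assumed to migrate a repo
// from version v to version v+1. It returns the version reached.
func Migrate(current, target int, steps map[int]Migration) (int, error) {
	if current > target {
		return current, fmt.Errorf(
			"repo version %d is newer than supported version %d", current, target)
	}
	for v := current; v < target; v++ {
		step, ok := steps[v]
		if !ok {
			return v, fmt.Errorf("no migration from version %d to %d", v, v+1)
		}
		if err := step(); err != nil {
			return v, fmt.Errorf("migrating %d to %d: %w", v, v+1, err)
		}
	}
	return target, nil
}

func main() {
	steps := map[int]Migration{
		1: func() error { fmt.Println("migrating 1 -> 2"); return nil },
		2: func() error { fmt.Println("migrating 2 -> 3"); return nil },
	}
	v, err := Migrate(1, 3, steps)
	if err != nil {
		panic(err)
	}
	fmt.Println("repo is now at version", v)
}
```

Stopping at the first missing or failing step and reporting the version reached keeps the repo in a known state, in line with the requirement that migrations be safe.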