future: shard recombination #23

Open
raulk opened this issue Jul 4, 2021 · 0 comments
In Filecoin, deals are natural shards. Deals are considered self-contained and atomic. The client cannot rely on any deduplication happening across deals. Deal-mapped shards can be deleted once the deal has expired, with no risk at all.

In IPFS, however, things are different. IPFS treats all data as being on the same level, and there are no protocol units that could map to shards. Even so, the DAG store can be a true game-changer for IPFS, since there is no real requirement for, nor benefit in, using a fully-fledged storage engine to store IPFS data.

Storage engines map poorly to DAG data. DAG data needs to be stored in a modular way. When users add content, they are likely adding meaningful chunks of it that correspond to files, directories, archives, sub-DAGs, etc. Users don't add random, disparate blocks to their local IPFS data. So why treat all blocks as equal (which is what a storage engine does)?

Moreover, users are likely to want to manage the data by dealing with the "meaningful chunks" themselves. Users will want to remove/drop/erase sub-DAGs corresponding to directories, files, etc. With a storage engine, since blocks are deduplicated, one needs to implement costly refcounting GC algorithms (for example, see the many threads and initiatives around IPFS GC) to identify the blocks that can actually be deleted.
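To make the contrast concrete, here is a minimal sketch of the bookkeeping a deduplicating store needs before it can drop a sub-DAG, next to the shard-level equivalent. `RefCountedStore`, `ShardStore` and their methods are hypothetical names invented for illustration; none of this is the DAG store API, and CIDs are plain strings here.

```go
package main

import "fmt"

// RefCountedStore is a hypothetical deduplicating block store: each block is
// kept once, and a counter tracks how many sub-DAGs reference it.
type RefCountedStore struct {
	refs map[string]int // block CID -> number of referencing sub-DAGs
}

func NewRefCountedStore() *RefCountedStore {
	return &RefCountedStore{refs: make(map[string]int)}
}

// AddDAG records one reference for every block of the sub-DAG.
func (s *RefCountedStore) AddDAG(blocks []string) {
	for _, c := range blocks {
		s.refs[c]++
	}
}

// RemoveDAG drops one reference per block and returns the blocks that are no
// longer referenced by anything, i.e. the only ones that may be deleted.
func (s *RefCountedStore) RemoveDAG(blocks []string) []string {
	var freed []string
	for _, c := range blocks {
		if s.refs[c] == 0 {
			continue // unknown block, nothing to release
		}
		s.refs[c]--
		if s.refs[c] == 0 {
			delete(s.refs, c)
			freed = append(freed, c)
		}
	}
	return freed
}

// ShardStore is the hypothetical shard-per-sub-DAG alternative: there is no
// cross-shard deduplication, so dropping a sub-DAG is a single delete.
type ShardStore map[string][]string

// DropShard removes the whole shard and returns the blocks it held.
func (s ShardStore) DropShard(key string) []string {
	blocks := s[key]
	delete(s, key)
	return blocks
}

func main() {
	rc := NewRefCountedStore()
	rc.AddDAG([]string{"a", "b", "c"}) // dirA
	rc.AddDAG([]string{"b", "c", "d"}) // dirB shares b and c with dirA
	fmt.Println("refcounted, freed by removing dirA:", rc.RemoveDAG([]string{"a", "b", "c"}))

	ss := ShardStore{"dirA": {"a", "b", "c"}, "dirB": {"b", "c", "d"}}
	fmt.Println("sharded, freed by dropping dirA:", ss.DropShard("dirA"))
}
```

In the refcounted store, removing `dirA` can only release the blocks that `dirB` does not also reference, and every block has to be visited to find that out. In the sharded layout the whole shard goes away at the cost of storing shared blocks twice.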

That is then coupled with finding ways to forcefully reclaim physical space from the storage engine, which is likely to add tombstone entries but not necessarily release space immediately, as it likely needs to run compactions or other housekeeping processes to integrate the deletes and free up the space.

In a DAG-store world, it would be fairly easy to delete an entire shard. Other operations require shard arithmetic:

  1. deleting partial content from a shard
  2. deleting content across many shards
  3. combining shards
  4. deduplicating content across shards
  5. denormalising content across shards

In the DAG store, shards are immutable. These operations are performed by creating new shards, or reassigning mounts to existing shards. By using mmap and shard indices, one can efficiently perform these joins, splits, regroupings, etc.
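As a rough illustration of how a combine could work against immutable shards, the sketch below builds a copy plan for a new shard purely from the indices of the existing ones. Everything in it is assumed for the example: `Shard`, `Entry`, `CopyOp` and `PlanCombine` are hypothetical, the mount is just a string, and the actual byte copying (where mmap would come in) is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry locates one block inside a shard's backing data (hypothetical layout).
type Entry struct {
	Offset, Length int64
}

// Shard is a hypothetical immutable shard: a key, the mount that backs it,
// and an index from CID to the block's location within that mount.
type Shard struct {
	Key   string
	Mount string
	Index map[string]Entry
}

// CopyOp says: to build the new shard, read Length bytes at Offset from
// FromMount for block CID.
type CopyOp struct {
	CID       string
	FromMount string
	Offset    int64
	Length    int64
}

// PlanCombine walks the source shards in order and emits one CopyOp per
// distinct CID, so blocks shared between shards are copied only once.
// The source shards are only read, never modified.
func PlanCombine(shards ...Shard) []CopyOp {
	seen := make(map[string]bool)
	var plan []CopyOp
	for _, s := range shards {
		// Sort the CIDs so the plan is deterministic.
		cids := make([]string, 0, len(s.Index))
		for c := range s.Index {
			cids = append(cids, c)
		}
		sort.Strings(cids)
		for _, c := range cids {
			if seen[c] {
				continue // already taken from an earlier shard
			}
			seen[c] = true
			e := s.Index[c]
			plan = append(plan, CopyOp{CID: c, FromMount: s.Mount, Offset: e.Offset, Length: e.Length})
		}
	}
	return plan
}

func main() {
	a := Shard{Key: "a", Mount: "a.car", Index: map[string]Entry{
		"x": {Offset: 0, Length: 10},
		"y": {Offset: 10, Length: 5},
	}}
	b := Shard{Key: "b", Mount: "b.car", Index: map[string]Entry{
		"y": {Offset: 0, Length: 5},
		"z": {Offset: 5, Length: 7},
	}}
	for _, op := range PlanCombine(a, b) {
		fmt.Printf("%s <- %s [%d:+%d]\n", op.CID, op.FromMount, op.Offset, op.Length)
	}
}
```

The plan is the whole point: because the indices already say where each block lives, a combine (or a split, by filtering the plan) never has to parse the source data, and the old shards stay valid until the new one is registered.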

@raulk raulk added this to the future milestone Jul 5, 2021