Tracking issue for UnixFS automatic sharding #8106

Closed
Tracked by #8343
aschmahmann opened this issue May 4, 2021 · 13 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature status/in-progress In progress

@aschmahmann
Contributor

aschmahmann commented May 4, 2021

Update 10/21: All major work has been finished. Full review pending.

The sharding PR has landed, and we're working on the reverse transition, "unsharding". We are holding off on merging it into go-ipfs (#8114) until the unsharding is finished, so that the whole functionality lands together (and CIDs are deterministic for any directory size).

Main unsharding work is happening in ipfs/go-unixfs#94.

Minor issues left for last are compiled in ipfs/go-unixfs#105 and can be addressed after the main PRs land.

A potential issue with the merkledag.Walk function doing a BFS is tracked in ipfs/boxo#392, but it can be addressed after the major work lands, as it's only an optimization.


Tracking implementation of #7022

Per the linked issue, in order to enable sharded directories by default we want to shard directories only when required.

After some discussion, the heuristic we are going to use for now to determine when to use sharded vs. regular directories is option three listed here. That is, we will sum the sizes of all the names + CIDs in the map: if the total is >256KiB we will use sharded directories, and if it is <=256KiB we will use regular directories.
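A minimal Go sketch of this size estimate, assuming each entry's size is simply `len(name) + len(cid)` with the CID in binary form, and ignoring protobuf framing overhead. `Link`, `estimatedDirSize`, and `shouldShard` are hypothetical names for illustration, not the go-unixfs or go-merkledag API:

```go
package main

// shardingThreshold is the 256KiB cutoff chosen in this issue.
const shardingThreshold = 256 * 1024

// Link is a hypothetical stand-in for a directory entry: a name plus the
// binary CID it points to. It is not the go-merkledag Link type.
type Link struct {
	Name string
	Cid  []byte
}

// estimatedDirSize sums len(name) + len(cid) over all entries, i.e. the
// "names + CIDs" estimate. It ignores serialization overhead, so it is an
// estimate rather than the exact encoded block size.
func estimatedDirSize(links []Link) int {
	total := 0
	for _, l := range links {
		total += len(l.Name) + len(l.Cid)
	}
	return total
}

// shouldShard reports whether a directory with these entries should use a
// sharded (HAMT) layout: strictly more than 256KiB shards, <=256KiB does not.
func shouldShard(links []Link) bool {
	return estimatedDirSize(links) > shardingThreshold
}
```

Because the decision depends only on the directory's current entries, the same set of entries always yields the same layout, which is what deterministic CIDs require once unsharding applies the same rule in reverse.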

Places where we'll likely have to look at changes include:

  • go-unixfs (the importer and directory code as well as potentially the helper code)
  • go-mfs (wherever we modify a folder)

Hopefully, if we keep the interfaces the same, we won't have to make any changes in go-ipfs itself. Just in case, though, we may need to scan go-ipfs for any type casts to a UnixFS basic or sharded directory.

Once this is done, we should be able to drop the global boolean that enables "use sharded directories" from go-unixfs, go-ipfs, and go-ipfs-config.

aschmahmann added the kind/enhancement label May 4, 2021
schomatis self-assigned this May 4, 2021
@schomatis
Contributor

A scoped issue with technical details on the MFS/UnixFS enhancement needed to support this is in ipfs/go-mfs#87. Once that is done, go-ipfs should only need to update its dependencies and set the new option to the desired 256KiB value.

@Stebalien
Member

Now that auto-sharding is implemented, let's try to implement auto-unsharding. As far as I can tell, this shouldn't be too difficult and shouldn't be a massive performance problem (TBD). The tricky parts are:

  1. We don't want to do any size estimation unless we actually delete something.
  2. Ideally, we'd only do size estimations when we serialize.

Potential solution:

  1. When making changes in a sharded directory, keep track of the net size change.
  2. On serialization, if the net size change is negative, enumerate until we hit the limit.
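A minimal Go sketch of these two steps, under stated assumptions: `shardedDir`, `addEntry`, `removeEntry`, and `shouldUnshard` are hypothetical names, entries live in an in-memory map rather than in HAMT blocks, and an entry's size is `len(name) + len(cid)`. It relies on the invariant that a sharded directory was above the threshold at its last check, so only a negative running total forces an enumeration, and that enumeration stops as soon as the limit is crossed.

```go
package main

// unshardThreshold mirrors the 256KiB sharding cutoff from this issue.
const unshardThreshold = 256 * 1024

// shardedDir is a hypothetical in-memory stand-in for a sharded (HAMT)
// directory. A real implementation would enumerate links from HAMT blocks.
type shardedDir struct {
	entries       map[string][]byte // name -> binary CID
	netSizeChange int               // size delta since the last threshold check
}

func entrySize(name string, cid []byte) int { return len(name) + len(cid) }

// addEntry inserts or replaces an entry and records the size delta (step 1).
func (d *shardedDir) addEntry(name string, cid []byte) {
	if old, ok := d.entries[name]; ok {
		d.netSizeChange -= entrySize(name, old)
	}
	d.entries[name] = cid
	d.netSizeChange += entrySize(name, cid)
}

// removeEntry deletes an entry, if present, and records the delta (step 1).
func (d *shardedDir) removeEntry(name string) {
	if old, ok := d.entries[name]; ok {
		d.netSizeChange -= entrySize(name, old)
		delete(d.entries, name)
	}
}

// shouldUnshard runs at serialization time (step 2). A non-negative net
// change means the directory is still above the threshold, so nothing is
// enumerated. Otherwise it enumerates only until the limit is exceeded.
func (d *shardedDir) shouldUnshard() bool {
	if d.netSizeChange >= 0 {
		return false
	}
	total := 0
	for name, cid := range d.entries {
		total += entrySize(name, cid)
		if total > unshardThreshold {
			// Known to be above the threshold again: reset the running total.
			d.netSizeChange = 0
			return false
		}
	}
	// Small enough: the caller would convert to a basic directory here.
	return true
}
```

Go map iteration order is random, but the answer does not depend on it since the loop only sums sizes; a real implementation walking HAMT blocks would still want a fixed traversal order to fetch as few blocks as possible.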

Additional notes (possible future extensions):

  • We may want to try EnumLinksAsync to enumerate in parallel, but this may actually perform worse because we might sample different parts of the graph each time (non-deterministic). It may be better to just enumerate links sequentially, especially because we likely only need to fetch ~10 leaves (this would need to be computed). Alternatively, it may be worth trying to make EnumLinksAsync more depth-first instead of breadth-first.
  • We may want to count over the limit (e.g., by 2x). We could use this to "offset" our net size change running total to avoid re-computing this. But this probably isn't worth it for a first-pass.
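To illustrate the sequential, depth-first enumeration suggested in the first note, here is a hypothetical Go sketch (`shardNode` and `sizeUpTo` are made-up names, not go-unixfs or go-merkledag APIs). It walks a shard tree depth-first with an explicit stack and stops as soon as the running size exceeds the limit; the `visited` count stands in for the number of blocks that would have to be fetched.

```go
package main

// shardNode is a hypothetical in-memory HAMT node: entries stored directly
// at this node plus child shards. It is not the go-unixfs hamt.Shard type.
type shardNode struct {
	entrySizes []int        // size (name + CID) of each entry at this node
	children   []*shardNode // child shards
}

// sizeUpTo walks the shard sequentially and depth-first, stopping as soon
// as the running total exceeds limit. It returns the (possibly partial)
// total and the number of nodes visited.
func sizeUpTo(root *shardNode, limit int) (total, visited int) {
	stack := []*shardNode{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		visited++
		for _, s := range n.entrySizes {
			total += s
			if total > limit {
				return total, visited
			}
		}
		// Push children in reverse so they are popped in their original order.
		for i := len(n.children) - 1; i >= 0; i-- {
			stack = append(stack, n.children[i])
		}
	}
	return total, visited
}
```

Following the second note, a caller could pass a larger limit (e.g., 2x the threshold) and use `total - threshold` as a credit against later deletions, so that small removals do not trigger a fresh enumeration.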

@schomatis
Contributor

We still need to integrate this into go-ipfs (draft in #8114), namely by fixing the sharness and interop tests.

schomatis removed their assignment May 7, 2021
@BigLep
Contributor

BigLep commented May 10, 2021

2021-05-10 discussion:

  1. @schomatis is going to do the unsharding work in the week of 2021-05-17.
  2. @mburns is going to do a test fix here. @schomatis or @aschmahmann can provide the details.

@schomatis
Copy link
Contributor

More details in the TODO list of the ongoing PR #8114.

@schomatis
Copy link
Contributor

Update 8/13 (@schomatis): I'm leading this effort, which is currently in progress. Not blocked.

BigLep modified the milestones: go-ipfs 0.10, go-ipfs 0.11 Aug 14, 2021
@schomatis
Contributor

Update 8/23 (@schomatis, DRI):

Brief:

(See full status in the OP.)

@schomatis
Contributor

Update 8/27 (@schomatis, DRI):

Brief:

(See full status in PR description.)

@schomatis
Contributor

Update 9/3 (@schomatis, DRI):

Brief:

(See full status in PR description.)

@schomatis
Contributor

Update 10/21: All major work has been finished. Full review pending.

The sharding PR has landed, and we're working on the reverse transition, "unsharding". We are holding off on merging it into go-ipfs (#8114) until the unsharding is finished, so that the whole functionality lands together (and CIDs are deterministic for any directory size).

Main unsharding work is happening in ipfs/go-unixfs#94. Tests for this are in a separate PR: ipfs/go-unixfs#99.

Minor issues left for last are compiled in ipfs/go-unixfs#105 and can be addressed after the main PRs land.

A potential issue with the merkledag.Walk function doing a BFS is tracked in ipfs/boxo#392, but it can be addressed after the major work lands, as it's only an optimization.

@aschmahmann
Contributor Author

Closed by #8563
