Add automatic sharding/unsharding tests #8547

Merged
Changes from 1 commit
test: add tests for automatic sharding and unsharding
aschmahmann committed Nov 16, 2021
commit 927fad8aa404de0d6fa4a13aae0444e97e420ccf
67 changes: 66 additions & 1 deletion test/sharness/t0260-sharding.sh
@@ -8,12 +8,25 @@ test_description="Test directory sharding"

. lib/test-lib.sh

# We shard directories based on their estimated size, with a threshold of
# 256 KiB (see config file docs) above which they are sharded.
#
# The directory size is estimated as the sum of the sizes of its links. A link
# is roughly the entry name plus the CID byte length (e.g. 34 bytes for a
# CIDv0). So for entries with 10-character names we need about
# 256 KiB / (34 + 10) ~ 6000 entries in the directory to trigger sharding.
test_expect_success "set up test data" '
mkdir testdata
for i in `seq 2000`
do
echo $i > testdata/file$i
done

mkdir big_dir
for i in `seq 5960` # Just above the number of entries that triggers sharding at the 256 KiB threshold
do
echo $i > big_dir/`printf "file%06d" $i` # fixed length of 10 chars
done
'
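
For reference, a quick arithmetic check of the entry count chosen above (illustrative only, not part of the test): 256 KiB divided by a 44-byte link (a 34-byte CIDv0 plus a 10-character name) is roughly 5958, so the 5960 entries created here land just over the sharding threshold.

# Illustrative only: how many 44-byte links fit under the 256 KiB threshold.
echo $(( 256 * 1024 / (34 + 10) ))   # prints 5957 (integer division), so 5960 entries is over the line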

test_add_dir() {
@@ -32,6 +45,11 @@ test_add_dir() {
test_init_ipfs

UNSHARDED="QmavrTrQG4VhoJmantURAYuw3bowq3E2WcvP36NRQDAC1N"

test_expect_success "force sharding off" '
ipfs config --json Internal.UnixFSShardingSizeThreshold "\"1G\""
Contributor Author:
The 0 value disables the option.

Contributor:
Yeah, that's fair, although I'm not sure how I feel about 0 here. I'm not even sure whether go-ipfs should accept 0 as a value when it processes the config file. It restores the go-ipfs <v0.10.0 default behavior, but I'm not sure it's obvious to reason about (i.e. the feature is turned off, which means data is added as a basic directory and there is no automatic sharding or unsharding of existing MFS directories).

Contributor Author:
No problem, just flagging that, at least in theory, we expose a "disabled" option in UnixFS. But 1G is more than enough to consider it disabled for all practical purposes.

'
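
Per the discussion above, 0 would disable the mechanism outright, while 1G merely pushes the threshold far beyond anything this test creates. A quick read-back (sketch only, not part of the PR) can confirm the override took effect:

# Sketch only: read the threshold back to confirm the override.
ipfs config Internal.UnixFSShardingSizeThreshold   # expected to print 1G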

test_add_dir "$UNSHARDED"

test_launch_ipfs_daemon
@@ -40,7 +58,7 @@ test_add_dir "$UNSHARDED"

test_kill_ipfs_daemon

test_expect_success "enable sharding" '
test_expect_success "force sharding on" '
ipfs config --json Internal.UnixFSShardingSizeThreshold "\"1B\""
'
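
As a sketch of what the 1 B threshold implies (illustrative only, reusing the fixtures above): even the small testdata directory should now come back sharded, so its root CID should no longer match the unsharded fixture.

# Sketch only: with a 1 B threshold the 2000-entry directory gets sharded,
# so ipfs add reports a root CID different from the unsharded fixture.
ipfs add -r -Q testdata > sharded_out &&
test "$(cat sharded_out)" != "$UNSHARDED"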

@@ -137,4 +155,51 @@ test_list_incomplete_dir() {

test_list_incomplete_dir

# Test automatic sharding and unsharding
Contributor Author:
As discussed informally in chat (but I can elaborate more here if needed), any kind of directory manipulation goes through the same "automatic" sharding mechanism, whether it is adding/removing one entry at a time from an MFS tree with ipfs files cp/rm, or consuming an entire directory at once with ipfs add -r <big-dir>.

So above and below this line we are testing the same UnixFS code. This doesn't mean the current division is superfluous, since we're accessing that mechanism through different go-ipfs paths, but I'm mentioning it to (a) check whether this was indeed the intention here and (b) consider whether we could re-label this boundary.

Contributor:
True, but if the code paths were different we'd be testing different things; i.e., here we are testing that when we remove data from a sharded directory we get an unsharded one, and when we add the data back we end up with a sharded one again.

Does that make sense?

Contributor Author:
Yes, it does. I think my main issue here is with the "automatic" boundary, which might imply the previous commands were not automatic, i.e. that they didn't use the new UnixFS mechanism and were instead forced or explicitly told to shard (when in fact we calculated the exact number of entries that would go over the threshold and trigger the mechanism). I think your previous explanation is clearer (but we're still in nit territory, so we can ignore this):

here we are testing that when we remove data from a sharded directory we get an unsharded one and when we add the data back we end up with [the same] sharded one again
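
To make the point concrete, here is a small sketch (hypothetical MFS path name, not part of the PR) of the two go-ipfs paths that reach the same UnixFS sharding code: importing a whole directory with ipfs add, and manipulating an MFS copy one entry at a time, where crossing the threshold in either direction reshapes the directory automatically (LARGE_SHARDED is the fixture defined just below).

# Sketch only: both paths below exercise the same automatic (un)sharding logic.
ipfs add -r -Q big_dir &&                          # whole directory at once
ipfs files cp /ipfs/"$LARGE_SHARDED" /mfs_copy &&  # entry-at-a-time via an MFS copy
ipfs files rm /mfs_copy/file000001 &&
ipfs files stat --hash /mfs_copy                   # root hash reflects the (un)sharded form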


# TODO: This does not need to report an error https://github.com/ipfs/go-ipfs/issues/8088
test_expect_failure "reset automatic sharding" '
ipfs config --json Internal.UnixFSShardingSizeThreshold null
'

LARGE_SHARDED="QmWfjnRWRvdvYezQWnfbvrvY7JjrpevsE9cato1x76UqGr"
LARGE_MINUS_5_UNSHAREDED="QmbVxi5zDdzytrjdufUejM92JsWj8wGVmukk6tiPce3p1m"

test_add_large_sharded_dir() {
exphash="$1"
test_expect_success "ipfs add on directory succeeds" '
ipfs add -r -Q big_dir > shardbigdir_out &&
echo "$exphash" > shardbigdir_exp &&
test_cmp shardbigdir_exp shardbigdir_out
'

test_expect_success "can access a path under the dir" '
ipfs cat "$exphash/file000030" > file30_out &&
test_cmp big_dir/file000030 file30_out
'
}

test_add_large_sharded_dir "$LARGE_SHARDED"

test_expect_success "remove a few entries from big_dir/ to trigger unsharding" '
ipfs files cp /ipfs/"$LARGE_SHARDED" /big_dir &&
for i in `seq 5`
do
ipfs files rm /big_dir/`printf "file%06d" $i`
done &&
ipfs files stat --hash /big_dir > unshard_dir_hash &&
echo "$LARGE_MINUS_5_UNSHAREDED" > unshard_exp &&
test_cmp unshard_exp unshard_dir_hash
'

test_expect_success "add a few entries to big_dir/ to retrigger sharding" '
for i in `seq 5`
do
ipfs files cp /ipfs/"$LARGE_SHARDED"/`printf "file%06d" $i` /big_dir/`printf "file%06d" $i`
done &&
ipfs files stat --hash /big_dir > shard_dir_hash &&
echo "$LARGE_SHARDED" > shard_exp &&
test_cmp shard_exp shard_dir_hash
'

test_done