-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky MFS sharness test #8131
Comments
Looking deeper into the surrounding tests. Trying to understand why are we looking for a CIDv0 hash in a test where v1 is enabled (both by the raw leaves flag and the explicit cid version flag):
|
Trying to reproduce a minimal subset of the test that failed. Right now the test is consistenly failing which means I'm likely doing something wrong as this normally passes in sharness (the issue is just a sporadic fail). Failing right now to avoid flushing the write: #!/bin/bash
set -x
set -e
# The daemon needs to be running, otherwise we will always flush on
# each command (that closes the MFS file descriptors triggering
# the sync).
ipfs swarm addrs > /dev/null || (echo "daemon not running" && exit 1)
# FIXME: Is there a nice way to check this?
HASH=QmNzVQoBR7wQjSNXFrcJHZ29PMsEDfF6iZB1QEhKD4uZpV
# Test if $HASH is present ($1=1) or absent ($1=0) in the repo
# and fail with error string $2 if not.
test_hash() {
local IS_PRESENT ERROR_STR IN_REPO
IS_PRESENT=$1
ERROR_STR=$2
IN_REPO=1
# Check if present or not with grep, if grep fails it will
# leave the IN_REPO as is, otherwise it will clear it.
ipfs refs local | grep $HASH || IN_REPO=0
if [[ $IN_REPO != $IS_PRESENT ]]; then
echo $ERROR_STR
exit 1
fi
}
# Clean previous run.
ipfs files rm -rf /cats || true
ipfs repo gc > /dev/null
test_hash 0 "hash is present before write"
# Sharness test write.
ipfs files mkdir -p /cats
echo "testing" | ipfs files write -f=false -e /cats/walrus
test_hash 0 "hash is present after write with no flush (is the daemon running?)"
ipfs files stat --hash /cats/walrus # stat flushes
test_hash 1 "hash is not present after stat"
echo "SUCCESS"
exit 0
|
Fixed previous test (in WIP branch linked at the top). The file is always added to the repo on close, what changes with the flush flag is whether the parents are modified or not. Their hash is what should be tested for presence in the repo. |
Update 9/6 (@schomatis). This issue is back on my radar although I'm not sure if there is much more value here other than what has already been noted. We probably want to proceed with #8311 instead. |
Update 9/13 (@schomatis): The summary of this flaky test (which we haven't seen failing again nor have we been able to reproduce) is that the source of the fail was The reason for the command running locally instead of remotely is unknown but likely one of the following:
|
Not much to be done at the moment so un-assigning myself to signal there is no active work being done here. |
Update 9/13 (@schomatis): Not much to be done at the moment. The summary of this flaky test (which we haven't seen failing again nor have we been able to reproduce) is that the source of the fail was
ipfs files write --flush=false
running locally instead of remotely through the daemon. In that scenario the flush flag is virtually ignored as we end up closing the repo after the command which will trigger a full flush from the MFS root anyway.The reason for the command running locally instead of remotely is unknown but likely one of the following:
test_launch_ipfs_daemon
andpollEndpoint
more reliable #8311, but this is hard to implement and with low return. As a workaround to this we have added an extra check specifically for the failing test to poll the daemon's API and make sure it's running right before calling theipfs files
command (fix(sharness): add extra check in flush=false in files write #8432). That check should be enough to uncover this potential source if it happens again (for this particular test alone).go-ipfs-cmds
failed to detect it (as explained Maketest_launch_ipfs_daemon
andpollEndpoint
more reliable #8311 (comment), it has a different detection mechanism). In that casego-ipfs-cmds
silently calls the command locally and we fail to notice the difference. There is no way to detect this at the moment so Allow NoRemote/NoLocal to be configured on a per-run basis go-ipfs-cmds#218 was created to add more granularity to the local/remote run choice and forcego-ipfs-cmds
to run the command remotely or fail otherwise in the specific case of the--flush=false
flag being set.Update 9/6 (@schomatis). This issue is back on my radar although I'm not sure if there is much more value here other than what has already been noted. We probably want to proceed with #8311 instead.
Update 7/14 (@schomatis): My best guess so far is that this was just a race bug in how the daemon is set up in
test_launch_ipfs_daemon
which caused the flush flag to be run in a standalone/direct way without the daemon running and hence trigger the flush even when we didn't intended to.GetNode()
andsync()
on each child recursively):https://github.com/ipfs/go-ipfs/blob/ae306994a5254b2d0fc8a7a63ca2e8695c5f745d/core/node/core.go#L159
We need a way to tell in which way we are running the command (standalone or through deamon) as the commands might have a different behavior. The
flush=false
should fail from the start if the write command is run in standalone mode.Trying to isolate the offending test. There is too much entangling so I need to go deeper on the surrounding tests of the entire
test_files_api
(the next test usually depends on the state left by the previous one).WIP test branch https://github.com/ipfs/go-ipfs/commits/schomatis/sharness/flaky-mfs-test.
Shareness test
1130 - root hash not bubbled up yet (offline, cidv1) - sharnessLinux.t0250-files-api
failed in this CI run https://app.circleci.com/pipelines/github/ipfs/go-ipfs/4611/workflows/e5849e96-e5cf-4df3-bb50-2b4517f43d24/jobs/51165/tests#failed-test-0.It appears to be a flaky test related to MFS and using
--flush=false
.The text was updated successfully, but these errors were encountered: