Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky MFS sharness test #8131

Open
aschmahmann opened this issue May 12, 2021 · 6 comments
Open

Flaky MFS sharness test #8131

aschmahmann opened this issue May 12, 2021 · 6 comments
Labels
kind/bug A bug in existing code (including security flaws) P2 Medium: Good to have, but can wait until someone steps up status/ready Ready to be worked

Comments

@aschmahmann
Copy link
Contributor

aschmahmann commented May 12, 2021

Update 9/13 (@schomatis): Not much to be done at the moment. The summary of this flaky test (which we haven't seen failing again nor have we been able to reproduce) is that the source of the fail was ipfs files write --flush=false running locally instead of remotely through the daemon. In that scenario the flush flag is virtually ignored as we end up closing the repo after the command which will trigger a full flush from the MFS root anyway.

The reason for the command running locally instead of remotely is unknown but likely one of the following:

Update 9/6 (@schomatis). This issue is back on my radar although I'm not sure if there is much more value here other than what has already been noted. We probably want to proceed with #8311 instead.

Update 7/14 (@schomatis): My best guess so far is that this was just a race bug in how the daemon is set up in test_launch_ipfs_daemon which caused the flush flag to be run in a standalone/direct way without the daemon running and hence trigger the flush even when we didn't intended to.

We need a way to tell in which way we are running the command (standalone or through deamon) as the commands might have a different behavior. The flush=false should fail from the start if the write command is run in standalone mode.

Trying to isolate the offending test. There is too much entangling so I need to go deeper on the surrounding tests of the entire test_files_api (the next test usually depends on the state left by the previous one).

WIP test branch https://github.com/ipfs/go-ipfs/commits/schomatis/sharness/flaky-mfs-test.


Shareness test 1130 - root hash not bubbled up yet (offline, cidv1) - sharnessLinux.t0250-files-api failed in this CI run https://app.circleci.com/pipelines/github/ipfs/go-ipfs/4611/workflows/e5849e96-e5cf-4df3-bb50-2b4517f43d24/jobs/51165/tests#failed-test-0.

It appears to be a flaky test related to MFS and using --flush=false.

@aschmahmann aschmahmann added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels May 12, 2021
@aschmahmann aschmahmann added status/ready Ready to be worked P2 Medium: Good to have, but can wait until someone steps up and removed need/triage Needs initial labeling and prioritization labels Jun 25, 2021
@schomatis schomatis self-assigned this Jul 2, 2021
@schomatis
Copy link
Contributor

schomatis commented Jul 6, 2021

Looking deeper into the surrounding tests. Trying to understand why are we looking for a CIDv0 hash in a test where v1 is enabled (both by the raw leaves flag and the explicit cid version flag):

expecting success: 
    test -z "$ONLINE" ||
    (ipfs refs local > refsout &&
    test_expect_code 1 grep $ROOT_HASH refsout)
  
QmageRWxC7wWjPv5p36NeAgBAiFdBHaNfxAehBSwzNech2
test_expect_code: command exited with 0, we wanted 1 grep QmageRWxC7wWjPv5p36NeAgBAiFdBHaNfxAehBSwzNech2 refsout
not ok 1130 - root hash not bubbled up yet (offline, cidv1)
#	
#	    test -z "$ONLINE" ||
#	    (ipfs refs local > refsout &&
#	    test_expect_code 1 grep $ROOT_HASH refsout)
#	  

@schomatis
Copy link
Contributor

Trying to reproduce a minimal subset of the test that failed. Right now the test is consistenly failing which means I'm likely doing something wrong as this normally passes in sharness (the issue is just a sporadic fail).

Failing right now to avoid flushing the write: hash is present after write with no flush.

#!/bin/bash
set -x
set -e

# The daemon needs to be running, otherwise we will always flush on
# each command (that closes the MFS file descriptors triggering
# the sync).
ipfs swarm addrs > /dev/null || (echo "daemon not running" && exit 1)
# FIXME: Is there a nice way to check this?

HASH=QmNzVQoBR7wQjSNXFrcJHZ29PMsEDfF6iZB1QEhKD4uZpV

# Test if $HASH is present ($1=1) or absent ($1=0) in the repo
# and fail with error string $2 if not.
test_hash() {
  local IS_PRESENT ERROR_STR IN_REPO
  IS_PRESENT=$1
  ERROR_STR=$2

  IN_REPO=1
  # Check if present or not with grep, if grep fails it will
  # leave the IN_REPO as is, otherwise it will clear it.
  ipfs refs local | grep $HASH || IN_REPO=0

  if [[ $IN_REPO != $IS_PRESENT ]]; then
    echo $ERROR_STR
    exit 1
  fi
}

# Clean previous run.
ipfs files rm -rf /cats || true
ipfs repo gc > /dev/null

test_hash 0 "hash is present before write"

# Sharness test write.
ipfs files mkdir -p /cats
echo "testing" | ipfs files write -f=false -e /cats/walrus

test_hash 0 "hash is present after write with no flush (is the daemon running?)"

ipfs files stat --hash /cats/walrus # stat flushes

test_hash 1 "hash is not present after stat"

echo "SUCCESS"
exit 0
ipfs version --all
go-ipfs version: 0.10.0-dev-ae306994a-dirty
Repo version: 11
System version: amd64/linux
Golang version: go1.16.5

@schomatis
Copy link
Contributor

Fixed previous test (in WIP branch linked at the top). The file is always added to the repo on close, what changes with the flush flag is whether the parents are modified or not. Their hash is what should be tested for presence in the repo.

@schomatis
Copy link
Contributor

Update 9/6 (@schomatis). This issue is back on my radar although I'm not sure if there is much more value here other than what has already been noted. We probably want to proceed with #8311 instead.

@schomatis
Copy link
Contributor

Update 9/13 (@schomatis): The summary of this flaky test (which we haven't seen failing again nor have we been able to reproduce) is that the source of the fail was ipfs files write --flush=false running locally instead of remotely through the daemon. In that scenario the flush flag is virtually ignored as we end up closing the repo after the command which will trigger a full flush from the MFS root anyway.

The reason for the command running locally instead of remotely is unknown but likely one of the following:

@schomatis
Copy link
Contributor

Not much to be done at the moment so un-assigning myself to signal there is no active work being done here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) P2 Medium: Good to have, but can wait until someone steps up status/ready Ready to be worked
Projects
None yet
Development

No branches or pull requests

2 participants