Intermittent coverage failures (some test runs not counted) #91092

scole66 · 2021-11-20T20:25:40Z

I have a large-ish project with ~6000 testcases, and for a long time now, code coverage has been hit-or-miss. The issue is that some of the test cases don't seem to get their data included in the profraw files, and so don't show up as having an effect on coverage. The unrecorded tests seem to be essentially random, but with thousands of tests, single-digit-percentage failures are noticed on every test run.

It's been annoying. So I finally sat down to try to reduce it to a minimal reproduction, but it still takes multiple files, so it is difficult to include inline in a GitHub issue.

The tree:

.
├── Cargo.lock
├── Cargo.toml
├── check_bug.sh
└── src
    ├── main.rs
    └── statething.rs

When I run a test via:

tst ()
{
    rm -f res-*.profraw;
    RUST_BACKTRACE=1 RUSTFLAGS="-Zinstrument-coverage" LLVM_PROFILE_FILE="res-%m.profraw" cargo test "$@";
    cargo profdata -- merge res-*.profraw --output=res.profdata
}
tst statething

which shows 2 tests run:

    Finished test [unoptimized + debuginfo] target(s) in 0.00s
     Running unittests (target/debug/deps/res-6177536c73daf6cd)

running 2 tests
test statething::tests::state_has_false ... ok
test statething::tests::state_has_true ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Then report on that particular function via:

report ()
{
    cargo cov -- show --use-color --ignore-filename-regex='/rustc/|/\.cargo/|\.rustup/toolchains' --instr-profile=res.profdata $(objects) --show-line-counts-or-regions -Xdemangler=rustfilt "$@"
}
report --name 5State3has

Most of the time, I see the correct result:

<res::statething::State>::has:
    8|      2|    pub fn has(&self, needle: &str) -> bool {
    9|      2|        self.0 == needle
   10|      2|    }

But sometimes (about 1 in 170 times), I see this:

<res::statething::State>::has:
    8|      1|    pub fn has(&self, needle: &str) -> bool {
    9|      1|        self.0 == needle
   10|      1|    }

It seems to be related to having multiple source files; I could not get similar behavior with only a main.rs. I've seen the problem appear on both macOS (Mojave) and Windows (in Windows Subsystem for Linux 2).

The two-source-file tree shown above (including the script I run to repeat the test until a failure happens) is on a bug-report branch here: https://github.com/scole66/rust-e262/tree/reduction-for-bugreport
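To get a feel for how many repetitions such a repeat-until-failure script needs, here is a rough sketch of the arithmetic. It assumes each run misses a count independently with the same probability (the "about 1 in 170" figure above); that independence is an assumption, not something measured in this report, and the function name is made up for illustration.

```rust
/// Probability of seeing at least one miscounted run in `runs` repetitions,
/// assuming every run fails independently with probability `per_run_rate`.
/// (Hypothetical helper; the independence assumption is ours.)
fn prob_at_least_one_miss(per_run_rate: f64, runs: u32) -> f64 {
    // P(no miss in any run) = (1 - p)^runs, so take the complement.
    1.0 - (1.0 - per_run_rate).powi(runs as i32)
}
```

Under that assumption, a call such as `prob_at_least_one_miss(1.0 / 170.0, 500)` comes out a little under 0.95, so a few hundred repetitions should usually be enough to hit the problem at least once.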

This really feels like whichever thread is controlling writes to the profraw file is missing messages. Queue overrun maybe? (I haven't looked.)

Meta

rustc --version --verbose:

rustc 1.58.0-nightly (a77da2d45 2021-11-19)
binary: rustc
commit-hash: a77da2d454e6caa227a85b16410b95f93495e7e0
commit-date: 2021-11-19
host: x86_64-unknown-linux-gnu
release: 1.58.0-nightly
LLVM version: 13.0.0

scole66 · 2021-11-20T20:29:41Z

@rustbot label A-code-coverage

Swatinem · 2021-12-07T15:09:45Z

I see you are using the %m placeholder for the profile file; its documentation talks about locking and merging these files at runtime. Adding the %p PID placeholder as well might avoid conflicts here. Can you give that a try?

scole66 · 2021-12-07T16:17:40Z

Gave it a try; still exhibits the same behavior.

scole66 · 2021-12-07T16:24:15Z

%m is definitely an indication to "please merge"; and %p doesn't do that, but %p is really about processes, not threads. I suspect whatever is doing the merging of thread data is what's got the issue here. (Though I get the same effect when I tell the test runner to just use one thread, as well.)

scole66 · 2022-05-10T16:10:13Z

Just learned that the -j option of cargo test is not the same thing as -- --test-threads=1 (which sends that flag to the test-runner). Adding -- --test-threads=1 to my harness removes the intermittent results.
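A small sketch of the distinction described above: the thread limit has to go after the `--` separator so that it reaches the test runner rather than cargo. The helper name and its parameters are hypothetical; only `cargo test`, the test-name filter, and `-- --test-threads=N` come from this thread.

```rust
/// Builds the argument list for a `cargo test` invocation.
/// (Hypothetical helper for illustration.)
fn cargo_test_args(filter: Option<&str>, test_threads: Option<usize>) -> Vec<String> {
    let mut args = vec!["test".to_string()];
    // An optional test-name filter, e.g. "statething".
    if let Some(f) = filter {
        args.push(f.to_string());
    }
    // Everything after "--" is passed to the test runner, not to cargo.
    if let Some(n) = test_threads {
        args.push("--".to_string());
        args.push(format!("--test-threads={}", n));
    }
    args
}
```

The resulting vector could be handed to `std::process::Command::new("cargo").args(...)`; with `Some("statething")` and `Some(1)` it corresponds to `cargo test statething -- --test-threads=1`.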

Not closing this issue, though. Rust is supposed to be "fearless concurrency" so this should work correctly even if the test runner is using multiple threads.

184: Limit the number of threads to work around rust-lang/rust#91092 r=taiki-e a=taiki-e Co-authored-by: Taiki Endo <te316e89@gmail.com>

According to https://github.com/taiki-e/cargo-llvm-cov#known-limitations, it only defaults to 1 thread because of rustc issue rust-lang/rust#91092. But it seems the issue is that, relatively infrequently, some tests will fail to be reported, which is fine with me if it makes the CI faster. And they are talking about thousands of tests, while we probably have <100.

Dushistov · 2023-04-20T19:50:46Z

Still reproducible with

rustc --version --verbose
rustc 1.70.0-beta.1 (1b7dd2252 2023-04-19)
binary: rustc
commit-hash: 1b7dd2252b99671ce5d1cb9664c5f8636329436d
commit-date: 2023-04-19
host: x86_64-unknown-linux-gnu
release: 1.70.0-beta.1
LLVM version: 16.0.2

I used the following script against https://github.com/scole66/rust-e262/tree/reduction-for-bugreport, because I could not find cargo profdata:

set -euo pipefail

rm -fr target
mkdir -p target/coverage

# Build once with coverage instrumentation enabled.
export RUSTFLAGS="-Cinstrument-coverage"
cargo build
export LLVM_PROFILE_FILE="target/coverage/%p-%m.profraw"

# Rerun the tests until the hit count on lines 8-10 (State::has) is not 4.
for i in $(seq 1 15000); do
    rm -f target/coverage/*.profraw target/coverage/cobertura.xml
    cargo test > /dev/null 2>&1
    grcov target/coverage --binary-path target/debug -s . -o target/coverage --keep-only 'src/*' --output-types cobertura
    if [ -n "$(grep -E 'number="(8|9|10)"' target/coverage/cobertura.xml | grep -v 'hits="4"')" ]; then
        echo "Found BUG at step $i"
        cat target/coverage/cobertura.xml
        exit 1
    fi
done

At step 854, the hit count for <res::statething::State>::has changed from 4 to 2.
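The check in the loop above can also be sketched in Rust. This is only an illustration: it assumes the Cobertura report puts each `<line number="…" hits="…"/>` element on its own line, and both function names are hypothetical. It matches line numbers exactly and returns every watched line whose hit count differs from the expected one.

```rust
use std::ops::RangeInclusive;

/// Extracts the value of ` name="..."` from one XML element given as text.
/// (Hypothetical helper; plain string search, not a real XML parser.)
fn attr<'a>(element: &'a str, name: &str) -> Option<&'a str> {
    // The leading space avoids matching the tail of a longer attribute name.
    let key = format!(" {}=\"", name);
    let start = element.find(&key)? + key.len();
    let end = element[start..].find('"')? + start;
    Some(&element[start..end])
}

/// Returns (line number, hits) for every watched source line whose hit
/// count is not `expected`. Assumes one `<line .../>` element per text line.
fn unexpected_hits(xml: &str, watched: RangeInclusive<u32>, expected: u64) -> Vec<(u32, u64)> {
    xml.lines()
        .filter(|l| l.trim_start().starts_with("<line "))
        .filter_map(|l| {
            let number: u32 = attr(l, "number")?.parse().ok()?;
            let hits: u64 = attr(l, "hits")?.parse().ok()?;
            Some((number, hits))
        })
        .filter(|(n, h)| watched.contains(n) && *h != expected)
        .collect()
}
```

For this reproduction the call would be `unexpected_hits(&xml, 8..=10, 4)`; a non-empty result corresponds to the "Found BUG" branch of the script.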

scole66 · 2023-04-20T21:50:05Z

> I can not find cargo profdata

This is in cargo-binutils.

Dushistov · 2023-05-04T21:34:22Z

I filed a bug against upstream LLVM: llvm/llvm-project#62558

Dushistov · 2023-05-11T10:13:34Z

It looks like this is not an LLVM bug. There is a bool flag, InstrProfOptions::Atomic, that is false by default, and depending on it LLVM generates different code:

void InstrProfiling::lowerIncrement(InstrProfIncrementInst *Inc) {
  auto *Addr = getCounterAddress(Inc);

  IRBuilder<> Builder(Inc);
  if (Options.Atomic || AtomicCounterUpdateAll ||
      (Inc->getIndex()->isZeroValue() && AtomicFirstCounter)) {
    Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr, Inc->getStep(),
                            MaybeAlign(), AtomicOrdering::Monotonic);
  } else {
    Value *IncStep = Inc->getStep();
    Value *Load = Builder.CreateLoad(IncStep->getType(), Addr, "pgocount");
    auto *Count = Builder.CreateAdd(Load, Inc->getStep());
    auto *Store = Builder.CreateStore(Count, Addr);
    if (isCounterPromotionEnabled())
      PromotionCandidates.emplace_back(cast<Instruction>(Load), Store);
  }
  Inc->eraseFromParent();
}

So if it is set to true via the clang option -fprofile-update=atomic, the C++ variant of the test no longer reproduces the problem.

So is there any way to set -fprofile-update=atomic via the rustc command line?
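To make the difference between the two branches of lowerIncrement concrete, here is a rough Rust analogy (not the code rustc or LLVM emits). The "racy" variant splits the update into a separate load and store, like the non-atomic branch; the atomic variant uses a single read-modify-write with Relaxed ordering, which is Rust's name for LLVM's Monotonic. All function names are hypothetical, and the split update is modelled with atomic loads and stores only so that the sketch stays free of undefined behavior.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Mimics the non-atomic lowering: load, add, store as separate steps.
/// Another thread can update the counter between the load and the store,
/// and that update is then overwritten (a lost update).
fn racy_increment(counter: &AtomicU64) {
    let old = counter.load(Ordering::Relaxed);
    counter.store(old + 1, Ordering::Relaxed);
}

/// Mimics the atomic lowering: one indivisible read-modify-write.
fn atomic_increment(counter: &AtomicU64) {
    counter.fetch_add(1, Ordering::Relaxed);
}

/// Runs `inc` `iters` times on each of `threads` threads and returns the
/// final counter value.
fn run(threads: u64, iters: u64, inc: fn(&AtomicU64)) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    inc(&*c);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

With `atomic_increment` the result of `run(4, 10_000, …)` is always 40000. With `racy_increment` it can come out lower, by an amount that varies from run to run, which matches the intermittent undercounting described in this issue.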

taiki-e · 2023-05-11T14:48:57Z

-C llvm-args=--instrprof-atomic-counter-update-all or another atomic-counter flag may work, but I have not tested it yet.

$ rustc -C llvm-args='--help-list-hidden' | rg 'atomic-counter'
  --atomic-counter-update-promoted                                  - Do counter update using atomic fetch add  for promoted counters only
  --gcov-atomic-counter                                             - Make counter updates atomic
  --instrprof-atomic-counter-update-all                             - Make all profile counter updates atomic (for testing only)

Dushistov · 2023-05-11T15:10:45Z

> -C llvm-args=--instrprof-atomic-counter-update-all or another atomic-counter flag may work, but I have not tested it yet.

I cannot see how any of these options can reach InstrProfOptions here:

rust/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp

Line 744 in f8d8ffa

InstrProfOptions Options;

So I created a PR to set Atomic to true by default.

Dushistov · 2023-05-11T15:49:38Z

Looks like --instrprof-atomic-counter-update-all should also work.

void InstrProfiling::lowerIncrement(InstrProfIncrementInst *Inc) {
  auto *Addr = getCounterAddress(Inc);

  IRBuilder<> Builder(Inc);
  if (Options.Atomic || AtomicCounterUpdateAll ||

where AtomicCounterUpdateAll is defined as:

cl::opt<bool> AtomicCounterUpdateAll(
    "instrprof-atomic-counter-update-all",
    cl::desc("Make all profile counter updates atomic (for testing only)"),
    cl::init(false));
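For toolchains without a fix, the flag discussed above could be passed through RUSTFLAGS. The following is a minimal sketch of assembling that value. The helper name is hypothetical, the two flags are the ones quoted in this thread, and whether the flag actually removes the undercounting is something the comments here expect rather than demonstrate.

```rust
/// Builds a RUSTFLAGS value for a coverage run, optionally asking LLVM to
/// make all profile counter updates atomic. (Hypothetical helper.)
fn coverage_rustflags(atomic_counters: bool) -> String {
    let mut flags = vec!["-Cinstrument-coverage"];
    if atomic_counters {
        // LLVM describes this option as "for testing only".
        flags.push("-Cllvm-args=--instrprof-atomic-counter-update-all");
    }
    flags.join(" ")
}
```

The returned string would be exported as RUSTFLAGS before `cargo build` and `cargo test`, in the same way the scripts earlier in this thread export `-Cinstrument-coverage`.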

…r=wesleywiser

Fix data race in llvm source code coverage

Fixes rust-lang#91092.

Before this patch, increment of counters for code coverage looks like this:

```
movq .L__profc__RNvCsd6wgJFC5r19_3lib6bugaga+8(%rip), %rax
addq $1, %rax
movq %rax, .L__profc__RNvCsd6wgJFC5r19_3lib6bugaga+8(%rip)
```

after this patch:

```
lock incq .L__profc__RNvCs3JgIB2SjHh2_3lib6bugaga+8(%rip)
```

scole66 added the C-bug Category: This is a bug. label Nov 20, 2021

rustbot added the A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) label Nov 20, 2021

taiki-e mentioned this issue Jun 13, 2022

Build test cases in parallel dtolnay/trybuild#6

Closed

taiki-e added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022

Limit the number of threads to work around rust-lang/rust#91092

235a93a

bors bot added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022

Merge #184

7d2a2f6

184: Limit the number of threads to work around rust-lang/rust#91092 r=taiki-e a=taiki-e Co-authored-by: Taiki Endo <te316e89@gmail.com>

taiki-e added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022

Limit the number of test threads to work around rust-lang/rust#91092

2ae3ac5

taiki-e added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022

Limit the number of test threads to work around rust-lang/rust#91092

731a921

bors bot added a commit to taiki-e/cargo-llvm-cov that referenced this issue Jun 13, 2022

Merge #184

88eb00c

184: Limit the number of test threads to work around rust-lang/rust#91092 r=taiki-e a=taiki-e Co-authored-by: Taiki Endo <te316e89@gmail.com>

taiki-e mentioned this issue Aug 29, 2022

Difference between cargo test and cargo llvm-cov results when using thread_local taiki-e/cargo-llvm-cov#208

Closed

taiki-e mentioned this issue Nov 16, 2022

Code coverage dropped significantly after switching to nextest nextest-rs/nextest#652

Open

oojo12 mentioned this issue Nov 30, 2022

add coverage sarah-quinones/faer-rs#16

Merged

dpc mentioned this issue Apr 15, 2023

chore: make Code Coverage build run tests in parallel fedimint/fedimint#2234

Merged

def- mentioned this issue Apr 21, 2023

--test-threads override doesn't work taiki-e/cargo-llvm-cov#261

Closed

Dushistov mentioned this issue May 11, 2023

Fix data race in llvm source code coverage #111469

Merged

bors closed this as completed in 770fd73 May 13, 2023

taiki-e mentioned this issue May 13, 2023

Use llvm-args instead of RUST_TEST_THREADS, %Nm instead of NEXTEST_TEST_THREADS taiki-e/cargo-llvm-cov#279

Merged

cuviper mentioned this issue Jul 10, 2023

Sanity check profiler atomics #113448

Closed

briansmith mentioned this issue Jan 3, 2024

building for mipsel, link fails on missing symbol __sync_fetch_and_add_8, which doesn't exist on mips and shouldn't be getting called #112313

Open
