Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASSERT type_is_instr in multiple drcachesim online tests due to weird pipe ordering issue #3320

Open
derekbruening opened this issue Dec 21, 2018 · 50 comments · Fixed by #3859

Comments

@derekbruening
Copy link
Contributor

The 32-bit version failed once:
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/21162519

[00:09:36] test 213
[00:09:36]         Start 213: code_api|tool.reuse_distance
[00:09:36] 
[00:09:36] 213: Test command: C:\projects\dynamorio\build\build_debug-internal-32\bin32\drrun.exe "-s" "90" "-quiet" "-debug" "-use_dll" "C:/projects/dynamorio/build/build_debug-internal-32/lib32/debug/dynamorio.dll" "-exit0" "-stderr_mask" "0xC" "-msgbox_mask" "0" "-dumpcore_mask" "0x7d" "-staged" "-code_api" "-t" "drcachesim" "-ipc_name" "drtestpipe_reuse_distance" "-simulator_type" "reuse_distance" "-reuse_distance_threshold" "256" "--" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/simple_app.exe"
[00:09:36] 213: Test timeout computed to be: 600
[00:09:36] 213: Hello, world!
[00:09:36] 213: Assertion failed: type_is_instr(cur_ref.instr.type), file ..\..\clients\drcachesim\reader\reader.cpp, line 132
@derekbruening
Copy link
Contributor Author

Happened again:
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/22313260

00:06:34] test 213
[00:06:34]         Start 213: code_api|tool.reuse_distance
[00:06:34] 
[00:06:34] 213: Test command: C:\projects\dynamorio\build\build_debug-internal-32\bin32\drrun.exe "-s" "90" "-quiet" "-debug" "-use_dll" "C:/projects/dynamorio/build/build_debug-internal-32/lib32/debug/dynamorio.dll" "-exit0" "-stderr_mask" "0xC" "-msgbox_mask" "0" "-dumpcore_mask" "0x7d" "-staged" "-code_api" "-t" "drcachesim" "-ipc_name" "drtestpipe_reuse_distance" "-simulator_type" "reuse_distance" "-reuse_distance_threshold" "256" "--" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/simple_app.exe"
[00:06:34] 213: Test timeout computed to be: 600
[00:06:34] 213: Hello, world!
[00:06:34] 213: Assertion failed: type_is_instr(cur_ref.instr.type), file ..\..\clients\drcachesim\reader\reader.cpp, line 132

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

This same assert just happened in code_api|tool.drcachesim.warmup-zeros
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/23205784

00:20:18] Hello, world!
[00:20:18] Assertion failed: type_is_instr(cur_ref.instr.type), file ..\..\clients\drcachesim\reader\reader.cpp, line 132

@derekbruening
Copy link
Contributor Author

warmup-zeros failed again with that assert: https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/23571201

@derekbruening derekbruening changed the title tool.reuse_distance test failed once on appveyor ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros) on appveyor Apr 3, 2019
@derekbruening derekbruening changed the title ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros) on appveyor ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros,tool.drcachesim.missfile) on appveyor Apr 6, 2019
@derekbruening
Copy link
Contributor Author

Happened on the missfile test:
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/23634836

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

@hgreving2304
Copy link

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

Happened in tool.drcachesim.coherence with a timeout being reported: #3803.

@derekbruening
Copy link
Contributor Author

derekbruening commented Oct 3, 2024

Happened on riscv reuse-distance: Invalid trace entry type thread_exit (23) before a bundle https://github.com/DynamoRIO/dynamorio/actions/runs/11171065544/job/31055108423?pr=7018

@xdje42
Copy link
Contributor

xdje42 commented Oct 4, 2024

Data point: Another failure, in master:

====> FAILURE in release-external-64 <====
release-external-64: 116 tests passed, **** 1 tests failed, but ignoring 0 for i#5365: ****
	code_api|tool.drcachesim.TLB-threads

https://github.com/DynamoRIO/dynamorio/actions/runs/11185116691/job/31097308404

@egrimley-arm
Copy link
Contributor

Data arriving out of order over the pipe only when the kernel is under stress?!?

I don't suppose anyone has observed this phenomenon in the absence of DynamoRIO?

I wrote a 60-line program that reads or writes blocks of data down a pipe and checks for anything arriving out of order. I used 500-byte blocks and had 8 writers and one reader, and I ran DynamoRIO's tests with -j 8 on the same machine at the same time. My simple test program didn't report any problems. The real test programs are doing other system calls besides read and write, of course, which might be significant.

@joshua-warburton
Copy link
Collaborator

The problem only appears when the tests are run as part of the test suite- oddly enough. I ran it hundreds of times invoking the tests directly and never hit the problem.

egrimley-arm added a commit that referenced this issue Jan 27, 2025
It frequently fails in the same way as other drcachesim tests
on SVE hardware.

Issue: #3320

Change-Id: I2383af3ca8af584f769ebd8e68fc9a0a82928ed1
@derekbruening
Copy link
Contributor Author

This failure has made the aarch64-* suites red nearly half the time for many months now, yet we don't think the failure itself is specific to aarch64 or a particular kernel version. It turns out this is all due to the #2204 feature of retrying a failing test in the suite and only marking it red if it fails 3x in a row not being enabled on the aarch64-* suites as they do not have the CMake 3.17+ version required. See #7222 (comment).

@derekbruening derekbruening changed the title ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros,tool.drcachesim.missfile) on appveyor ASSERT type_is_instr in multiple drcachesim online tests due to weird pipe ordering issue Jan 27, 2025
@egrimley-arm
Copy link
Contributor

With 6549e88 I ran ctest -j 8 -I 239,243 -V 100 times on a particular machine (500 tests in total) and saw 55 failures with 55 instances of

reader.cpp:262: virtual bool dynamorio::drmemtrace::reader_t::process_input_entry(): Assertion `type_is_instr(cur_ref_.instr.type) || cur_ref_.instr.type == TRACE_TYPE_INSTR_NO_FETCH' failed.

I disabled the assert and tried again, and this time I instead got 51 failures with 51 instances of Invalid trace entry type thread_exit (23) before a bundle.

So disabling the assert wouldn't help. But there are other moves afoot; see above.

@derekbruening
Copy link
Contributor Author

I disabled the assert and tried again, and this time I instead got 51 failures with 51 instances of Invalid trace entry type thread_exit (23) before a bundle.

You can see that #4167, which is the other visible message Invalid trace entry type..., has already been duplicated to this issue a while ago as they are the same underlying problem.

@derekbruening
Copy link
Contributor Author

If we can find a way to clean up the stale pipe, the retry-3x feature of #2204 should help avoid these sporadic failures turning suites red. As noted at #2204 (comment), typically we have one failure followed by the 2x retries timing out due to the stale pipe file.

derekbruening added a commit that referenced this issue Feb 7, 2025
Adds removal of the drcachesim pipe before and after running each
test, to avoid a stale pipe file causing a retry-on-failure to time
out.

Tested by adding "-unsafe_crash_process" to the options and then:
```
$ ctest -V -R coherence
<...>
325:   <Application
325:   /usr/local/google/home/bruening/dr/git/build_x64_dbg_tests/suite/tests/bin/client.annotation-concurrency
325:   (465432).  DynamoRIO Cache Simulator Tracer internal crash at PC
325:   0x00007f46cc0614b4.  Please report this at http://dynamorio.org/issues.
325:   Program aborted.
<...>
$ ls suite/tests/drtestpipe*
ls: cannot access 'suite/tests/drtestpipe*': No such file or directory
```

Issue: #3320, #2204
derekbruening added a commit that referenced this issue Feb 8, 2025
Adds removal of the drcachesim pipe before and after running each test,
to avoid a stale pipe file causing a retry-on-failure to time out. This
should help prevent #3320 from turning the suite red from what should be
a single failure passing on a retry.

Modifies the asm tests to print to stderr instead of stdout for proper
ordering within runcmp.cmake (as well as to match the conventions of all
other tests).

Tested by adding "-unsafe_crash_process" to the options and then:
```
$ ctest -V -R coherence
<...>
325:   <Application
325:   /usr/local/google/home/bruening/dr/git/build_x64_dbg_tests/suite/tests/bin/client.annotation-concurrency
325:   (465432).  DynamoRIO Cache Simulator Tracer internal crash at PC
325:   0x00007f46cc0614b4.  Please report this at http://dynamorio.org/issues.
325:   Program aborted.
<...>
$ ls suite/tests/drtestpipe*
ls: cannot access 'suite/tests/drtestpipe*': No such file or directory
```

Issue: #3320, #2204
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants