Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

want core files from CI test failures #34

Open
davepacheco opened this issue Aug 7, 2020 · 8 comments
Open

want core files from CI test failures #34

davepacheco opened this issue Aug 7, 2020 · 8 comments

Comments

@davepacheco
Copy link
Collaborator

It would be really helpful if when tests failed as a result of a panic (e.g., assertion failure), we had a core file from the process.

On the Rust side, you didn't use to be able to set panic = "abort" in tests (well, you could, but it wouldn't do anything). But it seems that there's an experimental flag "panic-abort-tests". I have not played with this enough to see how it works, or if it works.

On the GitHub CI side, it's possible to upload artifacts including core files:
https://docs.github.com/en/actions/configuring-and-managing-workflows/persisting-workflow-data-using-artifacts

@davepacheco
Copy link
Collaborator Author

I've been trying to use panic = "abort" and -Z panic-abort-tests, but I think the latter doesn't do what I think it does.

I've applied this patch:

diff --git a/Cargo.toml b/Cargo.toml
index 5ebcba0..d954b78 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -6,3 +6,12 @@
 [workspace]
 members = ["dropshot", "dropshot_endpoint" ]
 default-members = ["dropshot", "dropshot_endpoint" ]
+
+[profile.dev]
+panic = "abort"
+
+[profile.release]
+panic = "abort"
+
+[profile.test]
+panic = "abort"
diff --git a/dropshot/tests/test_demo.rs b/dropshot/tests/test_demo.rs
index 7c4817f..12fd315 100644
--- a/dropshot/tests/test_demo.rs
+++ b/dropshot/tests/test_demo.rs
@@ -465,6 +465,7 @@ async fn test_demo_path_param_uuid() {
     let json: DemoPathUuid = read_json(&mut response).await;
     assert_eq!(json.test1.to_string(), uuid_str);
 
+    panic!("boom!");
     testctx.teardown().await;
 }

Here's where I used it:

dap@zathras dropshot $ RUST_BACKTRACE=full cargo +nightly test -v --test=test_demo -Z panic-abort-tests
       Fresh unicode-xid v0.2.0
       Fresh autocfg v1.0.0
       Fresh lazy_static v1.4.0
       Fresh slab v0.4.2
       Fresh futures-core v0.3.5
       Fresh cfg-if v0.1.10
       Fresh futures-sink v0.3.5
       Fresh fnv v1.0.7
       Fresh bytes v0.5.5
       Fresh proc-macro-hack v0.5.16
       Fresh once_cell v1.4.0
       Fresh arc-swap v0.4.7
       Fresh futures-io v0.3.5
       Fresh pin-utils v0.1.0
       Fresh matches v0.1.8
       Fresh ppv-lite86 v0.2.8
       Fresh pin-project-lite v0.1.7
       Fresh tinyvec v0.3.3
       Fresh linked-hash-map v0.5.3
       Fresh dtoa v0.4.6
       Fresh itoa v0.4.6
       Fresh try-lock v0.2.2
       Fresh percent-encoding v2.1.0
       Fresh tower-service v0.3.0
       Fresh match_cfg v0.1.0
       Fresh take_mut v0.2.2
       Fresh newline-converter v0.1.0
       Fresh base64 v0.12.3
       Fresh difference v2.0.0
       Fresh thread_local v1.0.1
       Fresh futures-channel v0.3.5
       Fresh http v0.2.1
       Fresh futures-task v0.3.5
       Fresh unicode-bidi v0.3.4
       Fresh unicode-normalization v0.1.13
       Fresh yaml-rust v0.4.4
       Fresh libc v0.2.71
       Fresh expectorate v1.0.1
       Fresh proc-macro2 v1.0.18
       Fresh ryu v1.0.5
       Fresh memchr v2.3.3
       Fresh log v0.4.8
       Fresh proc-macro-nested v0.1.6
       Fresh http-body v0.3.1
       Fresh slog v2.5.2
       Fresh idna v0.2.0
       Fresh maybe-uninit v2.0.0
       Fresh httparse v1.3.4
       Fresh net2 v0.2.34
       Fresh iovec v0.1.4
       Fresh getrandom v0.1.14
       Fresh signal-hook-registry v1.2.0
       Fresh num_cpus v1.13.0
       Fresh time v0.1.43
       Fresh dirs-sys v0.3.5
       Fresh hostname v0.1.5
       Fresh socket2 v0.3.12
       Fresh atty v0.2.14
       Fresh hostname v0.3.1
       Fresh subprocess v0.2.4
       Fresh quote v1.0.7
       Fresh num-traits v0.2.12
       Fresh crossbeam-utils v0.7.2
       Fresh want v0.3.0
       Fresh url v2.1.1
       Fresh mio v0.6.22
       Fresh rand_core v0.5.1
       Fresh dirs v2.0.2
       Fresh syn v1.0.31
       Fresh num-integer v0.1.43
       Fresh crossbeam-channel v0.4.2
       Fresh mio-uds v0.6.8
       Fresh rand_chacha v0.2.2
       Fresh serde_derive v1.0.112
       Fresh pin-project-internal v0.4.22
       Fresh futures-macro v0.3.5
       Fresh serde_derive_internals v0.25.0
       Fresh tokio-macros v0.2.5
       Fresh num-iter v0.1.41
       Fresh term v0.6.1
       Fresh async-trait v0.1.36
       Fresh serde_with_macros v1.1.0
       Fresh slog-async v2.5.0
       Fresh serde v1.0.112
       Fresh rand v0.7.3
       Fresh pin-project v0.4.22
       Fresh tokio v0.2.21
       Fresh schemars_derive v0.7.6
       Fresh num v0.1.42
       Fresh indexmap v1.4.0
       Fresh serde_yaml v0.8.13
       Fresh serde_urlencoded v0.6.1
       Fresh toml v0.5.6
       Fresh serde_with v1.4.0
       Fresh serde_json v1.0.55
       Fresh chrono v0.4.11
       Fresh serde_tokenstream v0.1.0
       Fresh uuid v0.8.1
       Fresh futures-util v0.3.5
       Fresh tokio-util v0.3.1
       Fresh chrono v0.3.0
       Fresh slog-json v2.3.0
       Fresh slog-term v2.6.0
       Fresh openapiv3 v0.3.2
       Fresh schemars v0.7.6
       Fresh h2 v0.2.5
       Fresh futures-executor v0.3.5
       Fresh slog-bunyan v2.2.0
       Fresh dropshot_endpoint v0.2.0 (/Users/dap/oxide/dropshot/dropshot_endpoint)
       Fresh hyper v0.13.6
       Fresh futures v0.3.5
   Compiling dropshot v0.3.0 (/Users/dap/oxide/dropshot/dropshot)
     Running `rustc --crate-name test_demo --edition=2018 dropshot/tests/test_demo.rs --error-format=json --json=diagnostic-rendered-ansi --emit=dep-info,link -C panic=abort -Cembed-bitcode=no -C debuginfo=2 --test -Zpanic-abort-tests -C metadata=95ee515f3e63a149 -C extra-filename=-95ee515f3e63a149 --out-dir /Users/dap/oxide/dropshot/target/debug/deps -C incremental=/Users/dap/oxide/dropshot/target/debug/incremental -L dependency=/Users/dap/oxide/dropshot/target/debug/deps --extern async_trait=/Users/dap/oxide/dropshot/target/debug/deps/libasync_trait-5a68cbb0c93014e7.dylib --extern base64=/Users/dap/oxide/dropshot/target/debug/deps/libbase64-5867cd091997d27b.rlib --extern bytes=/Users/dap/oxide/dropshot/target/debug/deps/libbytes-9f611d0a3c77f8e7.rlib --extern chrono=/Users/dap/oxide/dropshot/target/debug/deps/libchrono-d880b00470d83fb5.rlib --extern dropshot=/Users/dap/oxide/dropshot/target/debug/deps/libdropshot-091a801dc32e238a.rlib --extern dropshot_endpoint=/Users/dap/oxide/dropshot/target/debug/deps/libdropshot_endpoint-34b8d1c5bc27117c.dylib --extern expectorate=/Users/dap/oxide/dropshot/target/debug/deps/libexpectorate-af39ef58a407c25c.rlib --extern futures=/Users/dap/oxide/dropshot/target/debug/deps/libfutures-d1085323d0d9d162.rlib --extern hostname=/Users/dap/oxide/dropshot/target/debug/deps/libhostname-c7fc6b5aca3dabef.rlib --extern http=/Users/dap/oxide/dropshot/target/debug/deps/libhttp-400d45dec38085ff.rlib --extern hyper=/Users/dap/oxide/dropshot/target/debug/deps/libhyper-09102f66b19463e2.rlib --extern indexmap=/Users/dap/oxide/dropshot/target/debug/deps/libindexmap-2187fec18b9debcc.rlib --extern lazy_static=/Users/dap/oxide/dropshot/target/debug/deps/liblazy_static-514402c90ba6436f.rlib --extern libc=/Users/dap/oxide/dropshot/target/debug/deps/liblibc-cbfadbbc5730dcd7.rlib --extern openapiv3=/Users/dap/oxide/dropshot/target/debug/deps/libopenapiv3-85c8b71a6c781296.rlib --extern schemars=/Users/dap/oxide/dropshot/target/debug/deps/libschemars-1804836149c29d47.rlib --extern serde=/Users/dap/oxide/dropshot/target/debug/deps/libserde-4c4498dda766ad8a.rlib --extern serde_json=/Users/dap/oxide/dropshot/target/debug/deps/libserde_json-f06b1e57558cc1de.rlib --extern serde_urlencoded=/Users/dap/oxide/dropshot/target/debug/deps/libserde_urlencoded-43c511db2b5c527d.rlib --extern serde_with=/Users/dap/oxide/dropshot/target/debug/deps/libserde_with-8d54dfae5a55c452.rlib --extern slog=/Users/dap/oxide/dropshot/target/debug/deps/libslog-aa3e331e8b356dac.rlib --extern slog_async=/Users/dap/oxide/dropshot/target/debug/deps/libslog_async-84d7b9e39cbfefc5.rlib --extern slog_bunyan=/Users/dap/oxide/dropshot/target/debug/deps/libslog_bunyan-4edc97946439dccb.rlib --extern slog_json=/Users/dap/oxide/dropshot/target/debug/deps/libslog_json-82d1634edb84c365.rlib --extern slog_term=/Users/dap/oxide/dropshot/target/debug/deps/libslog_term-15a2981c2842cc6d.rlib --extern subprocess=/Users/dap/oxide/dropshot/target/debug/deps/libsubprocess-d23ea2a0b7752b42.rlib --extern tokio=/Users/dap/oxide/dropshot/target/debug/deps/libtokio-42ccc0cdcb53da67.rlib --extern toml=/Users/dap/oxide/dropshot/target/debug/deps/libtoml-d0357731676c3701.rlib --extern uuid=/Users/dap/oxide/dropshot/target/debug/deps/libuuid-a818ec77854019d9.rlib`
warning: unreachable statement
   --> dropshot/tests/test_demo.rs:469:5
    |
468 |     panic!("boom!");
    |     ---------------- any code following this expression is unreachable
469 |     testctx.teardown().await;
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^ unreachable statement
    |
    = note: `#[warn(unreachable_code)]` on by default

warning: 1 warning emitted

    Finished test [unoptimized + debuginfo] target(s) in 8.05s
     Running `/Users/dap/oxide/dropshot/target/debug/deps/test_demo-95ee515f3e63a149`

running 6 tests
test test_demo1 ... ok
test test_demo2query ... ok
test test_demo3json ... ok
test test_demo_path_param_string ... ok
test test_demo2json ... ok
test test_demo_path_param_uuid ... FAILED

failures:

---- test_demo_path_param_uuid stdout ----
---- test_demo_path_param_uuid stderr ----
log file: "/var/folders/x3/6tbrhnss6zz677z4zjsyvkqc0000gn/T/test_demo-95ee515f3e63a149-demo_path_param_uuid.24248.0.log"
note: configured to log to "/var/folders/x3/6tbrhnss6zz677z4zjsyvkqc0000gn/T/test_demo-95ee515f3e63a149-demo_path_param_uuid.24248.0.log"
thread 'main' panicked at 'boom!', dropshot/tests/test_demo.rs:468:5
stack backtrace:
   0:        0x1009dd44e - backtrace::backtrace::libunwind::trace::h83d380479a349ab3
                               at /Users/runner/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1:        0x1009dd44e - backtrace::backtrace::trace_unsynchronized::h4a3f7d20acb505b9
                               at /Users/runner/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2:        0x1009dd44e - std::sys_common::backtrace::_print_fmt::hfc2e8ed9d735c796
                               at src/libstd/sys_common/backtrace.rs:78
   3:        0x1009dd44e - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hcfcbc219168eb9b7
                               at src/libstd/sys_common/backtrace.rs:59
   4:        0x100a0412c - core::fmt::write::h26bd57c1fd6e9c98
                               at src/libcore/fmt/mod.rs:1076
   5:        0x1009d6037 - std::io::Write::write_fmt::h845f537fbf978cc8
                               at src/libstd/io/mod.rs:1537
   6:        0x1009df735 - std::sys_common::backtrace::_print::hced840f80fcbf47d
                               at src/libstd/sys_common/backtrace.rs:62
   7:        0x1009df735 - std::sys_common::backtrace::print::he28990a439d37dd9
                               at src/libstd/sys_common/backtrace.rs:49
   8:        0x1009df735 - std::panicking::default_hook::{{closure}}::h3957e11a55ceea22
                               at src/libstd/panicking.rs:198
   9:        0x1009df472 - std::panicking::default_hook::h507e984f11c0212c
                               at src/libstd/panicking.rs:218
  10:        0x1005b00bf - <alloc::boxed::Box<F> as core::ops::function::Fn<A>>::call::hc706306034a586eb
                               at /rustc/ad4bc3323b9299d867697e9653dcea1b5e1ad283/src/liballoc/boxed.rs:1090
  11:        0x1005b00bf - test::run_test_in_spawned_subprocess::{{closure}}::h7d60623cbd8e588e
                               at src/libtest/lib.rs:630
  12:        0x1005b0110 - test::run_test_in_spawned_subprocess::{{closure}}::h92071491bdf7a6f2
                               at src/libtest/lib.rs:640
  13:        0x1009dfd15 - std::panicking::rust_panic_with_hook::ha8ebb2ce42738316
                               at src/libstd/panicking.rs:490
  14:        0x100a104aa - std::panicking::begin_panic::hf6df53e8a3f4bc10


failures:
    test_demo_path_param_uuid

test result: FAILED. 5 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '-p dropshot --test test_demo'

From this, we can see that it did compile the integration test test_demo with -C panic=abort. But it doesn't look like it aborted, and it definitely didn't generate a core file. With lldb, I've been able to tell that rust_panic points to the abort implementation in this binary. I wanted to trace what was happening at the point of panic, but I'm running afoul of the fact that this mode uses subprocesses and lldb doesn't support following child processes.

@bnaecker
Copy link
Contributor

@davepacheco I've found a way to do this, but it aborts the process right after the failed test. I know that sounds like an obvious point, but it means that we'd only get information about the first test that fails. If that's a reasonable solution, then I can put up a PR for this.

@davepacheco
Copy link
Collaborator Author

Thanks for looking into this. Personally I do prefer this tradeoff because if there are any failures, whether on the main branch or locally in development, you pretty much want to fix them -- the core file is more useful to me in that context than knowing how many other tests failed. I think this is probably not a popular opinion but I'd like to try it.

@bnaecker
Copy link
Contributor

Failing fast is a reasonable tradeoff, I agree. Is the goal here to actually generate the core file? Or is it to debug the process in general, either when it fails or post-mortem? In CI, the core file seems like a great idea, since post-mortem is the only option. But if you're developing locally, there's a way to attach the debugger at the right point without going through the core file.

Point LLDB at the actual executable, for example: dropshot/target/debug/deps/test_demo-397848206bcb766a, and then set a breakpoint on begin_panic. rust_panic also works, it's just a few more hops down the stack. In LLDB that's break set -n begin_panic.

I'm happy to put up a PR that implements panic-and-dump approach, but I want to make sure it'll solve the right problem.

@davepacheco
Copy link
Collaborator Author

Ideally, it would abort() at the point of the panic. This way, if you haven't attached a debugger and have configured core dumps, you'll get one. However, if you'd rather attach the debugger ahead of time, you can do that too. (I believe debuggers generally break on abort, but I haven't recently tested this with lldb or gdb.)

@davepacheco
Copy link
Collaborator Author

Last time I looked at this, I wrote a detailed summary of panics and core files in Rust that's available internal to Oxide.

@bnaecker
Copy link
Contributor

After some experimentation, it seems like there's not a great solution here. I've included some relevant notes here, but there's not an obvious path for now.

Ideally, we'd like tests that fail to call abort(3), dump their core, and terminate the calling process. One issue is that there are tests where "failing" means a failure to panic, via the #[should_panic] attribute. We don't want all panics to abort, only those that are unexpected.

One option we've explored is using the coredump crate, which registers a new panic hook that sends SIGQUIT to the process on a panic. This works for the cases of unexpected panics, which is good, but it also kills the process on an expected panic, which is less good.

In theory, the option -Z panic-abort-tests option should handle this. The documentation says:

The -Z panic-abort-tests flag will enable nightly support to compile test harness crates with -Cpanic=abort.

which should modify the panic! behavior to abort rather than unwind the stack. However, as @davepacheco previously found out, that doesn't work. The issue is that the relevant code seems to call exit, which, combined with the fact that tests are run in a subprocess rather than the usual thread, means the abort behavior is never really observed.

There are other options, such as writing our own panic! or assert! and friends, which directly call std::process::abort. This is not great, since it would both be surprising to developers and require us to maintain code to support what is supposed to be a language feature.

Until a better way forward presents itself, this issue is probably on the back burner.

@davepacheco
Copy link
Collaborator Author

davepacheco commented Feb 4, 2021

Writing this down for my own future self: from my past notes and this testing, I think the closest thing to what I'd want given the pieces available today is to use the coredump crate with the default unwind strategy. This causes core files to be generated on panic, both in the application under test and the test suite, but it does the wrong thing for should_panic tests, of which we have a bunch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants