Tracking issue for libtest JSON output #49359

Open
Gilnaa opened this issue Mar 25, 2018 · 63 comments
Labels
A-libtest Area: `#[test]` / the `test` library B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-testing-devex Relevant to the testing devex team (testing DX), which will review and decide on the PR/issue.

Comments

@Gilnaa

Gilnaa commented Mar 25, 2018

Added in #46450
Available in nightly behind a -Z flag.

@sfackler sfackler added B-unstable Blocker: Implemented in the nightly compiler and unstable. T-dev-tools Relevant to the dev-tools subteam, which will review and decide on the PR/issue. labels Mar 25, 2018
@XAMPPRocky XAMPPRocky added A-testsuite Area: The testsuite used to check the correctness of rustc C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jun 29, 2018
@johnterickson
Contributor

@nrc You mentioned "The long-term solution is going to be deeply intertwined with pluggable test runners" in #46450. Does that mean that JSON won't be stabilized until there are pluggable test runners?

I'm trying to figure out if it makes sense for me to continue looking at #51924 right now, or if I should hold off for the time being. For example, I'm looking at adding test durations to the JSON (master...johnterickson:testjson) and converting the output to JUnit (https://github.com/johnterickson/cargo2junit/).

@djrenren
Contributor

Yeah I think we should stabilize something like this independently of custom test frameworks. It makes sense that the included test framework would have a programmatically readable output. Libtest's conceptual model is pretty simple and most likely won't undergo any drastic changes so I'm all for stabilizing something. I'd prefer to have a JSON Schema that describes the structure before stabilization. We'd also need to audit to ensure it's resilient to any minor changes or additions over time.

@alexcrichton thoughts?

@alexcrichton
Member

Seems like a plausible addition to me! So long as it's designed carefully, I think it'd be good to go.

@epage
Contributor

epage commented Feb 27, 2019

From the experience of creating naive serde structs / enums for libtest's output, here are my thoughts from translating it exactly as it is written in libtest (rather than inventing my own schema that happens to be compatible):

  • For suites / tests, the data is two nested enums. For some cases this might work well; for others it is unnecessary nesting to deal with.
  • Probably the most annoying aspect is that the type field conflates a suite/test being finished with why it finished. This is annoying if you want to get information regardless of the completion status.
  • No bench start event is sent.
  • My gut is telling me that the bench event is too narrowly focused on the output of the current implementation and not on what might be wanted.

My PR where you can see the data structures I created: https://github.com/crate-ci/escargot/pull/24/files

Ideas to consider

  • Combine event and type fields. Unsure of the value of this
  • Split type field into type and status and define more fields as being available for all types. For example, it could be useful to programmatically report stdout for a successful test and leave it to the rendering engine to decide whether to ignore it or not. This doesn't mean all test implementations need to report all of the fields, but define it as possible.
  • Get someone from other bench implementations, like criterion, to review the bench schema.
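To make the shape concrete, the "two enums nested" structure described above can be sketched as plain Rust types. These are hypothetical hand-written definitions for illustration, not libtest's actual code: the `type` field selects Suite vs Test, and the `event` field selects the variant inside.

```rust
// Hypothetical types sketching the shape of libtest's JSON events as
// discussed above -- not libtest's real definitions.
#[allow(dead_code)]
#[derive(Debug)]
enum Event {
    Suite(SuiteEvent),
    Test(TestEvent),
}

#[allow(dead_code)]
#[derive(Debug)]
enum SuiteEvent {
    Started { test_count: u64 },
    Ok { passed: u64, failed: u64 },
    Failed { passed: u64, failed: u64 },
}

#[allow(dead_code)]
#[derive(Debug)]
enum TestEvent {
    Started { name: String },
    Ok { name: String },
    Failed { name: String },
}

fn main() {
    // A consumer has to match through both layers to get at the test name
    // regardless of completion status -- the nesting complained about above.
    let ev = Event::Test(TestEvent::Ok { name: "tests::leak".into() });
    let name = match &ev {
        Event::Test(TestEvent::Started { name })
        | Event::Test(TestEvent::Ok { name })
        | Event::Test(TestEvent::Failed { name }) => Some(name.as_str()),
        _ => None,
    };
    assert_eq!(name, Some("tests::leak"));
}
```

Merging the status into a separate field, as suggested above, would let the `name` extraction be a single field access instead of a multi-arm match.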

@andoriyu
Contributor

andoriyu commented Mar 3, 2019

I think it's worth adding things like package and type of test. Type of test: unit test, doc test, integration test.

The package is something like: module for unit test, resource path for doc test (crate::foo::Bar), and filename for integration test.

@epage
Contributor

epage commented Mar 4, 2019

Since libtest tracks test times (at minimum, reporting if a test is taking too long), it would be nice if that got included in the programmatic output so we can report it to the user, e.g. in JUnit. Alternatively, we'd have to timestamp the events as we read them.
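The timestamping workaround mentioned above can be sketched as follows. The field extraction is naive string matching over the one-object-per-line output, purely for illustration; a real consumer would use a proper JSON parser.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Naive extraction of a string value for `key` from a single-line JSON
// event. Illustrative only -- not a real JSON parser.
fn value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let pat = format!("\"{}\": \"", key);
    let start = line.find(&pat)? + pat.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

fn main() {
    // Simulated event stream; in practice these lines come from the test
    // binary's stdout.
    let events = [
        r#"{ "type": "test", "event": "started", "name": "tests::leak" }"#,
        r#"{ "type": "test", "name": "tests::leak", "event": "ok" }"#,
    ];

    let mut started: HashMap<String, Instant> = HashMap::new();
    for line in events.iter().copied() {
        let name = match value(line, "name") {
            Some(n) => n,
            None => continue,
        };
        match value(line, "event") {
            // Record when the test started...
            Some("started") => {
                started.insert(name.to_string(), Instant::now());
            }
            // ...and compute an approximate duration when it finishes.
            Some("ok") | Some("failed") => {
                if let Some(t0) = started.remove(name) {
                    println!("{} finished after {:?}", name, t0.elapsed());
                }
            }
            _ => {}
        }
    }
}
```

The measured time includes any buffering delay between the test finishing and the event being read, which is exactly why emitting durations from libtest itself would be more accurate.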

@johnterickson
Contributor

@epage / @andoriyu Is one of you planning on proposing a better schema? I like what you're proposing.

@rbtcollins
Contributor

If I may add a semi-random thing: https://github.com/testing-cabal/subunit is a testing protocol that's been around for a while (one of a number) - the V2 event model is very similar in nature to your proposed JSON output schema. There is a Rust implementation of the binary serialisation, https://github.com/mtreinish/subunit-rust, and I'm interested in / planning on doing a JSON variant of V2 at some point for use with web stacks.

All of which is to say two things, I guess: consider chunking stdout/err, allowing additional streams and tags, and possibly, if it's interesting, consider adopting the same model - I think it would be a superset of libtest's current needs.

Of course, doing a new schema and having folk write adapters is completely fine too; I just saw this ticket and thought I'd offer this kibbitz :)

@Alxandr

Alxandr commented Jul 26, 2019

I'd also love to see improvements in this. Also, there currently seems to be a bug in the JSON output: johnterickson/cargo2junit#9

@janderholm

I would also like durations to be part of the output. Furthermore, when running multiple suites at the same time there's no way of knowing which tests belong to which suite. It would be nice to name the suite somehow - possibly connect it to the source file relative to the workspace root?

Here's a slimmed down example:

     Running target/debug/deps/rustrunner-e965cfdb4a6bd87d
{ "type": "suite", "event": "started", "test_count": 2 }
{ "type": "test", "event": "started", "name": "tests::leak" }
{ "type": "test", "event": "started", "name": "tests::test_true" }
{ "type": "test", "name": "tests::leak", "event": "ok" }
{ "type": "test", "name": "tests::test_true", "event": "ok" }
{ "type": "suite", "event": "failed", "passed": 2, "failed": 0, "allowed_fail": 0, "ignored": 0, "measured": 0, "filtered_out": 0 }
     Running target/debug/deps/subcrate-2da6bff97f5f00cf
{ "type": "suite", "event": "started", "test_count": 2 }
{ "type": "test", "event": "started", "name": "tests::it_works" }
{ "type": "test", "event": "started", "name": "tests::test_true" }
{ "type": "test", "name": "tests::it_works", "event": "ok" }
{ "type": "test", "name": "tests::test_true", "event": "ok" }
{ "type": "suite", "event": "ok", "passed": 2, "failed": 0, "allowed_fail": 0, "ignored": 0, "measured": 0, "filtered_out": 0 }
   Doc-tests subcrate
{ "type": "suite", "event": "started", "test_count": 0 }
{ "type": "suite", "event": "ok", "passed": 0, "failed": 0, "allowed_fail": 0, "ignored": 0, "measured": 0, "filtered_out": 0 }

Since both the parent crate and the subcrate have test modules called "test" and a test called "test_true", there's no way of knowing which one failed.
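Until suites carry a name, a consumer can only disambiguate by tracking the interleaved human-readable `Running` lines, roughly like this sketch (illustrative only, and clearly fragile since it mixes machine and human output):

```rust
// Sketch: associate JSON test events with the binary named by the
// preceding "Running <path>" line, as in the example output above.
fn main() {
    let output = r#"     Running target/debug/deps/rustrunner-e965cfdb4a6bd87d
{ "type": "test", "event": "started", "name": "tests::test_true" }
     Running target/debug/deps/subcrate-2da6bff97f5f00cf
{ "type": "test", "event": "started", "name": "tests::test_true" }"#;

    let mut suite = String::from("<unknown>");
    for line in output.lines() {
        let trimmed = line.trim_start();
        if let Some(path) = trimmed.strip_prefix("Running ") {
            // The binary name is the last path component.
            suite = path.rsplit('/').next().unwrap_or(path).to_string();
        } else if trimmed.contains(r#""name": "tests::test_true""#) {
            println!("tests::test_true ran in suite {}", suite);
        }
    }
}
```

A suite-level `name` field in the JSON itself, as suggested above, would make this workaround unnecessary.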

@gilescope
Contributor

Are people happy with cargo test -- -Zunstable-options --format json? Should we be looking to stabilise this argument? (I notice rustc calls the same argument --message-format)

@Gilnaa
Author

Gilnaa commented Sep 26, 2020

Is anyone actually using it?

I made the PR 2.5 years ago, and at the time there was talk about custom testing harnesses that would make this feature obsolete.

I did not see any feedback (neither positive nor negative), so I assumed it was just unused.

Anyway, one problem with the current format is that it doesn't output any timing information

@gilescope
Contributor

gilescope commented Sep 26, 2020

I think the custom test harness work has focused on alternative test runners; I can't see much activity on the custom reporting side. To be honest, standardised reporting across different custom runners is what I'd expect all CIs to want, so to me a JSON test reporting format feels front and center.

From my side I'm looking at translating these into teamcity service messages so that the web UI can be realtime driven as to how many tests are done and how long each one took.

I was thinking the duration can be calculated if need be based on when the started / stopped message happens, but having tried running two tests where one has a sleep in it, both seem to only report back in the json once they're both finished (even with --test-threads=2). This prevents retrospectively calculating the duration. I'm assuming the tests are running in parallel and it's the json reporting that's not streaming them back as they happen?

So timing info would be good, but even better would be getting the test finished events to output closer to when the test actually finishes. (Who doesn't like real time feedback and status bars showing how long before all the tests are likely to be finished?)


UPDATE
I think I had some very recursive tests (that happened to be calling cargo test themselves) - it works as expected if I ignore those tests.

@Gilnaa
Author

Gilnaa commented Sep 26, 2020

I actually get the expected behavior, but I'll see if I can add timestamps trivially

@Gilnaa
Author

Gilnaa commented Sep 26, 2020

Oh, there's already a flag that prints the elapsed time: --report-time

@gilescope
Contributor

When I was asking for this kind of stuff in '17 it was because I had a dream that one day I might be a full-time Rust dev and be on teams using TeamCity for development. At the time it was purely speculative that getting this kind of integration ready would be helpful.

Fast forward a few years and sometimes dreams (with a bit of hard work) do come true and we have teams using rust and python at work. I'm trying to see if we can get the rust integration to be at parity with the python integration. It's a good problem to have!

@gilescope
Contributor

Ok, that's all working perfectly for me -- many thanks!
Time for me to write that converter to TeamCity service messages...

@epage
Contributor

epage commented Sep 26, 2020

Are people happy with cargo test -- -Zunstable-options --format json? Should we be looking to stabilise this argument? (I notice rustc calls the same argument --message-format)

No, the JSON output is not great to work with, in my experience. See my earlier post in this thread, from maintaining a Rust API over tests.

#49359 (comment)

You can find the crate for this over at https://crates.io/crates/escargot

@andoriyu
Contributor

Are people happy with cargo test -- -Zunstable-options --format json? Should we be looking to stabilise this argument? (I notice rustc calls the same argument --message-format)

I'm using it in my cargo-suity that I use to generate JUnit reports. Obviously, I'd rather eliminate my shim and just have cargo generate JUnit output, but the JSON output is an "okay, I guess" alternative. That said, the current output is not easy to consume for my needs.

@Gilnaa
Author

Gilnaa commented Sep 27, 2020

I actually opened a PR some time ago implementing a basic JUnit output. I just did that against the wrong repo.

rust-lang/libtest#16

@nrc
Member

nrc commented Apr 26, 2023

Could we stabilise the existing json output as --format json-0.1 or something? There doesn't seem to be much movement towards a good version of json output, and in the meantime this bad json output is useful for users but can't be used from stable code. This gets more urgent with #109044, which closes a loophole making the JSON output available from stable libtest, i.e., people currently using the json output are going to get a de facto breaking change with the next Rust release.

@ehuss
Contributor

ehuss commented Apr 26, 2023

I think a good first step would be if someone could document the current format in the unstable book. AFAIK it has never been written down.

Another helpful thing would be if someone could summarize what concerns there are. Why do you consider the current format "bad"?

Even further, I think it would be helpful for someone to propose an RFC to stabilize it (though probably should talk with the libs team to see how they think it would be helpful to move forward). The RFC should cover what alternatives were considered and why this is being chosen.

@epage
Contributor

epage commented Apr 26, 2023

In addition to what ehuss said, I think a good test bed for json output will be rust-lang/cargo#5609.

While it's not my highest priority (finishing the lints RFC, cargo-script RFC, and MSRV are), I am also starting to put some attention to the problems of testing within Rust. Overall, I see this effort as (1) applying the lessons from all the different custom test harnesses and (2) finding the appropriate places to stabilize cargo-nextest behavior. I am first experimenting with an alternative design to libtest itself that I'm hoping will offer the benefits we had hoped for from rust-lang/rfcs#2318. My hope is that both the libtest and cargo-test sides of this work (including rust-lang/cargo#5609) will lead to a solid json format.

@nrc
Member

nrc commented May 2, 2023

Why do you consider the current format "bad"?

I'm not sure I do, but the overall feeling I get from this thread is that it is at least not optimal.

@pietroalbini
Member

So, I have some recent experience using this in #108659. The goal of that PR was to extract stats about the individual tests executed by bootstrap (name of the test, outcome, eventual ignore reason), and there were a few issues I noticed with the current way the JSON output works.

The minor issue is some lack of clarity in the JSON format, which could likely be shuffled a bit to be more coherent (for example, there are both a reason and a message when a test is ignored).

The major issue I found is that a tool wanting to extract the JSON information for further processing and wanting to show the familiar CLI UI to users needs to reimplement the whole rendering code, as there is no rendered/rendered-ansi option like rustc. This resulted in me having to write a full renderer in bootstrap just to know which tests were executed. The design of the flag will likely not be trivial, as a rendered field would also need to distinguish between the "terse" (used by bootstrap) and "verbose" (used by default by cargo) outputs.
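To illustrate the duplicated work, here is a toy terse renderer reconstructed from JSON events ('.' per pass, 'F' per failure). The real libtest rendering code is far more involved, and the field extraction here is naive string matching purely for illustration.

```rust
// Naive extraction of a string value for `key` from a single-line JSON
// event. Illustrative only -- not a real JSON parser.
fn extract<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let pat = format!("\"{}\": \"", key);
    let start = line.find(&pat)? + pat.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

// Pull out the (type, event) pair that drives rendering.
fn event_kind(line: &str) -> Option<(&str, &str)> {
    Some((extract(line, "type")?, extract(line, "event")?))
}

fn main() {
    let events = [
        r#"{ "type": "suite", "event": "started", "test_count": 2 }"#,
        r#"{ "type": "test", "event": "started", "name": "a" }"#,
        r#"{ "type": "test", "name": "a", "event": "ok" }"#,
        r#"{ "type": "test", "event": "started", "name": "b" }"#,
        r#"{ "type": "test", "name": "b", "event": "failed" }"#,
    ];
    let mut out = String::new();
    for line in events.iter().copied() {
        match event_kind(line) {
            Some(("test", "ok")) => out.push('.'),
            Some(("test", "failed")) => out.push('F'),
            _ => {}
        }
    }
    println!("{}", out); // prints ".F"
}
```

A `rendered` field in each event, as suggested above, would let consumers forward libtest's own rendering instead of approximating it like this.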

@wyatt-herkamp
Contributor

wyatt-herkamp commented Nov 19, 2023

Why do you consider the current format "bad"?

I'm not sure I do, but the overall feeling I get from this thread is that it is at least not optimal

One flaw I see in the current implementation is doc tests.
cargo +nightly test -- -Z unstable-options --list --format json
For doc tests it will print out
  Doc-tests heck
and then print out the doc tests in JSON format.
This string is odd if we are trying to provide a machine-readable format.

I also noticed that start and end lines are not provided for doc tests.

So start lines should be in the actual JSON, the Doc-tests message should be removed, and maybe a test-type attribute should be added to the format. Or, when the "discovery" event is printed, it should have an attribute indicating the start of doc tests.

@Noratrieb
Member

It would be easier to evolve the format if it was versioned. This way we could also ship a suboptimal version earlier and later upgrade it, with breaking changes, to a more optimal version (while keeping the old version working). It does of course still require a design for the first stable version that we can be sure won't break in the future, but it doesn't require a design that's perfect and fits all use cases.
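As an illustration only (the field name here is hypothetical, not an existing libtest feature), a versioned event could carry an explicit marker that consumers check before parsing the rest:

```json
{ "format_version": 1, "type": "test", "name": "tests::leak", "event": "ok" }
```

A consumer that sees an unknown format_version could then fail loudly instead of silently misparsing a newer schema.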

@epage
Contributor

epage commented Jan 9, 2024

FYI the testing devex team was recently formed and we've decided that the first order of business is working to stabilize the json output. We will be writing up our plan soon for how we'll go about that effort.

@epage
Contributor

epage commented Jan 9, 2024

I've posted a draft of my plan at https://internals.rust-lang.org/t/path-for-stabilizing-libtests-json-output/20163

@epage
Contributor

epage commented Jan 19, 2024

I've posted an eRFC for the stabilization plan: rust-lang/rfcs#3558

@gilescope
Contributor

For CI we use the JSON for coverage, but humans like to look at the pretty output, so we run two jobs: one running the tests with the default format, one with the JSON format. It would be great to be able to have the JSON format saved to a specified file while stdout produces the nice human format.

@epage
Contributor

epage commented May 16, 2024

In my plan for things, the responsibility for that would be moved up into cargo test. Cargo will exclusively accept json output from libtest and then do all of the rendering. This means users will need to ask Cargo for the json output as well.

@jieyouxu jieyouxu added the T-testing-devex Relevant to the testing devex team (testing DX), which will review and decide on the PR/issue. label Oct 27, 2024