reportVersion semantics are not defined #349

addaleax · 2020-01-17T15:24:47Z

The process.report/--experimental-report feature comes with output that contains a version number; however, it is unclear what that version number means and when it is incremented.

For nodejs/node#31386, it would be a good starting point to know if purely additive changes to the JSON format (i.e. only adding previously non-existent keys) should lead to version bumps.

@jasnell also suggested switching to semver, whereas @cjihrig pointed out that in the PR that added versioning, a single-integer versioning scheme was requested. I personally find it hard to make a decision on this question without knowing how consumers are supposed to interact with the version number.

I'm adding this to the WG agenda, I hope that's okay.

The text was updated successfully, but these errors were encountered:

richardlau · 2020-01-17T15:28:47Z

cc @boneskull

mhdawson · 2020-01-17T17:25:12Z

For me I think it depends on whether it's ok to ask consumers to check the Node.js version as well. If a breaking change will be signalled by a bump in the Node.js major, then the version number as is might be ok with it being bumped every time something new is added to the report.

Having said that it may be better to use SemVer and separate out from the Node.js version. For example, a new Node.js major may have no breaking changes in the report so using the Node.js version for when you may need a new version of tools processing reports is sub-optimal.

The previous comparisons to N-API and ABI using a single number don't necessarily apply in my mind because ABI is always SemVer major and N-API is only SemVer minor so you only need 1 number to represent them.

boneskull · 2020-01-23T00:00:29Z

as a tooling author consuming these—and I realize I may be the only one—semver seems like an overcomplication; more difficult to parse and of dubious value. if someone wants to argue for semver, I’m happy to listen about what it’ll enable that I’m failing to consider.

if the report format changes in any way (new keys, changed, removed, or even moved since the order is deterministic; changed types of values; consider the entire tree) the version should be bumped.

gireeshpunathil · 2020-01-23T04:03:14Z

I agree with @boneskull . Given that report is a diagnostic feature, IMO modification is all about more data, less data or structural changes, and the main party affected is tools. semver means adding more complexity here.

pinging @june07 (who developed an inspector plugin that understands and renders report https://github.com/june07/NiM ) for an opinion

june07 · 2020-01-23T19:05:01Z

@gireeshpunathil Thank you for including me in the discussion.

While I see the benefit in simplicity vs complexity, I do think that semantic versioning would be the better choice. And given that as software developers we're all very familiar with semver, I'm not even sure I'd categorize using semver as adding to complexity, in fact I think it will help save us from complexity down the road. @boneskull has done a great job with rtk and in fact NiM and related tooling (BrakeCODE) use the rtk library, which demonstrates perfectly the sort of pipelines that do/may exist amongst different software, all which do/may interact with these reports.

In example, JSON fields are currently added to reports that are ingested by BrakeCODE tooling (see screenshot) to encapsulate processing done by the rtk library. While this did not have to be included directly in the report schema, I think doing so is helpful as it eliminates introducing yet another parent data type. But, having an agreed upon standard (semver) of bumping report versions might make things nicer for other potential consumers of the data further down the pipeline, giving better transparency into data schema. NiM is a perfect example, because it is a consumer of reports from BrakeCODE. While adding keys should not equate to breaking changes, it might be nice to have a way to represent those additions via versioning.

it would be a good starting point to know if purely additive changes to the JSON format (i.e. only adding previously non-existent keys) should lead to version bumps.

No with single integer versioning but yes with semver.

I think using a single integer version scheme might be somewhat limiting, non breaking changes being essentially invisible. @mhdawson brings out a good point regarding the Node.js version, having report versioning tightly coupled with Node.js versioning such that report versions are bumped in sync with Node.js is probably bad since changes in one don't necessarily indicate breaking changes in the other. Semver obviously solves this problem. And while more complicated, ultimately we’re all used to the semver paradigm and the complication really only lies in how semver will map to JSON schema changes vs code changes as we’re all very used to, not not much differently I assume.

In working together to agree on the answer to...

it is unclear what that version number means and when it is incremented.

I feel that semver empowers one to answer that question in the most elegant way, while considering more than breaking changes.

gireeshpunathil · 2020-01-24T09:14:09Z

thanks @june07 for the insights! Feel free to join our one of the upcoming diagnostic WG meetings when we take this up. the next one is going to be on 29th, but unlikely to pick this one up (due to pre-decided agenda, pinging @hekike for a confirmation. )

boneskull · 2020-01-24T23:27:08Z

To be clear, my complexity concern is mainly around avoid string-parsing acrobatics or pulling in the userland semver package.

I see semver is useful for developers who can choose what version of something to consume.

For example, it enables automated upgrades, assuming the contract is upheld. Or you may know you don’t want to upgrade to a new major release, because it’s likely to contain breaking changes. It works pretty well most of the time!

But a tool consuming a report file won’t be able to choose the version of said report file; the concept of an “upgrade” doesn’t really apply to the use case.

A tool will change its behavior based on the version of the report, however. Regardless of what a semver version “means” in a report (e.g., minor version means “new field”), the behavior must still be defined based on information that semver cannot provide.

Example:

If we add a report field called foo in a hypothetical version 2.1.0, a tool will read the version of the report file, and if it’s >=2.1.0 the tool must do whatever it’s going to do with foo. If it is <2.1.0 it will not. Note that the semantic version is unable to tell the tool that foo was added. 😉

If v2.1.1 is a bug fix to field bar, a very similar decision will need to be made—assuming the tool cares about the fix—but this time, based on >=2.1.1 / <2.1.1.

Likewise with a breaking change in v3.0.0. Note that the semver conventions ~ and ^ are not currently applicable, and will not be until we have a situation in which the same behavior is broken more than once between different major releases.

OTOH, if you are using incrementing integers for versions, the decision-making is exactly the same as above example... except a tool doesn’t have to parse semver!

Another problem—this is not technical, but of the “people” variety—is that if a tooling author might see semver and think it’s generally okay to restrict the tool’s operation based on the major release number—because that’s how people use semver! For example, only supporting ^2.0.0, when in fact nobody knows if a v3 report will work or not.

In summary, SemVer was designed to give people better control of automated software upgrades. I just can’t see how our use case fits with that aim.

jasnell · 2020-01-25T01:01:16Z

A simplified two digit semver approach works also here. Just major and minor/patch.

richardlau · 2020-01-29T17:07:16Z

Not too keen on a two digit approach. Single digit as @boneskull points out is easier for tools to parse. If we need more digits then semver at least is well defined and has existing parsers that could be reused. If tools were doing their own parsing they could just ignore the patch level (since by definition changes to the patch level should not break them).

boneskull · 2020-01-29T18:02:31Z

@jasnell From your point-of-view, what are the advantages of using semver here?

gireeshpunathil · 2020-02-17T13:15:50Z

We deliberated in the last(-to-last) WG meeting on this.

Report being a data (as opposed to code), patch does not make sense anyways. So the contention is really between x.y semantics versus x semantics.

I propose a single digit versioning with version bump on every structural changes wherein structural change is anything that:

adds a new key
removes a key (though chances are less for this to happen)

It is reasonable to expect report parsers to parse it in JSON-native way, instead of line-parsing, char-parsing etc. One of the main motivation for the report to be in JSON format was easy-parsing for consuming tools. Because of this, section movements / aesthetic modifications do not cause a version bump.

It is reasonable to expect no data type changes being applied to the fields.

mhdawson · 2020-02-17T14:42:03Z

It is reasonable to expect no data type changes being applied to the fields.

Can we depend on that? Maybe we should add a point to your list that says:

changes the data type for a field

gireeshpunathil · 2020-02-17T14:53:11Z

makes sense @mhdawson . so the modified proposal is:

a single digit versioning with version bump on every structural changes wherein structural change is anything that:

adds a new key
removes a key
changes the data type for a field

legendecas · 2020-02-17T16:45:38Z

It is reasonable to expect no data type changes being applied to the fields.

Will a format change of string types be counted as a significant change? Like the format of error stacks.

gireeshpunathil · 2020-02-18T08:24:19Z

Will a format change of string types be counted as a significant change? Like the format of error stacks.

@legendecas - I don't think so, as this sort of change implies no change in the report's functionality, but a change in node core w.r.t data representation.

mmarchini · 2020-02-26T17:43:24Z

On the other hand, if the formatting of a string changes, clients consuming the report could break, so maybe this should be considered a breaking change (thus bumping the number)?

boneskull · 2020-02-26T18:04:23Z

I agree with @mmarchini. @gireeshpunathil said it himself--the report is data, not software. It does not have functionality; it has a format. If that format changes, it should be considered a bumpable change. My preference is that we bump on every change to the format; I don't see the advantage of not bumping.

jasnell · 2020-02-26T18:12:59Z

Changes in format changes absolutely have to be handled as breaking, requiring a version bump.

I'm absolutely fine with a single digit version but that does leave a couple of additional unanswered questions...

(a) Is the version number bumped once per change landed in master?

Let's say the current version is 1, and I land two changes in master in separate commits over the course of 2 weeks. Is the version now 2 or 3?

(b) What is the backport policy?

Let's say that LTS 12.x report version is 3, and master is updated to version 4. Let's say it's an additive change. Are all report changes backportable to 12.x and how do we determine that?

(c) What if a bug is found in an LTS branch that does not exist in master?

Let's say that LTS 12.x report version is 3, and master has had a breaking change that alters the format and bumps the version to 4. Now a bug in the 12.x version 3 report is found and needs to be fixed only in that branch, what does the new version number become?

june07 · 2020-02-26T21:03:18Z

For those so inclined, I think this would be a good read.
https://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/

boneskull · 2020-02-26T21:38:58Z

Let me prefix this by saying I really don't want to beat this to death--IMO there's enough analysis paralysis going on in nodejs without this particular issue. So I'm going to decline to comment after this one.

From the snowplow article:

When versioning a data schema, we are concerned with the backwards-compatibility between the new schema and existing data represented in earlier versions of the schema

I am also concerned with this, which is good!

From their definitions:

ADDITION when you make a schema change that is compatible with all historical data

They go on to offer an example of adding an optional field which would satisfy this requirement. Indeed, this would satisfy the requirement in the context of a well-defined JSON schema which allows arbitrary fields.

OTOH, we do not have a well-defined JSON schema for reports (maybe we should), so we cannot make these guarantees. For example, a consumer may base their parsing of a report based on a field count. An addition would change the field count. Because there's no contract--a schema, in this case--which says extra fields are allowed, we can't claim adding a field won't break a consumer.

From the section about REVISION:

REVISION when you make a schema change which may prevent interaction with some historical data

Essentially, this means "we don't know". Wouldn't it be better to err on the side of caution and go with MODEL, which is equivalent to MAJOR?

So if ADDITION is not applicable, and REVISION is dubious, that leaves a single number.

At this point, anything more than a single number seems like overengineering to me. We can always change it later if we need to.

To @jasnell's questions

(a) Is the version number bumped once per change landed in master?

It's probably easier to say "if you're going to land a PR which changes the format, bump the version" than the alternative, which is... I don't know what it is, but it's a PITA.

(b) What is the backport policy?

What's the prior art?

(c) What if a bug is found in an LTS branch that does not exist in master?

What's a "bug" in terms of the diagnostic report format? 😄

jasnell · 2020-02-26T21:48:23Z

What's the prior art?

full-semver, only patch and minors backported.

What's a "bug" in terms of the diagnostic report format?

Incorrect data being generated, for example.

One possible way to deal with this would be to make the report format version relative to the Node.js major, with the counter reset to zero at each major release. For instance... All 12.x related changes: 12.1, 12.2, 12.3 etc ... All 13.x related changes 13.1, 13.2, 13.3.. and so on.

addaleax · 2020-02-26T21:53:30Z

@jasnell I think in that case you could just use the Node.js version number. I don’t think such a format would be helpful, because we will most likely maintain backwards-compatibility up to additive changes in the future, and making it relative to the Node.js version number would make it look like there are breaking changes when there are none.

Honestly, I’d just go with semver at this point. In the worst case, it means that we’re exposing more information about the format than necessary, which seems okay to me.

mhdawson · 2020-02-26T22:47:40Z

I think the key question raised by @jasnell is around backports. If the number is going to be meaningful across versions then we will have to either backport all of the changes up to a certain report version or none (ie if you want to backport the change which results in version 10 you need to include all changes which resulted in 1,2,3,4,5,6,7,8,9. You can't just pick individual changes out of sequence).

If the number is not going to be meaningful across versions then making it relative to the Node.js version might make sense but I'm not sure that is all that useful as @addaleax mentions.

I think if we commit to backporting so that the numbers are meaningful across versions then a single number is probably ok, but I'm also fine with SemVer.

jasnell · 2020-02-26T22:52:59Z

With regards to the backport issue, it's perfectly valid to declare that the diagnostic report format is not subject to the normal semver rules and therefore breaking changes to the diagnostic format can occur in any release, even within an LTS line. This allows backports to be made any time. I would limit this to format changes. Major changes that impact the overall function of the report mechanism (e.g. removing it, changing the command line flags, etc) would fall under normal semver rules.

github-actions · 2022-07-21T00:26:21Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

gireeshpunathil · 2022-07-21T04:23:10Z

I would like to resurrect this from being stale, and would like to move forward with single digit versioning, and on top of that, what @jasnell suggested in #349 (comment)

problem: right now we don't have a rule around versioning, and that needs to be fixed
opportunity: the above proposal gives a reasonable way forward, we can refine it as we go, if need be.

putting it back to diagnostics WG agenda.

gireeshpunathil · 2022-08-23T16:16:28Z

@RafaelGSS - you asked, why can't we attach the containing Node.js version for the report version too? .

here is the issue with that: if there are more than one structural changes in the report that a Node.js version carry, then we cannot differentiate between those.

No9 · 2022-08-24T23:25:07Z

As agreed in the diagnostics meeting I said I would take a look.

To me #349 (comment) from Chris is a good summary.
Specifically the point on there being no schema in the current implementation making the conversation somewhat academic.

With that I'm +1 on the suggestion from @gireeshpunathil to move this forward with a single number as it reflects the current implementation and can be revisited.

Diagnostics report has a version number representing its format, yet its rule is not defined. This doc change specifies the rule. Refs: nodejs/diagnostics#349 Refs: nodejs#28121 (comment)

Diagnostics report has a version number representing its format, yet its rule is not defined. This doc change specifies the rule. Refs: nodejs/diagnostics#349 Refs: #28121 (comment) PR-URL: #45050 Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com> Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com>

gireeshpunathil · 2022-10-21T11:59:49Z

this is done via nodejs/node#45050 , closing.

Diagnostics report has a version number representing its format, yet its rule is not defined. This doc change specifies the rule. Refs: nodejs/diagnostics#349 Refs: #28121 (comment) PR-URL: #45050 Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com> Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com>

addaleax added the diag-agenda label Jan 17, 2020

addaleax mentioned this issue Jan 17, 2020

report: add support for Workers nodejs/node#31386

Closed

4 tasks

mhdawson mentioned this issue Jan 27, 2020

Node.js Foundation Diagnostics WorkGroup Meeting 2020-01-29 #351

Closed

mhdawson mentioned this issue Feb 10, 2020

Node.js Foundation Diagnostics WorkGroup Meeting 2020-02-12 #354

Closed

mhdawson mentioned this issue Feb 24, 2020

Node.js Diagnostics WorkGroup Meeting 2020-02-26 #359

Closed

mhdawson mentioned this issue Mar 9, 2020

Node.js Diagnostics WorkGroup Meeting 2020-03-11 #364

Closed

github-actions bot added the stale label Sep 16, 2020

mmarchini added never stale and removed stale labels Sep 21, 2020

github-actions bot added the stale label Jul 21, 2022

gireeshpunathil added the diag-agenda label Jul 21, 2022

github-actions bot removed the stale label Jul 22, 2022

mhdawson mentioned this issue Jul 25, 2022

Node.js Diagnostics WorkGroup Meeting 2022-07-26 #574

Closed

mhdawson mentioned this issue Aug 8, 2022

Node.js Diagnostics WorkGroup Meeting 2022-08-09 #577

Closed

mhdawson mentioned this issue Aug 22, 2022

Node.js Diagnostics WorkGroup Meeting 2022-08-23 #585

Closed

mhdawson mentioned this issue Sep 5, 2022

Node.js Diagnostics WorkGroup Meeting 2022-09-06 #588

Closed

mhdawson mentioned this issue Sep 19, 2022

Node.js Diagnostics WorkGroup Meeting 2022-09-20 #589

Closed

mhdawson mentioned this issue Oct 3, 2022

Node.js Diagnostics WorkGroup Meeting 2022-10-11 #591

Closed

This was referenced Oct 10, 2022

Node.js Diagnostics WorkGroup Meeting 2022-10-11 #592

Closed

Node.js Diagnostics WorkGroup Meeting 2022-10-18 #594

Closed

gireeshpunathil mentioned this issue Oct 18, 2022

report,doc: define report version semantics nodejs/node#45050

Merged

gireeshpunathil removed the diag-agenda label Oct 21, 2022

gireeshpunathil closed this as completed Oct 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reportVersion semantics are not defined #349

reportVersion semantics are not defined #349

addaleax commented Jan 17, 2020

richardlau commented Jan 17, 2020

mhdawson commented Jan 17, 2020

boneskull commented Jan 23, 2020

gireeshpunathil commented Jan 23, 2020

june07 commented Jan 23, 2020 •

edited

Loading

gireeshpunathil commented Jan 24, 2020

boneskull commented Jan 24, 2020

jasnell commented Jan 25, 2020

richardlau commented Jan 29, 2020

boneskull commented Jan 29, 2020

gireeshpunathil commented Feb 17, 2020

mhdawson commented Feb 17, 2020

gireeshpunathil commented Feb 17, 2020

legendecas commented Feb 17, 2020 •

edited

Loading

gireeshpunathil commented Feb 18, 2020 •

edited

Loading

mmarchini commented Feb 26, 2020

boneskull commented Feb 26, 2020

jasnell commented Feb 26, 2020

june07 commented Feb 26, 2020

boneskull commented Feb 26, 2020

jasnell commented Feb 26, 2020

addaleax commented Feb 26, 2020

mhdawson commented Feb 26, 2020

jasnell commented Feb 26, 2020

github-actions bot commented Jul 21, 2022

gireeshpunathil commented Jul 21, 2022

gireeshpunathil commented Aug 23, 2022

No9 commented Aug 24, 2022

gireeshpunathil commented Oct 21, 2022

reportVersion semantics are not defined #349

reportVersion semantics are not defined #349

Comments

addaleax commented Jan 17, 2020

richardlau commented Jan 17, 2020

mhdawson commented Jan 17, 2020

boneskull commented Jan 23, 2020

gireeshpunathil commented Jan 23, 2020

june07 commented Jan 23, 2020 • edited Loading

gireeshpunathil commented Jan 24, 2020

boneskull commented Jan 24, 2020

jasnell commented Jan 25, 2020

richardlau commented Jan 29, 2020

boneskull commented Jan 29, 2020

gireeshpunathil commented Feb 17, 2020

mhdawson commented Feb 17, 2020

gireeshpunathil commented Feb 17, 2020

legendecas commented Feb 17, 2020 • edited Loading

gireeshpunathil commented Feb 18, 2020 • edited Loading

mmarchini commented Feb 26, 2020

boneskull commented Feb 26, 2020

jasnell commented Feb 26, 2020

june07 commented Feb 26, 2020

boneskull commented Feb 26, 2020

jasnell commented Feb 26, 2020

addaleax commented Feb 26, 2020

mhdawson commented Feb 26, 2020

jasnell commented Feb 26, 2020

github-actions bot commented Jul 21, 2022

gireeshpunathil commented Jul 21, 2022

gireeshpunathil commented Aug 23, 2022

No9 commented Aug 24, 2022

gireeshpunathil commented Oct 21, 2022

june07 commented Jan 23, 2020 •

edited

Loading

legendecas commented Feb 17, 2020 •

edited

Loading

gireeshpunathil commented Feb 18, 2020 •

edited

Loading