Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring /errorlog output up to SARIF draft 1.0.0-beta.5 #11233

Merged
merged 11 commits into from
May 17, 2016

Conversation

nguerrera
Copy link
Contributor

@nguerrera nguerrera commented May 11, 2016

Bring /errorlog output up to SARIF draft 1.0.0-beta.5

Also fix #9769

  1. Stop escaping forward slashes (optional per spec and users don't want it)
  2. Add tests converted from DataContractSerializer tests that stress the code we borrowed from there for escaping.
    3 .Let the underlying TextWriter handle all newlines

@dotnet/roslyn-compiler @mavasani @lgolding @michaelcfanning

@@ -45,7 +47,7 @@ public ErrorLogger(Stream stream, string toolName, string toolFileVersion, Versi

private void WriteToolInfo(string name, string fileVersion, Version assemblyVersion)
{
_writer.WriteObjectStart("toolInfo");
_writer.WriteObjectStart("tool");
_writer.Write("name", name);
_writer.Write("version", assemblyVersion.ToString(fieldCount: 3));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention I've followed in the SARIF SDK is to emit the four-part assembly version to the 'version' field. Then emit the three-part version as the 'semanticVersion' field. Then, we emit the assembly file version but only if it differs from the assembly version.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should consider emitting a marker in the sarif file that provides an explicit version for your SARIF producer. in the SDK, we dump a custom property SarifLoggerVersion on the tools property. this isn't necessary, though, if other version information sufficiently identifies this specific point-in-time (perhaps your assembly file version suffices)?

@lgolding, our tolerance for additions to SARIF is very low. you could imagine, though, extending the 'tool' metadata to include a 'sarifLoggerVersion' for tools that integrate some other component for producing SARIF (that may be versioned separately from the root tool itself). i.e., promote the SDK custom property.

Copy link
Contributor Author

@nguerrera nguerrera May 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re 'semanticVersion', that's new, eh? I didn't see it. The way we write it now was conformant to prior spec iteration where 'version' was supposed to be semantic. I'll do it as you suggest.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two issues:

  1. Should there be a toolInfo.loggerVersion property?
  2. If there were, would Nick use it here?

If the answer to #2 is "no", we can defer #1 to future.

So... why did you think Nick would use a loggerVersion property? Your scenario was " tools that integrate some other component for producing SARIF ". But Nick isn't using the SARIF SDK; the logging code is built into Roslyn. Why doesn't the version of the Roslyn compiler suffice?


In reply to: 62950703 [](ancestors = 62950703)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, to close. Nick will update version emitted to conform to my guidance (we have tracking items to update the SARIF spec). we are adding a 'sarifLoggerVersion' member that can be populated if the sarif-producing component is versioned separately from the dependent tool. Nick doesn't need to be concerned about populating this item, as the Roslyn version uniquely identifies the sarif-producing code.

@nguerrera nguerrera changed the title Bring /errorlog output up to SARIF draft 1.0.0-beta.4 [WIP] Bring /errorlog output up to SARIF draft 1.0.0-beta.4 May 12, 2016
@@ -35,8 +37,8 @@ public ErrorLogger(Stream stream, string toolName, string toolFileVersion, Versi
_writer.WriteObjectStart(); // root
_writer.Write("version", OutputFormatVersion);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you emit a $schema property, your emitted SARIF will automatically validate when opened in VS or VS Code.

"$schema": "http://json.schemastore.org/sarif-1.0.0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can emit this, but I'll wait until that URL does not have anything beta in it and it actually has the final schema. If we ship U3 against a spec still in beta then beta -> RTM changes will cause U3 logs to fail to validate. I know the plan is not for that to happen, but I'll be conservative and add $schema only when we change 'version' to '1.0.0' in the output.

@nguerrera
Copy link
Contributor Author

I pushed a new commit with rule metadata implemented. There are two tests failing but I need to run soon and wanted to show active reviewers latest code.I will fix the tests and the PR feedback in the morning.

_writer.Write("shortMessage", message);
_writer.Write("fullMessage", description);
_writer.WriteArrayStart("suppressionStates");
_writer.Write("suppressedInSource");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelfanning, This is correct according to the schema as it stands today but we talked about splitting SuppressionState and BaselineState apart, and I see we never did that. @nguerrera, this is fine for now but there will be a small change. I'll file an issue in the SDK to take care of this.

Copy link

@ghost ghost May 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Red alert! Two things:

  1. @michaelcfanning: The schema correctly shows the values of the SuppressionStates enumeration as "suppressed", "suppressedInSource". But the code gen hints file gives the enumeration constant names as "SuppressedInSource", "SuppressedInBaseline", so the generated file SuppressionStates.cs is wrong. I'll fix that.
  2. @nguerrera: The interpretation of the SuppressionStates enum is subtle. Every result that is suppressed, for whatever reason, should have the value "suppressed" in the "suppressionStates" array. Those results that are suppressed because of an in-source suppression should also have the value "suppressedInSource" in the array. I can explain offline the reasoning behind that.

So, since in Roslyn every suppression you see is a source-level suppression (that's right, isn't it?) your code should be

            _writer.WriteArrayStart("suppressionStates");
            _writer.Write("suppressed");
            _writer.Write("suppressedInSource");

... resulting in

            "suppressionStates": [ "suppressed", "suppressedInSource" ]

In reply to: 63035326 [](ancestors = 63035326)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I see in http://json.schemastore.org/sarifright now. Is it not up to date with what is supposed to be done here?


        "suppressionStates": {
          "type": "array",
          "items": {
            "description": "A flag value indicating one or more suppression conditions.",
            "enum": [
              "suppressedInSource",
              "suppressedInBaseline"
            ]
          }
        },

Copy link
Contributor Author

@nguerrera nguerrera May 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I found bleeding edge in https://github.com/Microsoft/sarif-sdk/blob/master/src/Sarif/Schemata/Sarif.schema.json and will adjust to it. Is it wise that we are keeping beta.4 version across breaking changes? Isn't this exactly what Ryley was plagued by?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not. We should have changed to beta.5 when we made that change.


In reply to: 63069504 [](ancestors = 63069504)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not too late since the latest schema hasn't been published. Should I bump to beta.5 in next commit where I do this and adjust tags to be under properties?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'm working on the commit now.


In reply to: 63072622 [](ancestors = 63072622)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline, we decided it should remain [ "suppressedInSource" ]

@ghost
Copy link

ghost commented May 12, 2016

:shipit:

@ghost
Copy link

ghost commented May 12, 2016

👍 subject to the [ "suppressed", "suppressedInSource" ] change.

@mavasani
Copy link
Contributor

This will probably break errorlog VSI test

@mavasani
Copy link
Contributor

test vsi please

@nguerrera
Copy link
Contributor Author

@mavasani where do the relevant vsi tests live and hoiw do I run them?

@mavasani
Copy link
Contributor

@nguerrera The tests live in the roslyn-internal repo. We have a test for diagnostics and FixAll occurrences, both of which verify expected diagnostics against checked in error log baselines. You will basically need to generate new error log baselines by manually running the test scenario. We can do it together, I'll drop by after standup.


public ErrorLogger(Stream stream, string toolName, string toolFileVersion, Version toolAssemblyVersion)
public ErrorLogger(Stream stream, string toolName, string toolFileVersion, Version toolAssemblyVersion, CultureInfo culture)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably just add a separate constructor - even though this is not public API, I know of a bunch of tools that use reflection to instantiate an error logger. We can avoid breaking them..

@nguerrera nguerrera force-pushed the sarif-update branch 2 times, most recently from 51fb515 to 9f5a4ff Compare May 12, 2016 21:15
@nguerrera nguerrera changed the title [WIP] Bring /errorlog output up to SARIF draft 1.0.0-beta.4 [WIP] Bring /errorlog output up to SARIF draft 1.0.0-beta.5 May 12, 2016
@nguerrera nguerrera changed the title [WIP] Bring /errorlog output up to SARIF draft 1.0.0-beta.5 Bring /errorlog output up to SARIF draft 1.0.0-beta.5 May 12, 2016
@nguerrera
Copy link
Contributor Author

nguerrera commented May 12, 2016

@lgolding @michaelcfanning Everything should be up to date with discussion this afternoon and pending beta.5. I've rebased my other active PR (and closed it) in to this one since I needed the escaping fix to test the $schema value that is now written.

Not ready to merge as I still need to:

  • Add some test coverage for additional locations, multiple descriptors with same ID.
  • Refine the DescriptorSet to handle a pathological case I thought about last night (e.g. one descriptor has ID XXX.0001 and two others have XXX)
  • Work with @mavasani to get vsi tests rebaselined.

JSON spec makes escaping of forward slashes optional and many have
complained about the aesthetics of '\/'. Previously, we got this
behavior from the DataContractJsonSerializer and could not override
it, but now that the dependency has been removed, we can eliminate
the unnecessary escaping.
Also, defer to TextWriter for NewLine character rather than
hard-coding Environment.NewLine.
Also remove outdated future improvement comment
@nguerrera
Copy link
Contributor Author

I believe all feedback has been addressed, except I still need @mavasani's help to rebaseline vsi tests.

@jmarolf
Copy link
Contributor

jmarolf commented May 15, 2016

test vsi please

@mavasani
Copy link
Contributor

@nguerrera I am going to skip the error log based tests to allow your checkin through, then regenerate the baseline, checkin a text file with steps to regenerate it in future and also modify the error log tao actions to throw an appropriate error pointing out the need for baseline update.

@nguerrera
Copy link
Contributor Author

@mavasani OK. Thanks!

@mavasani
Copy link
Contributor

@dotnet-bot retest roslyn_prtest_win_vsi2 please

@mavasani
Copy link
Contributor

test vsi please

@mavasani
Copy link
Contributor

@nguerrera Tests are now all green - I'll fix and unskip them once your change is merged in.

@nguerrera
Copy link
Contributor Author

Reviewers: is this ready to go? I need to get this in today. @lgolding @mavasani @jaredpar

@ghost
Copy link

ghost commented May 16, 2016

+1 from me. I didn't have any real feedback in the first place.


Key | Value
------------------------ | ------------
"severity" | `Diagnostic.Severity` ("Hidden", "Info", "Warning, or "Error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we getting rid of severity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything removed has been replaced with first class SARIF equivalents.

@mavasani
Copy link
Contributor

LGTM

@mavasani
Copy link
Contributor

@nguerrera Are you planning to check this in today?

@nguerrera
Copy link
Contributor Author

@mavasani Yes. I've been waiting on at least one compiler team reviewer. @agocke @jcouv

}

public void Write(int value)
{
WritePending();
_output.Write(value);
_pending = s_commaNewLine;
_output.Write(value.ToString(CultureInfo.InvariantCulture));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this spec'd somewhere? The invariant culture is pretty strange and not common outside of .NET

@agocke
Copy link
Member

agocke commented May 17, 2016

LGTM aside from invariant culture. It looks like you're only using it for integers and characters, so it should probably be safe, but if we ever handle something like dates I would prefer ISO 8601 or a similar standard.

@ghost
Copy link

ghost commented May 17, 2016

@agocke The log files are intended to be machine-readable so use of InvariantCulture is appropriate here.

As for dates, the SARIF standard specifies a particular ISO 8601 date format (link).

@nguerrera Please be sure you are emitting this format if you are populating any of the following properties:

  • invocation.startTime
  • invocation.endTime
  • notification.time

Those are all optional properties, and you're probably not populating any of them, especially if you wanted the log file to be deterministic.

@nguerrera
Copy link
Contributor Author

@agocke. Thanks. CultureInfo.InvariantCulture is used exclusively for formatting integers. That part of the change is defending against a CurrentCulture that would not format integers compatibly with JSON.

"8 Numbers" in http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf defines the number format and I added tests coverage: https://github.com/dotnet/roslyn/pull/11233/files#diff-5bdd83988167258d8851cb9093c1cf25R32

@lgolding I don't emit any of those at this time.

@nguerrera nguerrera merged commit b280c27 into dotnet:master May 17, 2016
@nguerrera nguerrera deleted the sarif-update branch May 17, 2016 17:23
@mavasani
Copy link
Contributor

mavasani commented May 17, 2016

@nguerrera I am trying to update the error log baseline and error log tao action that parses the error log file and I see that the error log has duplicate rules metadata entries like below:

        "SA1003": {
          "id": "SA1003",
          "shortDescription": "Symbols must be spaced correctly",
          "fullDescription": "The spacing around an operator symbol is incorrect, within a C# code file.",
          "defaultLevel": "warning",
          "helpUri": "https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1003.md",
          "properties": {
            "category": "StyleCop.CSharp.SpacingRules",
            "isEnabledByDefault": true
          }
        },
        "SA1003.001": {
          "id": "SA1003",
          "shortDescription": "Symbols must be spaced correctly",
          "fullDescription": "The spacing around an operator symbol is incorrect, within a C# code file.",
          "defaultLevel": "warning",
          "helpUri": "https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1003.md",
          "properties": {
            "category": "StyleCop.CSharp.SpacingRules",
            "isEnabledByDefault": true
          }
        }

This causes a "ruleKey" entry to be emitted for individual diagnostics which is causing intermittent failure. For now, I am going to work around this and file a separate bug for this issue.

private Dictionary<string, int> _counters = new Dictionary<string, int>();

// DiagnosticDescriptor -> unique key
private Dictionary<DiagnosticDescriptor, string> _keys = new Dictionary<DiagnosticDescriptor, string>();
Copy link
Contributor

@mavasani mavasani May 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to use a DiagnosticDescriptor comparer that ensures equivalent descriptors which are not reference equals don't cause duplicate kvps.

Copy link
Contributor Author

@nguerrera nguerrera May 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiagnosticDescriptor is IEquatable<DiagnosticDescriptor> by value, so that's why I thought default comparer would work. However, I see now that common case of unequal descriptors with same ID is when they differ only by message format and we don't write out the message format. So I do need a comparer that only compares the parts we write out. On it.

@mavasani
Copy link
Contributor

@nguerrera I have filed #11383 and assigned it to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve JSON string escaping for error log
7 participants