Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binlogs Redacting support + Binlogs forward-compatibility reading support #9307

Merged
merged 52 commits into from
Jan 16, 2024

Conversation

JanKrivanek
Copy link
Member

@JanKrivanek JanKrivanek commented Oct 5, 2023

Fixes #9147, #9261 and contributes to #8400

Viewer complement PR: KirillOsenkov/MSBuildStructuredLog#732

Important

This contains all changes from

Context and changes are described in those individual PRs
Feel free to review everything together (this PR), or separately (above 2 PRs) - but not all 3 as you'd be viewing same changes twice

TBD:

  • Tests
    • Simulate the fwd compat scenarios by injecting/removing/altering bytes in memory stream underlying under binlog writer and reader.
    • Test the stream utility classes.
  • localize all strings Done
  • I'm contemplating adding support for recognizing over-reading and mismatched reading during events deserialization Done
  • [TBD for next PR] Usage documentation (when, why and how to handle errors during forward compatible reading)

@JanKrivanek JanKrivanek changed the title Binlogs Redacting support + Binlogs forward reading support Binlogs Redacting support + Binlogs forward-compatibility reading support Oct 5, 2023
Copy link
Contributor

@rokonec rokonec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request change is only for missing that "backward-compatible-with" field.
Otherwise it looks OK. But I have to admit that I have not time to carefully review full extent of this PR.

src/Build.UnitTests/BinaryLogger_Tests.cs Outdated Show resolved Hide resolved
src/Build.UnitTests/BinaryLogger_Tests.cs Show resolved Hide resolved
src/Build/Logging/BinaryLogger/BinaryLogRecordKind.cs Outdated Show resolved Hide resolved
src/Build/Logging/BinaryLogger/BuildEventArgsReader.cs Outdated Show resolved Hide resolved
src/Build/Logging/BinaryLogger/BuildEventArgsReader.cs Outdated Show resolved Hide resolved
@danmoseley
Copy link
Member

Is there a design doc or some overview of what the redactor does - it sounds interesting?
Eg., does it attempt to discover security tokens and suchlike, as Github does? It would only be defense in depth as it can never be perfect.

@JanKrivanek
Copy link
Member Author

Is there a design doc or some overview of what the redactor does - it sounds interesting? Eg., does it attempt to discover security tokens and suchlike, as Github does? It would only be defense in depth as it can never be perfect.

Hi @danmoseley,

Design proposal of redactor is located here: https://github.com/JanKrivanek/MSBuildBinlogRedactor/blob/main/docs/DesignProposal.md (ultimate long term wishful plan is here https://github.com/dotnet/msbuild/blob/main/documentation/specs/proposed/security-metadata.md)
Autodetections are in plan - so far we have just some PoC samples - couple token types detector (https://github.com/JanKrivanek/MSBuildBinlogRedactor/blob/main/src/Microsoft.Build.SensitiveDataDetector/PatternsDetector.cs) and username detector (https://github.com/JanKrivanek/MSBuildBinlogRedactor/blob/main/src/Microsoft.Build.SensitiveDataDetector/UsernameDetector.cs). But @michaelcfanning wants to contribute some more comprehensive logic they have been using in 1ES. Plus I'm trying to get CredScan team to OSS their client side library for textual data classification

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left comments inline, mostly nits.

src/Build/CompatibilitySuppressions.xml Show resolved Hide resolved
src/Build/Logging/BinaryLogger/ProjectImportsCollector.cs Outdated Show resolved Hide resolved
src/Build/Logging/BinaryLogger/ProjectImportsCollector.cs Outdated Show resolved Hide resolved
src/Build.UnitTests/BinaryLogger_Tests.cs Outdated Show resolved Hide resolved
src/Build.UnitTests/BinaryLogger_Tests.cs Outdated Show resolved Hide resolved
src/Build/Logging/BinaryLogger/BuildEventArgsReader.cs Outdated Show resolved Hide resolved
src/Build/Logging/BinaryLogger/BuildEventArgsReader.cs Outdated Show resolved Hide resolved
src/Build/Logging/BinaryLogger/BuildEventArgsReader.cs Outdated Show resolved Hide resolved
Copy link
Contributor

@rokonec rokonec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

documentation/wiki/Binary-Log.md Outdated Show resolved Hide resolved
src/Build/Resources/Strings.resx Outdated Show resolved Hide resolved
documentation/wiki/Binary-Log.md Outdated Show resolved Hide resolved
documentation/wiki/Binary-Log.md Outdated Show resolved Hide resolved
documentation/wiki/Binary-Log.md Outdated Show resolved Hide resolved
@KirillOsenkov
Copy link
Member

Sorry I'm late to this review, overall I think things look good. It's high praise from me for a change of this magnitude ;)

A few minor and subjective impressions.

Most importantly, both the previous designs and this current one seem heavy with fine-grained interfaces which I don't think are necessary. Your primary hint that an interface is not necessary is that it only has a single implementation. Another one is that no one is actually using the interface. I haven't dug deep, but at first glance IRawLogEventsSource seems not justified and could be removed. IBuildEventArgsReaderNotifications seems fine. IEmbeddedContentSource seems unnecessary.

I personally don't see a ton of value in localizing these types of exceptions, but since the work is already done, might as well leave it.

@KirillOsenkov
Copy link
Member

One other thought I had that's not necessarily for this PR, but somewhat related. Right now all binlog files start with 1F 8B 08 which is the GZip stream signature bytes (08 indicating Deflate compression). If we change the compression method for the outer stream at some point in the future, the envelope format will likely have a different signature. Long term I'd love to investigate Zstd or Lzma compression, I remember I saw more than 2x improvement in binlog sizes when I uncompressed Gzip/Deflate and recompressed with 7-Zip.

Wondering if we could prepare for that future today by gracefully showing a message if the binlog file doesn't start with 1F 8B 08.

Currently we handle it relatively well:
There was an exception while reading the log file: Found invalid data while decoding. Wondering if there's anything else we can do here, or probably not?

@JanKrivanek
Copy link
Member Author

@KirillOsenkov - thank you for review! (here and in StructuredLogger as well - both requiring significant time from your side)

  • Yes - I have a special feeling for fine grained interfaces. You are the second one to point that out on this PR (in different circumstance, but still the same idea) - so I know where is the problem :-) So I consolidated into the single one.

  • GZip vs other compressions check - good idea! Though this PR feels loaded already. I've spun that off as Binlog reading - grace checking of GZip signature #9567

  • Fwd-compat by default or not - I'm curious of what you'd think about having it opt-in initially, then switch to opt-out after at least on full preview.

@KirillOsenkov
Copy link
Member

Very nice, I made a local change on top of this PR to change the file format version to 19 and added a junk field to the end of ProjectStartedEventArgs, then opened the resulting binlog in the viewer and it was still able to open the binlog and show the error message:

image

@KirillOsenkov
Copy link
Member

I published the viewer update here:
https://github.com/KirillOsenkov/MSBuildStructuredLog/releases/tag/v2.2.155

The NuGet package here:
https://www.nuget.org/packages/MSBuild.StructuredLogger/2.2.155

The binlogtool here:
https://www.nuget.org/packages/binlogtool/1.0.10

Let's proactively test these and maybe send some PRs to Arcade, Roslyn, Maui and a few other repos to get some testing. Then when we do bump the format to 18 and 18 binlogs start to come out we won't break too many people. Both nuget.org and github now can show projects/packages that depend on MSBuild.StructuredLogger NuGet.

@KirillOsenkov
Copy link
Member

binlogtool seems to be working too:

image

@KirillOsenkov
Copy link
Member

Interesting, I forgot to update the reader to read the junk byte I added, and I got a decently nice error message:
image

@KirillOsenkov
Copy link
Member

Is this the behavior we want/expect when we don't opt in MSBuild to be forward compatible by default?

@JanKrivanek
Copy link
Member Author

Is this the behavior we want/expect when we don't opt in MSBuild to be forward compatible by default?

I suppose the junk byte was added without increasing the version - correct? Then it's expected - we still crash, but more gracefully.

@KirillOsenkov
Copy link
Member

No, I increased the version

@JanKrivanek
Copy link
Member Author

No, I increased the version

Thanks @KirillOsenkov - good catch! Not expected indeed! - all the forward compatible flags (allow reading higher version, allow skipping unknown events, allow skiping unknown data in known events) should have same default setting - there was one (allow reading higher version) which was permitted by default - fixed now

@JanKrivanek
Copy link
Member Author

Let's not merge until the linked PRs are merged (to minimize breaking internal teams)

@JanKrivanek JanKrivanek merged commit d53e436 into dotnet:main Jan 16, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binlog redacting - properly process embedded files
7 participants