Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v3] Improve comment handling #684

Closed

Conversation

felixfontein
Copy link

@felixfontein felixfontein commented Dec 30, 2020

Fixes two problems with comment handling in v3 branch found while working on getsops/sops#791:

  1. When unmarshaling and re-marshaling

    a:
        b:
            # comment followed by newline
    
            c: d

    the comment jumps under c: d. (The new line is essential, without it it won't happen.)

  2. When unmarshaling and re-marshaling

    # foo

    This parses to a zero node (the comment is lost).

  3. When unmarshaling

    # foo
    ---
    key: value

    we end up with only one document containing both the comment and the hash.

The fix for 2 changes the parsing of "empty" documents which contain comments. Instead of returning nil, they now return a document node with no content.

I've added some unmarshal + re-marshal tests in yaml_test.go.

@felixfontein
Copy link
Author

I had to revise my fix for 2, it now doesn't break anything else (that I know of). I've updated the description accordingly.

@felixfontein felixfontein force-pushed the improve-comment-handling branch from f2a3310 to d2b2a0b Compare December 30, 2020 13:58
@felixfontein felixfontein changed the title [v3] Improve comment handling [WIP] [v3] Improve comment handling Dec 30, 2020
@felixfontein
Copy link
Author

I currently found some more problems and am looking at them.

@felixfontein felixfontein force-pushed the improve-comment-handling branch from 15dce78 to 5612908 Compare December 30, 2020 15:04
@felixfontein felixfontein force-pushed the improve-comment-handling branch from 5612908 to 7b36ba5 Compare December 30, 2020 15:04
felixfontein added a commit to felixfontein/sops that referenced this pull request Dec 30, 2020
@felixfontein felixfontein changed the title [WIP] [v3] Improve comment handling [v3] Improve comment handling Dec 30, 2020
@felixfontein
Copy link
Author

I think this is now ready.

@felixfontein
Copy link
Author

(I did sign the contributor agreement at https://ubuntu.com/legal/contributors and put @niemeyer as a reference.)

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these proposed changes. Here is an initial review. Please let me know how you feel about these points.

parserc.go Outdated
@@ -890,6 +893,14 @@ func yaml_parser_parse_block_mapping_value(parser *yaml_parser_t, event *yaml_ev
if token.typ == yaml_VALUE_TOKEN {
mark := token.end_mark
skip_token(parser)
// Move foot comment to head comment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that's being done below, but I cannot see why this is being done. This comment is a nice place to describe that, as this is not an obvious operation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved in 193064d.

@@ -156,6 +156,11 @@ func (p *parser) parse() *Node {
return p.document()
case yaml_STREAM_END_EVENT:
// Happens when attempting to decode an empty buffer.
if toplevel && len(p.event.head_comment) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either the pre-existing comment above is incorrect, or we don't need the new variable. Also, it sounds like we should have enough state available in the parser to tell whether this is a top-level value or not. Can you provide some more details about what's going on here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know / understand the parser well enough to be sure whether the comment is correct or not. That's why I added the parameter toplevel, to avoid accidentally introducing a bug here. If you prefer I can remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's okay, but we cannot have additional logic in here unless we understand that well, otherwise we may be surprised in other situations, or may have misleading logic that makes it sound like this code executes at times it doesn't. Right now this is labeled as a "stream end event". Either that's true and the new logic is wrong, or it's incorrect and the old comment is wrong.

decode.go Outdated
n.Column = 1
p.event.head_comment = nil
return n
}
n := p.node(DocumentNode, "", "", "")
p.doc = n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 200 above is inside the if, and the following line should be there as well so that the behavior for the call-site remains the same. Positioning the if here would fix both.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 193064d.

decode.go Outdated
if !p.event.implicit && len(p.event.head_comment) > 0 {
// This comment belongs to **before** the document event
n := p.node(DocumentNode, "", "", "")
// (1, 1) is not fully correct, but close enough
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's please just drop the comment here. Reading this code is clear that the lines are made up, and that's fine because the line is for the DocumentNode, not for the comment. Since the document is empty, there's nothing clearly wrong with this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 193064d.

emitterc.go Outdated
@@ -450,12 +451,16 @@ func yaml_emitter_emit_document_start(emitter *yaml_emitter_t, event *yaml_event
if !yaml_emitter_process_head_comment(emitter) {
return false
}
if !put_break(emitter) {
// Do not add newline if the document is empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just re-stating what we can read nicely in the line below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 193064d.

@@ -0,0 +1,171 @@
package yaml_test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a very nice set of tests for comments in the node_test.go file. These need to be integrated there as well, and the ones unrelated to comments can go into the respective test files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added most of the tests from this file (all single-document tests) to node_test.go as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but we need to have all tests either in node_test.go or in the respective test files, using existing conventions. We have plenty of tests and test files already.

Also, let's please duplicate the exact same test in two different places. Those that are taken care by node_test.go don't need to exist again elsewhere.

yaml_test.go Outdated


func (s *S) TestCommentEmptyDoc(c *C) {
testIdempotent(c, `# foo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sort of test looks better as "#foo\n", etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 193064d.

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tuning. I'm providing some additional comments on the PR below, but I also slept thinking of the underlying issue, and this feels like a partial solution to a more general problem.

Please hold on a moment... I'll provide a follow up comment after this review, and then will provide a candidate fix for the problem that maybe addresses the issue more generally.

@@ -156,6 +156,11 @@ func (p *parser) parse() *Node {
return p.document()
case yaml_STREAM_END_EVENT:
// Happens when attempting to decode an empty buffer.
if toplevel && len(p.event.head_comment) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's okay, but we cannot have additional logic in here unless we understand that well, otherwise we may be surprised in other situations, or may have misleading logic that makes it sound like this code executes at times it doesn't. Right now this is labeled as a "stream end event". Either that's true and the new logic is wrong, or it's incorrect and the old comment is wrong.

@@ -359,6 +359,7 @@ func yaml_emitter_emit_stream_start(emitter *yaml_emitter_t, event *yaml_event_t
func yaml_emitter_emit_document_start(emitter *yaml_emitter_t, event *yaml_event_t, first bool) bool {

if event.typ == yaml_DOCUMENT_START_EVENT {
isEmpty := emitter.events[emitter.events_head + 1].typ == yaml_DOCUMENT_END_EVENT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know here that the +1 is safe?

@@ -0,0 +1,171 @@
package yaml_test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but we need to have all tests either in node_test.go or in the respective test files, using existing conventions. We have plenty of tests and test files already.

Also, let's please duplicate the exact same test in two different places. Those that are taken care by node_test.go don't need to exist again elsewhere.

//
// c: d
//
// (the newline between the comment and the next line is essential!)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is stated more succinctly in the body of the yaml right above, so no need to repeat here.

Otherwise, thanks for the comment. It provides clear rationale for the non-obvious logic.

@niemeyer
Copy link
Contributor

niemeyer commented Jan 6, 2021

Okay, so this PR is doing two independent things, so the first thing I'd ask is let's please split those independent things apart, so that we can move both of them forward at their own pace.

Issue one

So, first for the more tricky part of this PR: the current logic considers the first line after an arbitrary node to be a foot comment, as long as there's nothing further below it with no spacing that would justify it being its header. That logic fails when indentation visually associates the comment with the following data instead.

Your approach to fix this issue is a viable one: you're transforming what was a footer into a header. While this works, it special cases one particular instance in which we know this can happen, but there might be others that we need to look into. A more general approach is to fix the cause of the issue by not considering such cases as a foot comment in the first place.

That's what I've done in 3778e11. Please let me know how you feel about it, and whether it properly addresses cases you have at hand.

Issue two

The second issue regards the handling of empty documents. That looks like you have a mostly clean implementation, except for the handling of the stream end event that is a bit uncertain per the comments in the reviews above.

Would you mind to split that apart, and also address the issues raised in terms of testing in the reviews?

Please let me know how you feel about these comments.

niemeyer added a commit that referenced this pull request Jan 6, 2021
When a comment is indented, it may be visually associated as a header
of the following line instead of being a footer of the former, even
if there is a line break. For example:

key:
    # Comment

    - Underlying value
@niemeyer
Copy link
Contributor

niemeyer commented Jan 6, 2021

I tuned it further to fix @mikefarah's #686 as well. This now works a bit closer to your PR since it looks at the token type, so I'm hoping that no use cases are left behind. I'm going to close this PR on that basis, but please me know if that's incorrect.

Also, we still need the empty document part of this PR. Let's please move this to a new PR so we can track that by itself.

@niemeyer niemeyer closed this Jan 6, 2021
felixfontein added a commit to felixfontein/sops that referenced this pull request Jan 6, 2021
@felixfontein felixfontein deleted the improve-comment-handling branch January 7, 2021 20:04
felixfontein added a commit to felixfontein/sops that referenced this pull request Feb 4, 2021
felixfontein added a commit to felixfontein/sops that referenced this pull request Feb 21, 2021
autrilla pushed a commit to getsops/sops that referenced this pull request Feb 21, 2021
* Add another test (that currently fails).

* First shot at using yaml.v3 for reading YAML files with comments.

* Allow parsing multi-document YAML files.

* Use Decoder to parse multi-part documents.

* Use yaml.v3 for config and audit.

* First step of serializing YAML using yaml.v3.

* Always serialize with yaml.v3.

* Remove debug prints.

* Remove traces of github.com/mozilla-services/yaml.

* Improve serialization of documents consisting only of comments.

* Improve handling of some empty documents.

* Adjust to latest changes in go-yaml/yaml#684.

* Bump yaml.v3 version, temporarily disable failing tests.

* Run go mod tidy.

* Fix CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants