[v3] Improve comment handling #684

felixfontein · 2020-12-30T12:45:24Z

Fixes two problems with comment handling in v3 branch found while working on getsops/sops#791:

When unmarshaling and re-marshaling
```
a:
    b:
        # comment followed by newline

        c: d
```
the comment jumps under c: d. (The new line is essential, without it it won't happen.)
When unmarshaling and re-marshaling
```
# foo
```
This parses to a zero node (the comment is lost).
When unmarshaling
```
# foo
---
key: value
```
we end up with only one document containing both the comment and the hash.

The fix for 2 changes the parsing of "empty" documents which contain comments. Instead of returning nil, they now return a document node with no content.

I've added some unmarshal + re-marshal tests in yaml_test.go.

felixfontein · 2020-12-30T13:36:03Z

I had to revise my fix for 2, it now doesn't break anything else (that I know of). I've updated the description accordingly.

felixfontein · 2020-12-30T14:24:06Z

I currently found some more problems and am looking at them.

felixfontein · 2020-12-30T15:33:26Z

I think this is now ready.

felixfontein · 2020-12-30T15:51:18Z

(I did sign the contributor agreement at https://ubuntu.com/legal/contributors and put @niemeyer as a reference.)

niemeyer

Thanks for these proposed changes. Here is an initial review. Please let me know how you feel about these points.

niemeyer · 2021-01-05T11:15:08Z

parserc.go

@@ -890,6 +893,14 @@ func yaml_parser_parse_block_mapping_value(parser *yaml_parser_t, event *yaml_ev
 	if token.typ == yaml_VALUE_TOKEN {
 		mark := token.end_mark
 		skip_token(parser)
+		// Move foot comment to head comment


I can see that's being done below, but I cannot see why this is being done. This comment is a nice place to describe that, as this is not an obvious operation.

Improved in 193064d.

niemeyer · 2021-01-05T11:19:30Z

decode.go

@@ -156,6 +156,11 @@ func (p *parser) parse() *Node {
 		return p.document()
 	case yaml_STREAM_END_EVENT:
 		// Happens when attempting to decode an empty buffer.
+		if toplevel && len(p.event.head_comment) > 0 {


Either the pre-existing comment above is incorrect, or we don't need the new variable. Also, it sounds like we should have enough state available in the parser to tell whether this is a top-level value or not. Can you provide some more details about what's going on here?

I don't know / understand the parser well enough to be sure whether the comment is correct or not. That's why I added the parameter toplevel, to avoid accidentally introducing a bug here. If you prefer I can remove it.

That's okay, but we cannot have additional logic in here unless we understand that well, otherwise we may be surprised in other situations, or may have misleading logic that makes it sound like this code executes at times it doesn't. Right now this is labeled as a "stream end event". Either that's true and the new logic is wrong, or it's incorrect and the old comment is wrong.

niemeyer · 2021-01-05T11:24:09Z

decode.go

+		n.Column = 1
+		p.event.head_comment = nil
+		return n
+	}
 	n := p.node(DocumentNode, "", "", "")
 	p.doc = n


Line 200 above is inside the if, and the following line should be there as well so that the behavior for the call-site remains the same. Positioning the if here would fix both.

Fixed in 193064d.

niemeyer · 2021-01-05T11:25:52Z

decode.go

+	if !p.event.implicit && len(p.event.head_comment) > 0 {
+		// This comment belongs to **before** the document event
+		n := p.node(DocumentNode, "", "", "")
+		// (1, 1) is not fully correct, but close enough


Let's please just drop the comment here. Reading this code is clear that the lines are made up, and that's fine because the line is for the DocumentNode, not for the comment. Since the document is empty, there's nothing clearly wrong with this.

Fixed in 193064d.

niemeyer · 2021-01-05T11:30:13Z

emitterc.go

@@ -450,12 +451,16 @@ func yaml_emitter_emit_document_start(emitter *yaml_emitter_t, event *yaml_event
 			if !yaml_emitter_process_head_comment(emitter) {
 				return false
 			}
-			if !put_break(emitter) {
+			// Do not add newline if the document is empty


This is just re-stating what we can read nicely in the line below.

Removed in 193064d.

niemeyer · 2021-01-05T11:32:31Z

yaml_test.go

@@ -0,0 +1,171 @@
+package yaml_test


We have a very nice set of tests for comments in the node_test.go file. These need to be integrated there as well, and the ones unrelated to comments can go into the respective test files.

I added most of the tests from this file (all single-document tests) to node_test.go as well.

(02f3190)

Thanks, but we need to have all tests either in node_test.go or in the respective test files, using existing conventions. We have plenty of tests and test files already.

Also, let's please duplicate the exact same test in two different places. Those that are taken care by node_test.go don't need to exist again elsewhere.

niemeyer · 2021-01-05T11:33:23Z

yaml_test.go

+
+
+func (s *S) TestCommentEmptyDoc(c *C) {
+	testIdempotent(c, `# foo


That sort of test looks better as "#foo\n", etc.

Fixed in 193064d.

niemeyer

Thanks for the tuning. I'm providing some additional comments on the PR below, but I also slept thinking of the underlying issue, and this feels like a partial solution to a more general problem.

Please hold on a moment... I'll provide a follow up comment after this review, and then will provide a candidate fix for the problem that maybe addresses the issue more generally.

niemeyer · 2021-01-06T09:33:39Z

decode.go

@@ -156,6 +156,11 @@ func (p *parser) parse() *Node {
 		return p.document()
 	case yaml_STREAM_END_EVENT:
 		// Happens when attempting to decode an empty buffer.
+		if toplevel && len(p.event.head_comment) > 0 {


That's okay, but we cannot have additional logic in here unless we understand that well, otherwise we may be surprised in other situations, or may have misleading logic that makes it sound like this code executes at times it doesn't. Right now this is labeled as a "stream end event". Either that's true and the new logic is wrong, or it's incorrect and the old comment is wrong.

niemeyer · 2021-01-06T09:43:11Z

emitterc.go

@@ -359,6 +359,7 @@ func yaml_emitter_emit_stream_start(emitter *yaml_emitter_t, event *yaml_event_t
 func yaml_emitter_emit_document_start(emitter *yaml_emitter_t, event *yaml_event_t, first bool) bool {

 	if event.typ == yaml_DOCUMENT_START_EVENT {
+		isEmpty := emitter.events[emitter.events_head + 1].typ == yaml_DOCUMENT_END_EVENT


How do we know here that the +1 is safe?

niemeyer · 2021-01-06T09:46:23Z

yaml_test.go

@@ -0,0 +1,171 @@
+package yaml_test


Thanks, but we need to have all tests either in node_test.go or in the respective test files, using existing conventions. We have plenty of tests and test files already.

Also, let's please duplicate the exact same test in two different places. Those that are taken care by node_test.go don't need to exist again elsewhere.

niemeyer · 2021-01-06T16:21:08Z

parserc.go

+		//
+		//         c: d
+		//
+		// (the newline between the comment and the next line is essential!)


This is stated more succinctly in the body of the yaml right above, so no need to repeat here.

Otherwise, thanks for the comment. It provides clear rationale for the non-obvious logic.

niemeyer · 2021-01-06T16:39:36Z

Okay, so this PR is doing two independent things, so the first thing I'd ask is let's please split those independent things apart, so that we can move both of them forward at their own pace.

Issue one

So, first for the more tricky part of this PR: the current logic considers the first line after an arbitrary node to be a foot comment, as long as there's nothing further below it with no spacing that would justify it being its header. That logic fails when indentation visually associates the comment with the following data instead.

Your approach to fix this issue is a viable one: you're transforming what was a footer into a header. While this works, it special cases one particular instance in which we know this can happen, but there might be others that we need to look into. A more general approach is to fix the cause of the issue by not considering such cases as a foot comment in the first place.

That's what I've done in 3778e11. Please let me know how you feel about it, and whether it properly addresses cases you have at hand.

Issue two

The second issue regards the handling of empty documents. That looks like you have a mostly clean implementation, except for the handling of the stream end event that is a bit uncertain per the comments in the reviews above.

Would you mind to split that apart, and also address the issues raised in terms of testing in the reviews?

Please let me know how you feel about these comments.

When a comment is indented, it may be visually associated as a header of the following line instead of being a footer of the former, even if there is a line break. For example: key: # Comment - Underlying value

niemeyer · 2021-01-06T17:32:56Z

I tuned it further to fix @mikefarah's #686 as well. This now works a bit closer to your PR since it looks at the token type, so I'm hoping that no use cases are left behind. I'm going to close this PR on that basis, but please me know if that's incorrect.

Also, we still need the empty document part of this PR. Let's please move this to a new PR so we can track that by itself.

* Add another test (that currently fails). * First shot at using yaml.v3 for reading YAML files with comments. * Allow parsing multi-document YAML files. * Use Decoder to parse multi-part documents. * Use yaml.v3 for config and audit. * First step of serializing YAML using yaml.v3. * Always serialize with yaml.v3. * Remove debug prints. * Remove traces of github.com/mozilla-services/yaml. * Improve serialization of documents consisting only of comments. * Improve handling of some empty documents. * Adjust to latest changes in go-yaml/yaml#684. * Bump yaml.v3 version, temporarily disable failing tests. * Run go mod tidy. * Fix CI.

felixfontein added 4 commits December 29, 2020 21:18

Add idempotency tests.

52a3704

Add tests that fail.

d43cefb

Dump tree to stdout to ease debugging.

05a4f38

Fix moving comment.

6abbb6e

felixfontein mentioned this pull request Dec 30, 2020

Use yaml.v3 instead of modified yaml.v2 for handling YAML files getsops/sops#791

Merged

felixfontein force-pushed the improve-comment-handling branch from 329f998 to f2a3310 Compare December 30, 2020 13:33

Improve empty document handling.

d2b2a0b

felixfontein force-pushed the improve-comment-handling branch from f2a3310 to d2b2a0b Compare December 30, 2020 13:58

felixfontein changed the title ~~[v3] Improve comment handling~~ [WIP] [v3] Improve comment handling Dec 30, 2020

felixfontein force-pushed the improve-comment-handling branch from 15dce78 to 5612908 Compare December 30, 2020 15:04

Prevent comment document crawling.

7b36ba5

felixfontein force-pushed the improve-comment-handling branch from 5612908 to 7b36ba5 Compare December 30, 2020 15:04

Document with only comments should not have newline following.

46fa0f1

felixfontein added a commit to felixfontein/sops that referenced this pull request Dec 30, 2020

Adjust to latest changes in go-yaml/yaml#684.

c1a24a9

felixfontein mentioned this pull request Dec 30, 2020

Do not discard comments at beginning of YAML documents getsops/sops#757

Merged

felixfontein changed the title ~~[WIP] [v3] Improve comment handling~~ [v3] Improve comment handling Dec 30, 2020

niemeyer reviewed Jan 5, 2021

View reviewed changes

felixfontein added 2 commits January 5, 2021 18:03

Apply some review suggestions.

193064d

Add single-document tests from yaml_test to node_test.

02f3190

niemeyer requested changes Jan 6, 2021

View reviewed changes

niemeyer closed this Jan 6, 2021

felixfontein added a commit to felixfontein/sops that referenced this pull request Jan 6, 2021

Adjust to latest changes in go-yaml/yaml#684.

d02620f

felixfontein mentioned this pull request Jan 7, 2021

[v3] Improve empty document handling #690

Open

felixfontein deleted the improve-comment-handling branch January 7, 2021 20:04

felixfontein added a commit to felixfontein/sops that referenced this pull request Feb 4, 2021

Adjust to latest changes in go-yaml/yaml#684.

e4f8b87

felixfontein added a commit to felixfontein/sops that referenced this pull request Feb 21, 2021

Adjust to latest changes in go-yaml/yaml#684.

a4f9d11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v3] Improve comment handling #684

[v3] Improve comment handling #684

felixfontein commented Dec 30, 2020 •

edited

Loading

felixfontein commented Dec 30, 2020

felixfontein commented Dec 30, 2020

felixfontein commented Dec 30, 2020

felixfontein commented Dec 30, 2020

niemeyer left a comment

niemeyer Jan 5, 2021

felixfontein Jan 5, 2021

niemeyer Jan 5, 2021

felixfontein Jan 5, 2021

niemeyer Jan 6, 2021

niemeyer Jan 5, 2021

felixfontein Jan 5, 2021

niemeyer Jan 5, 2021

felixfontein Jan 5, 2021

niemeyer Jan 5, 2021

felixfontein Jan 5, 2021

niemeyer Jan 5, 2021

felixfontein Jan 5, 2021

felixfontein Jan 5, 2021

niemeyer Jan 6, 2021

niemeyer Jan 5, 2021

felixfontein Jan 5, 2021

niemeyer left a comment

niemeyer Jan 6, 2021

niemeyer Jan 6, 2021

niemeyer Jan 6, 2021

niemeyer Jan 6, 2021

niemeyer commented Jan 6, 2021

niemeyer commented Jan 6, 2021



		func (s S) TestCommentEmptyDoc(c C) {
		testIdempotent(c, `# foo

[v3] Improve comment handling #684

[v3] Improve comment handling #684

Conversation

felixfontein commented Dec 30, 2020 • edited Loading

felixfontein commented Dec 30, 2020

felixfontein commented Dec 30, 2020

felixfontein commented Dec 30, 2020

felixfontein commented Dec 30, 2020

niemeyer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

niemeyer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

niemeyer commented Jan 6, 2021

Issue one

Issue two

niemeyer commented Jan 6, 2021

felixfontein commented Dec 30, 2020 •

edited

Loading