-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v3] Improve comment handling #684
Conversation
329f998
to
f2a3310
Compare
I had to revise my fix for 2, it now doesn't break anything else (that I know of). I've updated the description accordingly. |
f2a3310
to
d2b2a0b
Compare
I currently found some more problems and am looking at them. |
15dce78
to
5612908
Compare
5612908
to
7b36ba5
Compare
I think this is now ready. |
(I did sign the contributor agreement at https://ubuntu.com/legal/contributors and put @niemeyer as a reference.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these proposed changes. Here is an initial review. Please let me know how you feel about these points.
parserc.go
Outdated
@@ -890,6 +893,14 @@ func yaml_parser_parse_block_mapping_value(parser *yaml_parser_t, event *yaml_ev | |||
if token.typ == yaml_VALUE_TOKEN { | |||
mark := token.end_mark | |||
skip_token(parser) | |||
// Move foot comment to head comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that's being done below, but I cannot see why this is being done. This comment is a nice place to describe that, as this is not an obvious operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved in 193064d.
@@ -156,6 +156,11 @@ func (p *parser) parse() *Node { | |||
return p.document() | |||
case yaml_STREAM_END_EVENT: | |||
// Happens when attempting to decode an empty buffer. | |||
if toplevel && len(p.event.head_comment) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either the pre-existing comment above is incorrect, or we don't need the new variable. Also, it sounds like we should have enough state available in the parser to tell whether this is a top-level value or not. Can you provide some more details about what's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know / understand the parser well enough to be sure whether the comment is correct or not. That's why I added the parameter toplevel
, to avoid accidentally introducing a bug here. If you prefer I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's okay, but we cannot have additional logic in here unless we understand that well, otherwise we may be surprised in other situations, or may have misleading logic that makes it sound like this code executes at times it doesn't. Right now this is labeled as a "stream end event". Either that's true and the new logic is wrong, or it's incorrect and the old comment is wrong.
decode.go
Outdated
n.Column = 1 | ||
p.event.head_comment = nil | ||
return n | ||
} | ||
n := p.node(DocumentNode, "", "", "") | ||
p.doc = n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 200 above is inside the if, and the following line should be there as well so that the behavior for the call-site remains the same. Positioning the if here would fix both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 193064d.
decode.go
Outdated
if !p.event.implicit && len(p.event.head_comment) > 0 { | ||
// This comment belongs to **before** the document event | ||
n := p.node(DocumentNode, "", "", "") | ||
// (1, 1) is not fully correct, but close enough |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's please just drop the comment here. Reading this code is clear that the lines are made up, and that's fine because the line is for the DocumentNode, not for the comment. Since the document is empty, there's nothing clearly wrong with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 193064d.
emitterc.go
Outdated
@@ -450,12 +451,16 @@ func yaml_emitter_emit_document_start(emitter *yaml_emitter_t, event *yaml_event | |||
if !yaml_emitter_process_head_comment(emitter) { | |||
return false | |||
} | |||
if !put_break(emitter) { | |||
// Do not add newline if the document is empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just re-stating what we can read nicely in the line below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 193064d.
@@ -0,0 +1,171 @@ | |||
package yaml_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a very nice set of tests for comments in the node_test.go file. These need to be integrated there as well, and the ones unrelated to comments can go into the respective test files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added most of the tests from this file (all single-document tests) to node_test.go as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(02f3190)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but we need to have all tests either in node_test.go or in the respective test files, using existing conventions. We have plenty of tests and test files already.
Also, let's please duplicate the exact same test in two different places. Those that are taken care by node_test.go don't need to exist again elsewhere.
yaml_test.go
Outdated
|
||
|
||
func (s *S) TestCommentEmptyDoc(c *C) { | ||
testIdempotent(c, `# foo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sort of test looks better as "#foo\n", etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 193064d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tuning. I'm providing some additional comments on the PR below, but I also slept thinking of the underlying issue, and this feels like a partial solution to a more general problem.
Please hold on a moment... I'll provide a follow up comment after this review, and then will provide a candidate fix for the problem that maybe addresses the issue more generally.
@@ -156,6 +156,11 @@ func (p *parser) parse() *Node { | |||
return p.document() | |||
case yaml_STREAM_END_EVENT: | |||
// Happens when attempting to decode an empty buffer. | |||
if toplevel && len(p.event.head_comment) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's okay, but we cannot have additional logic in here unless we understand that well, otherwise we may be surprised in other situations, or may have misleading logic that makes it sound like this code executes at times it doesn't. Right now this is labeled as a "stream end event". Either that's true and the new logic is wrong, or it's incorrect and the old comment is wrong.
@@ -359,6 +359,7 @@ func yaml_emitter_emit_stream_start(emitter *yaml_emitter_t, event *yaml_event_t | |||
func yaml_emitter_emit_document_start(emitter *yaml_emitter_t, event *yaml_event_t, first bool) bool { | |||
|
|||
if event.typ == yaml_DOCUMENT_START_EVENT { | |||
isEmpty := emitter.events[emitter.events_head + 1].typ == yaml_DOCUMENT_END_EVENT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know here that the +1 is safe?
@@ -0,0 +1,171 @@ | |||
package yaml_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but we need to have all tests either in node_test.go or in the respective test files, using existing conventions. We have plenty of tests and test files already.
Also, let's please duplicate the exact same test in two different places. Those that are taken care by node_test.go don't need to exist again elsewhere.
// | ||
// c: d | ||
// | ||
// (the newline between the comment and the next line is essential!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is stated more succinctly in the body of the yaml right above, so no need to repeat here.
Otherwise, thanks for the comment. It provides clear rationale for the non-obvious logic.
Okay, so this PR is doing two independent things, so the first thing I'd ask is let's please split those independent things apart, so that we can move both of them forward at their own pace. Issue oneSo, first for the more tricky part of this PR: the current logic considers the first line after an arbitrary node to be a foot comment, as long as there's nothing further below it with no spacing that would justify it being its header. That logic fails when indentation visually associates the comment with the following data instead. Your approach to fix this issue is a viable one: you're transforming what was a footer into a header. While this works, it special cases one particular instance in which we know this can happen, but there might be others that we need to look into. A more general approach is to fix the cause of the issue by not considering such cases as a foot comment in the first place. That's what I've done in 3778e11. Please let me know how you feel about it, and whether it properly addresses cases you have at hand. Issue twoThe second issue regards the handling of empty documents. That looks like you have a mostly clean implementation, except for the handling of the stream end event that is a bit uncertain per the comments in the reviews above. Would you mind to split that apart, and also address the issues raised in terms of testing in the reviews? Please let me know how you feel about these comments. |
When a comment is indented, it may be visually associated as a header of the following line instead of being a footer of the former, even if there is a line break. For example: key: # Comment - Underlying value
I tuned it further to fix @mikefarah's #686 as well. This now works a bit closer to your PR since it looks at the token type, so I'm hoping that no use cases are left behind. I'm going to close this PR on that basis, but please me know if that's incorrect. Also, we still need the empty document part of this PR. Let's please move this to a new PR so we can track that by itself. |
* Add another test (that currently fails). * First shot at using yaml.v3 for reading YAML files with comments. * Allow parsing multi-document YAML files. * Use Decoder to parse multi-part documents. * Use yaml.v3 for config and audit. * First step of serializing YAML using yaml.v3. * Always serialize with yaml.v3. * Remove debug prints. * Remove traces of github.com/mozilla-services/yaml. * Improve serialization of documents consisting only of comments. * Improve handling of some empty documents. * Adjust to latest changes in go-yaml/yaml#684. * Bump yaml.v3 version, temporarily disable failing tests. * Run go mod tidy. * Fix CI.
Fixes two problems with comment handling in
v3
branch found while working on getsops/sops#791:When unmarshaling and re-marshaling
the comment jumps under
c: d
. (The new line is essential, without it it won't happen.)When unmarshaling and re-marshaling
# foo
This parses to a zero node (the comment is lost).
When unmarshaling
we end up with only one document containing both the comment and the hash.
The fix for 2 changes the parsing of "empty" documents which contain comments. Instead of returning
nil
, they now return a document node with no content.I've added some unmarshal + re-marshal tests in
yaml_test.go
.