Fix handling of comments in import groups #11

nick-jones · 2020-05-23T23:17:04Z

My apologies, I only noticed #10 introduced a slight regression after it was merged. Hopefully the commit message explains this well enough:

I noticed #10 introduced a regression in that a comments sitting on
a separate line are now ignored - leading to the `importInfo` "value"
being set to an empty string. This trips things up further along as
empty string implies a separate group. I've tried to address this
by retaining "value" as the raw value (renamed to `lineValue` for
clarity), and adding a new field to `importInfo` which contains the
import path, if valid for the given line.

I noticed pavius#10 introduced a regression in that a comments sitting on a separate line are now ignored - leading to the `importInfo` "value" being set to an empty string. This trips things up further along as empty string implies a separate group. I've tried to address this by retaining "value" as the raw value (renamed to `lineValue` for clarity), and adding a new field to `importInfo` which contains the import path, if valid for the given line.

I mentioned in pavius#10 that there are still potential gotchas by extracting import info line-by-line. This attempts to address that along switching to an even more reliable way of reading import information. Given we're already parsing the the file for import line numbers, I explored the idea of leaning on this information even more. As it turns out this was viable - both comments and import path strings can be obtained from the parsing output. This PR switches to using information extacted solely via the parser, instead of the existing approach of line-by-line evaluation with some information supplemented by the parser. Switching to the parser opened up some different possibilities. Most notably it is now possible to deal with comments in a slightly better way; instead of trying to pay attention but also ignore these at the same time, the comment is handled in the same way the parser deals with them; it is associated with the ajoining import line. This also addresses one of the gotchas mentioned before: multi-line comments. Assuming the multi-line comment is immediately ajoining, this is now treated no different to single-line comments. Whilst visually these comments may make the imports appear to be grouped separately, semantically there is no difference. So IMO they should be treated the same. In switching to this one other potential gotcha came to mind - should comments be permitted that are standalone (i.e. unattached from any import). Historically this tool has been sort of allowing for that (certain bugs may have existed priot to switching to go/scanner). This change allows for it, and I've covered that in the test case targetting comment peculiarities. Beyond that, I addressed one gotcha that wasn't discussed before: it is entirely legal to have multiple import declarations. This is one point where this tool likely has to become opinionated - should mutliple be permitted? Well, IMO, no (with the exception of the CGO `import "C"` declaration). The existing implementation wouldn't take issue with this, and when parsing line by line it would end up evaluating lines outside of declarations (I think this is limited to comments, LPAREN and RPAREN - which likely wouldn't trip things up after the switch to go/scanner). The verifier now returns an error if there are multiple declarations (`import "C"` excluded). Finally, this addresses one final issue that pavius#11 didn't cover - an improperly formatted file could have unnecessary whitespace between groups. Trimming the value of whitespace could have addressed this, but we get a fix for free from this; the whitespace becomes irrelevant.

I mentioned in pavius#10 that there are still potential gotchas by extracting import info line-by-line. This attempts to address that along switching to an even more reliable way of reading import information. Given we're already parsing the the file for import line numbers, I explored the idea of leaning on this information even more. As it turns out this was viable - both comments and import path strings can be obtained from the parsing output. This PR switches to using information extracted solely via the parser, instead of the existing approach of line-by-line evaluation with some information supplemented by the parser. Switching to the parser opened up some different possibilities. Most notably it is now possible to deal with comments in a slightly better way; instead of trying to pay attention but also ignore these at the same time, the comment is handled in the same way the parser deals with them; it is associated with the ajoining import line. This also addresses one of the gotchas mentioned before: multi-line comments. Assuming the multi-line comment is immediately ajoining, this is now treated no different to single-line comments. Whilst visually these comments may make the imports appear to be grouped separately, semantically there is no difference. So IMO they should be treated the same. In switching to this one other potential gotcha came to mind - should comments be permitted that are standalone (i.e. unattached from any import). Historically this tool has been sort of allowing for that (certain bugs may have existed prior to switching to go/scanner). This change allows for it, and I've covered that in the test case targeting comment peculiarities. Beyond that, I addressed one gotcha that wasn't discussed before: it is entirely legal to have multiple import declarations. This is one point where this tool likely has to become opinionated - should multiple be permitted? Well, IMO, no (with the exception of the CGO `import "C"` declaration). The existing implementation wouldn't take issue with this, and when parsing line by line it would end up evaluating lines outside of declarations (I think this is limited to comments, LPAREN and RPAREN - which likely wouldn't trip things up after the switch to go/scanner). The verifier now returns an error if there are multiple declarations (`import "C"` excluded). Finally, this addresses one final issue that pavius#11 didn't cover - an improperly formatted file could have unnecessary whitespace between groups. Trimming the value of whitespace could have addressed this, but we get a fix for free from this; the whitespace becomes irrelevant.

I mentioned in pavius#10 that there are still potential gotchas by extracting import info line-by-line. This attempts to address that along switching to an even more reliable way of reading import information. Given we're already parsing the the file for import line numbers, I explored the idea of leaning on this information even more. As it turns out this was viable - both comments and import path strings can be obtained from the parsing output. This PR switches to using information extracted solely via the parser, instead of the existing approach of line-by-line evaluation with some information supplemented by the parser. Switching to the parser opened up some different possibilities. Most notably it is now possible to deal with comments in a slightly better way; instead of trying to pay attention but also ignore these at the same time, the comment is handled in the same way the parser deals with them; it is associated with the adjoining import line. This also addresses one of the gotchas mentioned before: multi-line comments. Assuming the multi-line comment is immediately adjoining, this is now treated no different to single-line comments. Whilst visually these comments may make the imports appear to be grouped separately, semantically there is no difference. So IMO they should be treated the same. In switching to this one other potential gotcha came to mind - should comments be permitted that are standalone (i.e. unattached from any import). Historically this tool has been sort of allowing for that (certain bugs may have existed prior to switching to go/scanner). This change allows for it, and I've covered that in the test case targeting comment peculiarities. Beyond that, I addressed one gotcha that wasn't discussed before: it is entirely legal to have multiple import declarations. This is one point where this tool likely has to become opinionated - should multiple be permitted? Well, IMO, no (with the exception of the CGO `import "C"` declaration). The existing implementation wouldn't take issue with this, and when parsing line by line it would end up evaluating lines outside of declarations (I think this is limited to comments, LPAREN and RPAREN - which likely wouldn't trip things up after the switch to go/scanner). The verifier now returns an error if there are multiple declarations (`import "C"` excluded). Finally, this addresses one final issue that pavius#11 didn't cover - an improperly formatted file could have unnecessary whitespace between groups. Trimming the value of whitespace could have addressed this, but we get a fix for free from this; the whitespace becomes irrelevant.

nick-jones force-pushed the groups-with-comments branch from 4f1776d to 7376751 Compare May 23, 2020 23:26

pavius merged commit 3cdbde3 into pavius:master May 24, 2020

nick-jones mentioned this pull request May 24, 2020

Rework parsing of imports #12

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix handling of comments in import groups #11

Fix handling of comments in import groups #11

nick-jones commented May 23, 2020

Fix handling of comments in import groups #11

Fix handling of comments in import groups #11

Conversation

nick-jones commented May 23, 2020