Escape leading spaces in tag names instead of stripping them #1580
@b4n, do you have a real example of "legitimate use of leading spaces in tag names"?
main/field.c
Outdated
* pseudo-tags when sorting. Anything with a lower byte value is
* escaped by renderEscapedString() already. */
unexpected_byte = *s;
vStringPut (b, '\\');
The escape sequence itself is at your discretion. Here I merely prefixed with a backslash, but it might make more sense to output \x20 for space and \x21 for an exclamation mark (their hex escape sequences, like we would have \x1f for Unit Separator (US)).
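To make the hex-escape idea above concrete, here is a minimal sketch, assuming only the first byte of the name is special-cased and that space and '!' are the two bytes of interest. The function name `escape_leading_hex` and its buffer-based interface are hypothetical; the actual patch works on a vString inside main/field.c and may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the ctags implementation: copy `name` into `dst`,
 * writing a leading space or '!' as a hex escape (\x20 or \x21).
 * Only the first byte is examined; the rest is copied verbatim.
 * Returns 0 on success, -1 if `dst` is too small. */
static int escape_leading_hex(const char *name, char *dst, size_t dst_size)
{
    assert(name != NULL && dst != NULL);
    int n;
    if (name[0] == ' ' || name[0] == '!')
        n = snprintf(dst, dst_size, "\\x%02x%s",
                     (unsigned int)(unsigned char)name[0], name + 1);
    else
        n = snprintf(dst, dst_size, "%s", name);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Called with a name that starts with a space, such as the ' hello' key from the test input, this writes a backslash, the letter x, the digits 20, and then the unchanged remainder. A name with a space anywhere else is copied as-is.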
@masatake Yes: for example, JavaScript allows any string as an object property name, like this:

var obj = {
  ' ': function() {}, // one space
  '\t': function() {}, // one tab
};
obj[' ']();
obj['\t']();

And I wanted to add tests for this when importing your signature normalization code, to make sure it wasn't normalizing something it shouldn't be normalizing.
(I'll fix the test failure, sorry)
main/field.c
Outdated
break;
}
else if (c == '\\')
break;
@masatake why no warning when there is a \ ? I wasn't emitting a warning for the scope "\\Broken\tContext" in the self-test as it stopped at the leading \ even though there is a \t, but I hardly see the point here; should I still make my change behave the same? I would rather think that this was an implementation detail and that emitting the warning in this case is the expected behavior. What do you think?
if ((c > 0x00 && c <= 0x1F) || c == 0x7F)
{
verbose ("Unexpected character (0 < *c && *c < 0x20) included in a tagEntryInfo: %s\n", base);
verbose ("File: %s, Line: %lu, Lang: %s, Kind: %c\n",
tag->inputFileName, tag->lineNumber, getLanguageName(tag->langType), tag->kind->letter);
verbose ("Escape the character\n");
break;
}
else if (c == '\\')
break;
else
continue;
A c matching ((c > 0x00 && c <= 0x1F) || c == 0x7F) is not printable. Passing such a c from a parser to the main part may be a bug, so ctags emits the warning.
On the other hand, \ is printable. Passing \ from a parser to the main part is valid, therefore ctags doesn't emit a warning. Actually, \ is a separator in PHP. However, ctags must convert \ to \\.
The break in (c == '\\') makes it possible to pass s, including \, to renderEscapedString, called at the end of renderEscapedName.
Did I answer your question?
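The behavior discussed in this exchange can be reproduced in isolation. The following is a stand-alone model of the quoted loop, not the ctags source: the function name `scan_would_warn` and the `stop` out-parameter are hypothetical, and the verbose() calls are replaced by a return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone model of the loop quoted above.
 * Scans `s` until a control byte or a backslash is found.
 * Returns 1 if the scan stopped on a control byte (the case that warns),
 * 0 otherwise; `*stop` receives the index where the scan ended. */
static int scan_would_warn(const char *s, size_t *stop)
{
    assert(s != NULL && stop != NULL);
    size_t i;
    for (i = 0; s[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char)s[i];
        if ((c > 0x00 && c <= 0x1F) || c == 0x7F)
        {
            *stop = i;
            return 1; /* control byte: a warning would be printed */
        }
        else if (c == '\\')
            break;    /* printable, but the string still needs escaping */
    }
    *stop = i;
    return 0;
}
```

With the self-test scope "\\Broken\tContext" the scan ends at index 0 on the leading backslash, so the later tab is never seen and no warning results. Without the leading backslash, the tab at index 6 triggers the warning path.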
I read your code and understand what you mean by "implementation detail".
You are correct. Handling the backslash here is just an optimization.
Oh, I see.
or
In these cases I think it is allowed to remove the prefixed whitespace.
I caught a cold. Please wait for a while.
@@ -0,0 +1,10 @@
var object = {
'!hello': function(){},
' hello': function(){},
Could you add a key which starts from \ ?
Other than the one item about the test case, LGTM. What I have to do: (1) update the format section of ctags.1 (the highest priority), and
a79ac51 to f24820b
Could you add a key which starts from \ ?
Done, plus a few other odd characters. I also fixed the conflicts that appeared since last time I touched this.
@b4n, thank you. LGTM. I think we may have to define version 3 of the tags file format. The current readtags command may not work
in tags file but the data structure of readtags doesn't have a field for storing the tags output mode. Only the format version number can be stored in the data structure...
@b4n Sorry for the delay. I had a look at the patches and they look good to me (the HTML part is trivial). I had a look at the rest as well and it makes sense. I'm not so familiar with the ctags file format and don't know what other applications expect there to be but escaping
I just added a commit updating the format documentation, and another one (76a2697) with a proposal for a possibly more sensible escaping scheme. The rationale here is that we already have
REALLY, REALLY appreciated. Especially about docs/format.rst. Any level of help is welcome.
I have not wanted to updating We have to think about version 3 and define it.
The most important one is docs/format.rst.
@masatake I just added a commit updating I'm not sure it's the best way to do it, but it was the easiest without rewriting a lot of code. Please test and comment. BTW, this as said
Does that mention the escaping anywhere? I couldn't find it at all, so I didn't update it yet.
This reverts commit 2bcd419.
This is an alternative fix for universal-ctags#1141 instead of stripping which was initially implemented in universal-ctags#1567, but stripping prevents legitimate tag names with whitespaces.
Rebased against master to fix conflicts; no changes.
Sorry to be late. I am still busy, but I don't want to be a blocker. I'm introducing What we have done in this project will be explained as version 3 format. @b4n, please look at docs/format.rst; escaping is explained in the file. As I wrote I will move the When I modified format.rst, I took two approaches. One is putting
@masatake no worries, take your time.
I know, and I updated it already (maybe not all of it, I'll check with your last remark), but as you mentioned needing to update the man page I checked it, and couldn't find anything in it. So if it's just for the future, all is good.
This is more in line with the other escaping rules already present than adding 2 new specific and uncommon sequences.
OK, I updated the Exceptions section of format.rst so it's up-to-date. I removed the changes to readtags and plan on submitting them in another PR, because of the testing mess I couldn't yet figure out for Windows. As said earlier, I don't think it's a requirement for this very PR, so sounds reasonable as a separate one (esp. once this one is dealt with, although I could first submit the part that is unrelated to the changes here if you prefer).
Escape leading spaces in tag names instead of stripping them
Since #1567 leading spaces are stripped from tag names, but this breaks legitimate use of leading spaces in tag names (and actually changed behavior with unproblematic leading characters like \n (LF) or \t (HT)).
Instead, escape those characters when rendering. It also makes more sense to handle this when rendering, as the effective problem is with the output format only.
@techee I'm asking for your review here merely for the HTML parser changes, but they should be straightforward.
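One way to read "the effective problem is with the output format only": the code comment in the first review hunk mentions pseudo-tags and sorting, and pseudo-tags start with "!_". The sketch below assumes a tags file sorted by plain byte value and uses the \x20 form floated in the review as the escaped spelling; that form is an assumption, not necessarily what was merged. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte-wise ordering of C strings. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort `names` in place, then return the index
 * of `needle` in the sorted array, or -1 if it is not present. */
static int sorted_index(const char **names, size_t count, const char *needle)
{
    assert(names != NULL && needle != NULL);
    qsort(names, count, sizeof *names, cmp_bytes);
    for (size_t i = 0; i < count; i++)
        if (strcmp(names[i], needle) == 0)
            return (int)i;
    return -1;
}
```

Sorting the raw names "hello", "!_TAG_FILE_FORMAT" and " hello" places the name with the leading space (byte 0x20) ahead of the pseudo-tag (first byte 0x21). With the escaped spelling, the name starts with a backslash (byte 0x5c), so the pseudo-tag stays first while the tag name itself is preserved rather than stripped.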