WIP: Update readtags to support escape sequences and fix & improve its testing #1605
Follow Automake's verbosity instead of unconditionally silencing the rules.
* Don't perform shell expansion on tag names.
* Actually check whether readtags found a matching tag. As readtags exits successfully even without matches, make sure it found at least one match before assuming success.
* Code cleanup.
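A minimal sketch of the "at least one match" check described above, assuming the test only needs to know whether the command printed anything. `found_match` is a hypothetical helper name, not something taken from misc/roundtrip, and the readtags invocation in the comment is an assumption about how the script calls it.

```shell
# Succeed only if stdin contains at least one non-empty line.
# found_match is a hypothetical helper, not part of misc/roundtrip.
found_match() {
    grep -q .
}

# Hypothetical usage (the exact readtags arguments are an assumption):
#   readtags -t "$tags" - "$name" | found_match || echo "no match for: $name"
```

Checking the output rather than the exit status is what makes the test meaningful when the command exits successfully even without matches.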
* Properly handle all input, not just the first line.
* Also un-escape the low 0x01 - 0x1F and 0x7F bytes.
This restores backward compatibility with e.g. backslashes in the search string.
0ac3a99
to
f7ade0c
Compare
* don't stop all processing when one input line has no escaping
* don't stop expansion after an unhandled escape sequence
Actually just revert it.
OK I don't know how to avoid the issue yet, e.g. |
5d05ee6
to
341db19
Compare
OK, I improved the situation with the last commit after finding https://github.com/msys2/msys2/wiki/Porting#user-content-filesystem-namespaces, but it's still not perfect: there are encoding issues, and some exclude patterns break legitimate stuff (like I don't really know what I can do, other than add the ability for readtags to read the tag name from a file instead of the command line. It's not very nice because I can't really see a real use for it, but it would solve the whole problem at once, I guess.
(I will work on this item on New Year's Day.)
…put-format=u-ctags|e-ctags option

To record meta characters like tab in a name, we have designed a new file format. The file format itself was already introduced. The TAG_OUTPUT_MODE pseudo tag was used as a marker telling which format is used in a tag file: the Exuberant-ctags (e-ctags) compatible format or the Universal-ctags (u-ctags) file format. u-ctags was the default format. A user could choose the format with the --output-format option.

However, we recognize that the hack using the TAG_OUTPUT_MODE pseudo tag doesn't work well with the readtags command. The data structure of readtags doesn't have a field for recording TAG_OUTPUT_MODE, which means readtags cannot switch the file format version dynamically. The only pseudo tag that readtags can use for recognizing the file format is TAG_FILE_FORMAT.

This commit introduces the following SELF INCOMPATIBLE changes:
1. --output-format=u-ctags and --output-format=e-ctags are removed. They are unified into --output-format=ctags.
2. The TAG_OUTPUT_MODE pseudo tag is removed. Instead, TAG_FILE_FORMAT=3 is introduced. TAG_FILE_FORMAT=2 is compatible with e-ctags. TAG_FILE_FORMAT=3 is the default. A user can choose one of the formats with --format=3 or --format=2.

The new version 3 may break many client tools. If you are just a user of such client tools, use the --format=2 option to generate a tags file in the compatible file format. If you are a developer of such client tools, please consider supporting --format=3.

The remaining work is:
1. more Tmain test cases,
2. more precise documentation about format 3,
3. updating readtags to support format 3.

About readtags, most of the work is done by @b4n. I must merge his changes (universal-ctags#1605).

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…put-format=u-ctags|e-ctags option

To record meta characters like tab in a name, we have designed a new file format. The file format itself was already introduced. The TAG_OUTPUT_MODE pseudo tag was used as a marker telling which format is used in a tag file: the Exuberant-ctags (e-ctags) compatible format or the Universal-ctags (u-ctags) file format. u-ctags was the default format. A user could choose the format with the --output-format option.

However, we recognize that the hack using the TAG_OUTPUT_MODE pseudo tag doesn't work well with the readtags command. The data structure of readtags doesn't have a field for recording TAG_OUTPUT_MODE, which means readtags cannot switch the file format version dynamically. The only pseudo tag that readtags can use for recognizing the file format is TAG_FILE_FORMAT.

This commit introduces the following SELF INCOMPATIBLE changes:
1. --output-format=u-ctags and --output-format=e-ctags are removed. They are unified into --output-format=ctags.
2. The TAG_OUTPUT_MODE pseudo tag is removed. Instead, TAG_FILE_FORMAT=3 is introduced. TAG_FILE_FORMAT=2 is compatible with e-ctags. TAG_FILE_FORMAT=3 is the default. A user can choose one of the formats with --format=3 or --format=2.

The new "version 3" may break many client tools. If you are just a user of such client tools, use the --format=2 option to generate a tags file in the compatible file format. If you are a developer of such client tools, please consider supporting --format=3.

The remaining work is:
1. more Tmain test cases,
2. more precise documentation about format 3,
3. updating readtags to support format 3.

About readtags, most of the work is done by @b4n. I must merge his changes (universal-ctags#1605). docs/news.rst was rewritten by @b4n and @codebrainz. Using stringify for printing the default version of the file format in the --help message was suggested by @b4n.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
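For client tools that need to tell the formats apart, a rough sketch of reading the format number from a tags file might look like the following. It assumes the pseudo tag is written as a `!_TAG_FILE_FORMAT` line whose second tab-separated field is the number. `tag_file_format` is a hypothetical helper, not a ctags or readtags feature.

```shell
# Print the number stored in the !_TAG_FILE_FORMAT pseudo tag of a tags file.
# tag_file_format is a hypothetical helper name used for illustration only.
tag_file_format() {
    tab=$(printf '\t')
    # Match the pseudo-tag line and keep only the digits in the second field.
    sed -n 's/^!_TAG_FILE_FORMAT'"${tab}"'\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical usage:
#   [ "$(tag_file_format tags)" = "2" ] && echo "e-ctags compatible format"
```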
I will take time for this important one.
@b4n, I'm reading your patches.
misc/roundtrip
Outdated
@@ -112,8 +114,10 @@ OLD_IFS="$IFS"
IFS='
'
set -x
for tags in $(find "$UNITS" -name expected.tags); do
for name in $(sed -e 's/^\([^ ]*\) .*/\1/' "$tags"); do
tagfiles="$(find "$UNITS" -name expected.tags)"
Are the first and the last double quote characters needed?
Does the following code do the same?
tagfiles=$(find "$UNITS" -name expected.tags)
yes, doesn't matter
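A quick way to convince oneself, as a small sketch: word splitting and pathname expansion are not applied to the right-hand side of a plain variable assignment, so the quoted and unquoted forms store the same value. The variable names below are made up for the example.

```shell
# Both assignments capture the same text, including the space and the "*".
quoted="$(printf 'a b\n*\n')"
unquoted=$(printf 'a b\n*\n')

if [ "$quoted" = "$unquoted" ]; then
    echo "same"
fi
```

The quotes do matter later, when the variable is expanded, which is where the IFS setting in the script comes into play.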
misc/roundtrip
Outdated
for name in $(sed -e 's/^\([^ ]*\) .*/\1/' "$tags"); do
tagfiles="$(find "$UNITS" -name expected.tags)"
for tags in $tagfiles; do
tagnames="$(sed -e 's/^\([^ ]*\) .*/\1/' "$tags")"
The same as my last comment.
I will merge the following changes into the master branch.
misc/roundtrip
Outdated
@@ -53,7 +53,7 @@ expandEscapeSequences()
:again
s/\\/__BACKSLASH__/
T out
s/__BACKSLASH__\\/__LASTBACKSLASH__/; t again
s/__BACKSLASH__\\/__LITBACKSLASH__/; t again
Could you tell me why this change is needed?
It is not needed per se, but I changed the name for clarity, as it's not only used for the last backslash but for all literal backslashes.
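To illustrate the placeholder idea, here is a simplified sketch, not the actual expandEscapeSequences() code: escaped backslashes are parked under a placeholder first, so that the `t` following them is not mistaken for a `\t` escape, and they are restored as literal backslashes at the end. `unescape` is a hypothetical function name, only `\\` and `\t` are handled, and the sketch assumes the input never contains the placeholder text itself.

```shell
# Simplified sketch: expand "\\" and "\t" using a placeholder for literal
# backslashes. unescape is a hypothetical name, not the script's function.
unescape() {
    tab=$(printf '\t')
    sed -e 's/\\\\/__LITBACKSLASH__/g' \
        -e "s/\\\\t/${tab}/g" \
        -e 's/__LITBACKSLASH__/\\/g'
}

# Hypothetical usage:
#   printf '%s\n' 'a\\tb\tc' | unescape
```

Without the placeholder step, the second backslash of `\\` followed by `t` would be read as a tab escape.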
@b4n, my understanding is that many changes are about \n and \t.
Not really. The fixups are there only because I'm so used to GNU sed that I don't think about its extensions for supporting escape sequences. Ultimately the changes are the same, just with a slightly different syntax (using the literal value rather than the GNU sed escape sequence).
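As a sketch of what "using the literal value" can look like in a portable script: the tab character is produced once with printf and spliced into the sed expression, instead of relying on sed understanding `\t`. `first_field` is a hypothetical helper that extracts the tag name, assumed here to be the first tab-separated field of each line.

```shell
# Print the first tab-separated field of each input line, without relying on
# sed interpreting "\t". first_field is a hypothetical helper name.
first_field() {
    tab=$(printf '\t')
    sed -e "s/^\([^${tab}]*\)${tab}.*/\1/"
}

# Hypothetical usage:
#   first_field < expected.tags
```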
This is a private note for the reviewing task. I have been thinking about the code for comparison. My testing:
A tab character is put between Let's see if a whitespace is placed instead of the tab char.
e-ctags violates the v2 format. u-ctags --format=2 violates the format. u-ctags (version 3) violates the format. I guess none of them seriously follows the v2 format. Instead, we should refine the v3 format as much as possible and make it a reliable format.

OK. About the tags generator side, I understand where I should go. How about the reader side? I assume the original readtags works well with the output of e-ctags, even though the generator side doesn't follow the v2 format spec :-P. So I think we should keep the original readtags code for handling e-ctags output as is.

My idea is that the code introduced by @b4n's patch should be activated only for handling u-ctags v2 and v3 format output. The good news is that the tagFileInfo and TagFile data structures provide enough information to detect whether e-ctags, u-ctags v2 or u-ctags v3 generated the current output. In other words, using strcmp for e-ctags and tagcmp for u-ctags is my preference. Keeping the strcmp-based code makes the source code (and test cases) a bit more complicated.

About the v2 format: format.rst, which explains the v2 tags format, says:
However, as the above output shows, e-ctags doesn't follow the v2 tags format, at least when using a regex parser.
Comparing the output is part of the library, so I am conservative there. However, in the command line interface we can introduce changes, as we did in the ctags command itself. The question is whether we should print the result escaped or unescaped. I will find more time for this issue.
As said, I think that my changes are backward compatible. And I think it is not only with the v2 format, but also the e-ctags format: the only thing that would risk being a problem is if e-ctags could output unescaped |
Thank you very much. I merged all your changes.
Following what I started in #1580, this PR aims at updating readtags to properly support the current format, as well as fixing & improving its automated test cases (roundtrip).