YAML: extract keys (with scope) for I18n Ruby Gem #3523

Closed
akemrir opened this issue Oct 31, 2022 · 38 comments · Fixed by #3895

@akemrir

akemrir commented Oct 31, 2022

The name of the parser: Yaml
The command line you used to run ctags:

ctags --options=NONE test.yaml

The content of input file:

en:
  test: 2
  other: 10

The tags output you are not satisfied with:

!_TAG_FILE_FORMAT	2	/extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED	1	/0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD	mixed	/number, pattern, mixed, or combineV2/
!_TAG_OUTPUT_FILESEP	slash	/slash or backslash/
!_TAG_OUTPUT_MODE	u-ctags	/u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT	96	/0 for no limit/
!_TAG_PROC_CWD	/home/akemrir/	//
!_TAG_PROGRAM_AUTHOR	Universal Ctags Team	//
!_TAG_PROGRAM_NAME	Universal Ctags	/Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL	https://ctags.io/	/official site/
!_TAG_PROGRAM_VERSION	5.9.0	/p5.9.20220828.0/

The tags output you expect:

...
en.test	test.yaml	/^main(void)$/;"	kind:function	line:2	language:Yaml	signature:(void)	keys
en.other	test.yaml	/^main(void)$/;"	kind:function	line:3	language:Yaml	signature:(void)	keys
...

Or, with one level omitted (this could be tricky?):
...
test	test.yaml	/^main(void)$/;"	kind:function	line:2	language:Yaml	signature:(void)	keys
other	test.yaml	/^main(void)$/;"	kind:function	line:3	language:Yaml	signature:(void)	keys
...

The version of ctags:

$ ctags --version
Universal Ctags 5.9.0(p5.9.20220828.0), Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Aug 29 2022, 21:05:02
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript, +pcre2

How do you get ctags binary:

Archlinux official repository
extra/ctags 1:5.9.20220828.0-1


I would like to have a map of key paths from a YAML file. This is for use with the I18n Ruby gem, for translations.
Then I could, for example, use it with Vim to jump directly to a translation and to suggest paths from a Vim add-on.

Any hints to start off somewhere?
Maybe it would be similar to SASS?

@masatake
Member

masatake commented Nov 1, 2022

Any hints to start off somewhere?

u-ctags has a Yaml parser based on libyaml. See https://github.com/universal-ctags/ctags/blob/master/parsers/yaml.c.

I wonder if your ctags executable is linked to libyaml.
Try --list-features option:

$ ./ctags --list-features | grep yaml
yaml              linked with library for parsing yaml input

Even if your ctags executable is linked to libyaml, the yaml parser doesn't extract keys.

You can implement what you want in two ways.
A. Develop a subparser specialized for the "I18n ruby gem".
https://github.com/universal-ctags/ctags/blob/master/parsers/ansibleplaybook.c and https://github.com/universal-ctags/ctags/blob/master/parsers/openapi.c took the same approach. You can get hints from them.
See also https://docs.ctags.io/en/latest/running-multi-parsers.html#base-sub-parsers .

B. Extend the yaml parser to extract keys from a YAML file.
You would have to study many things with this approach.

I can give you more hints.
The first step is to choose one of the two.

@masatake masatake changed the title from "Yaml key paths" to "YAML: extract keys (with scope)" Nov 1, 2022
@akemrir
Author

akemrir commented Nov 2, 2022

Thanks for the hints. I need to get into the topic :)

@akemrir
Author

akemrir commented Nov 3, 2022

I think approach A is OK. This is specific functionality for grabbing full/partial paths, which fits a subparser.

@masatake
Member

masatake commented Nov 3, 2022

I think approach A is OK. This is specific functionality for grabbing full/partial paths, which fits a subparser.

Ok. Could you tell me more about "I18n ruby gem"?
I think I can provide more specific hints.

@akemrir
Author

akemrir commented Nov 3, 2022

The I18n gem uses YAML files as the source of translations.
For example, when you have this:

---
en:
  sequel:
    errors:
      or: "or"

Then, when generic English (en) is the locale used in the session, this Ruby call will get the text from it:

I18n.t('sequel.errors.or') # will get "or"

I am thinking about how to approach this: index from the beginning, from a certain level, or both:

en.sequel.errors.or
sequel.errors.or

Key-level indexing, like in JSON, is very simple and could be done the regexp way with --regex-yaml=REGEX, using [\w_]+: for example.
But the whole path would be very useful for checking more precisely what was used.
And I would be able to open the places where a key is defined in several languages.
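
To make the two requested forms concrete, here is a minimal Python sketch that flattens an already-parsed translation tree into dotted key paths, with and without the leading locale. It is only an illustration of the idea, not how ctags does it; the function name `key_paths`, the choice to emit leaf keys only, and the hard-coded dictionary (used instead of a YAML library) are assumptions made for this example.

```python
def key_paths(tree, prefix=()):
    """Yield the dotted path of every leaf key in a nested mapping."""
    for key, value in tree.items():
        path = prefix + (str(key),)
        if isinstance(value, dict):
            # Descend into nested mappings, extending the path.
            yield from key_paths(value, path)
        else:
            yield ".".join(path)


# Parsed form of the example above (hard-coded to avoid a YAML dependency).
translations = {"en": {"sequel": {"errors": {"or": "or"}}}}

full = list(key_paths(translations))
# Drop the first component (the locale) to get the form passed to I18n.t.
without_locale = [p.split(".", 1)[1] for p in full]

print(full)            # ['en.sequel.errors.or']
print(without_locale)  # ['sequel.errors.or']
```

The locale-free form is the string that appears in `I18n.t('sequel.errors.or')`, which is why indexing both forms is useful for jumping from code to the translation.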

@masatake
Member

masatake commented Nov 3, 2022

Thank you for your explanation. I have more questions.

A. I think https://github.com/svenfuchs/rails-i18n/blob/master/rails/locale/en.yml is a real example input file. Am I correct? If so, what strings do you want to extract from
a .yml file that includes an array like:

---
en:
  date:
    abbr_day_names:
    - Sun

In this example, an array element is at the leaf. If an array is in the middle of the path like:

---
en:
  syscall:
    - name: read
      arity: 3
    - name: write
      arity: 3

what kind of string do you expect ctags to extract?

C. Other than having .yml as an extension, I think there is no rule for the file name.
Am I correct?

@akemrir
Author

akemrir commented Nov 3, 2022

A. Yes, it is a real-life example.
I am interested only in the keys, to quickly jump with tags in Vim to where they are placed, so I expect:

en.date.abbr_day_names
date.abbr_day_names

en.syscall
syscall

B. When they are used in code in full form, they appear as I18n.t('date.abbr_day_names'), I18n.t('syscall').
Then the templating language (erb/haml) or Ruby iterates over them.

We could also encounter this syntax, so-called block scalars:

en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install

Scalar indicators:
'|' : Block scalar indicator.
'>' : Folded scalar indicator.
'-' : Strip chomp modifier ('|-' or '>-').
'+' : Keep chomp modifier ('|+' or '>+').
1-9 : Explicit indentation modifier ('|1' or '>2').
# Modifiers can be combined ('|2-', '>+1').

So in fact a general capture line looks like one of these:
key: null (before array)
key: a
key: 2
key: "test"
key: 'test'
key: >
key: |
key: |-
etc

This reference card will be useful later: https://yaml.org/refcard.html

C. I agree. The user could filter files by using --exclude.
I've seen both the yaml and yml extensions in use.

Two questions:
D. Could we limit it with a parameter? To start gathering from some level, without this en:, or to allow it when needed.
E. Could we gather both? The full key and only the "end" of it? Maybe using kinds to optionally turn each off?

Thanks for helping me out.

@akemrir
Author

akemrir commented Nov 3, 2022

I have one idea: take into account only lines with a colon, that is, only lines matching ^\s*[a-zA-Z0-9_]+:
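
A rough Python sketch of this line-oriented idea follows. It uses a pattern along the lines of the one suggested above, tracks indentation to rebuild the path, and skips the bodies of block scalars so that their text is not mistaken for keys. This is not a YAML parser and not what ctags implements; `KEY_LINE`, `BLOCK_SCALAR`, `key_paths`, and the extra `hint:` line in the sample are all hypothetical, introduced only for illustration. It also ignores comments, quoted keys, flow collections, and tab indentation.

```python
import re

# Hypothetical pattern: optional indentation, a simple key, a colon, the rest.
KEY_LINE = re.compile(r"^(\s*)([A-Za-z0-9_]+):\s*(.*)$")
# Block scalar header such as '|', '>', '|-', '>+', '|2-'.
BLOCK_SCALAR = re.compile(r"^[|>][-+0-9]*$")


def key_paths(text):
    """Return dotted paths for keys that carry a value on the same line."""
    stack = []          # (indent, key) pairs for the enclosing mappings
    skip_deeper = None  # indent of a key whose block scalar body is skipped
    paths = []
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if skip_deeper is not None:
            if indent > skip_deeper:
                continue  # still inside the block scalar body
            skip_deeper = None
        match = KEY_LINE.match(line)
        if not match:
            continue
        key, rest = match.group(2), match.group(3)
        # Leave mappings that are at the same depth or deeper.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        if rest:
            paths.append(".".join(k for _, k in stack))
            if BLOCK_SCALAR.match(rest):
                skip_deeper = indent
    return paths


sample = """\
en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    hint: this line is block text, not a key
"""

print(key_paths(sample))  # ['en.key', 'en.command']
```

The `hint:` line shows why a bare regular expression is not enough on its own: without the block scalar skip it would be reported as `en.command.hint`.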

@masatake
Member

masatake commented Nov 3, 2022

About A, my understanding is that we can ignore arrays (sequences in YAML) regardless of their position in the YAML tree structure.

About B, I would like to confirm that you want to extract en.key, en.command, key, and command from

en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install

Am I correct?

Two questions:
D. Could we limit it with a parameter? To start gathering from some level, without this en:, or to allow it when needed.
E. Could we gather both? The full key and only the "end" of it? Maybe using kinds to optionally turn each off?

Yes for both questions.
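
The agreement so far (sequences are ignored wherever they occur, and the starting level should be selectable) can be sketched in Python as below. This is a hedged illustration over an already-parsed tree, not the parser's actual code; `key_paths` and its `skip_levels` parameter are hypothetical names standing in for whatever option the real parser ends up exposing.

```python
def key_paths(node, prefix=(), skip_levels=0):
    """Yield dotted key paths, treating sequences and scalars as leaves."""
    for key, value in node.items():
        path = prefix + (str(key),)
        if isinstance(value, dict):
            yield from key_paths(value, path, skip_levels)
        else:
            # Lists are not descended into, so keys nested inside
            # sequence items (name, arity) never produce a path.
            yield ".".join(path[skip_levels:])


# Parsed form of the two array examples discussed earlier in the thread.
doc = {
    "en": {
        "date": {"abbr_day_names": ["Sun"]},
        "syscall": [
            {"name": "read", "arity": 3},
            {"name": "write", "arity": 3},
        ],
    }
}

print(sorted(key_paths(doc)))
# ['en.date.abbr_day_names', 'en.syscall']
print(sorted(key_paths(doc, skip_levels=1)))
# ['date.abbr_day_names', 'syscall']
```

With `skip_levels=1` the locale component is dropped, which corresponds to question D; emitting both result lists together corresponds to question E.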

@masatake
Member

masatake commented Nov 3, 2022

https://github.com/ruby-i18n/i18n looks like the reference.

@akemrir
Author

akemrir commented Nov 3, 2022

A. yes
B. yes
And the last one, also yes

@masatake
Member

masatake commented Nov 3, 2022

Thank you.
The last question is the name of the parser.
Do you have any ideas?
"I18NRubyGem" looks suitable to me.
What do you think?

@akemrir
Author

akemrir commented Nov 3, 2022

That's a good name.

@masatake
Member

masatake commented Nov 3, 2022

Thank you.
I will make a prototype based on this discussion.

We have one critical limitation.
You must enable the "I18NRubyGem" parser manually when you want to use it.
Automatic parser detection may not work
because .yaml and .yml are too generic as hints for choosing a parser.

On the command line, you must run something like "ctags --languages=+I18NRubyGem ...".

@akemrir
Author

akemrir commented Nov 4, 2022

OK, no problem.
I will use a ~/.ctags.d configuration.

@masatake
Member

masatake commented Dec 9, 2022

I found that this needs an overhaul of yaml.c.

@akemrir
Author

akemrir commented Dec 10, 2022

Does that conflict with the idea of a subparser? Wasn't choosing that approach an alternative to changing yaml.c?

masatake added a commit to masatake/ctags that referenced this issue Dec 23, 2023
Close universal-ctags#3523

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake mentioned this issue Dec 23, 2023
3 tasks
@masatake
Member

The overhaul was done.
I implemented what I wrote here and made a pull request (#3895).

@masatake masatake added this to the 6.1 milestone Dec 23, 2023
@akemrir
Author

akemrir commented Dec 23, 2023

Very well done. I had to compile it on my own machine, and it worked in almost all cases.
I can't make it index the key name alone, that is, the last part of the path. I have only full i18n paths in the tags file.

---
pl:
  emails:
    title: PanTracker
    footer: Zasilane przez PanTracker
    registration:
      subject: Rejestracja konta
      title: Rejestracja
      message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniżej.
      button: Aktywuj
emails.footer	config/locales/emails.pl.yml	/^    footer: Zasilane przez PanTracker$/;"	kind:key	line:5	extras:subparser,domainless
emails.registration.button	config/locales/emails.pl.yml	/^      button: Aktywuj$/;"	kind:key	line:10	extras:subparser,domainless
emails.registration.message	config/locales/emails.pl.yml	/^      message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniż/;"	kind:key	line:9	extras:subparser,domainless
emails.registration.subject	config/locales/emails.pl.yml	/^      subject: Rejestracja konta$/;"	kind:key	line:7	extras:subparser,domainless
emails.registration.title	config/locales/emails.pl.yml	/^      title: Rejestracja$/;"	kind:key	line:8	extras:subparser,domainless
emails.title	config/locales/emails.pl.yml	/^    title: PanTracker$/;"	kind:key	line:4	extras:subparser,domainless
pl.emails.footer	config/locales/emails.pl.yml	/^    footer: Zasilane przez PanTracker$/;"	kind:key	line:5	extras:subparser,domainful
pl.emails.registration.button	config/locales/emails.pl.yml	/^      button: Aktywuj$/;"	kind:key	line:10	extras:subparser,domainful
pl.emails.registration.message	config/locales/emails.pl.yml	/^      message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniż/;"	kind:key	line:9	extras:subparser,domainful
pl.emails.registration.subject	config/locales/emails.pl.yml	/^      subject: Rejestracja konta$/;"	kind:key	line:7	extras:subparser,domainful
pl.emails.registration.title	config/locales/emails.pl.yml	/^      title: Rejestracja$/;"	kind:key	line:8	extras:subparser,domainful
pl.emails.title	config/locales/emails.pl.yml	/^    title: PanTracker$/;"	kind:key	line:4	extras:subparser,domainful

But I saw it in the unit tests, so probably I am missing some setting.
Very nice :)
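
For anyone who wants to inspect such output programmatically, the following Python sketch splits extended-format tags lines into fields and groups the names by the `extras` values shown above. It assumes tab-separated lines exactly in the shape pasted in this comment; `parse_tag_line` is a hypothetical helper written for this illustration, not part of ctags.

```python
def parse_tag_line(line):
    """Split one extended-format tags line into a dict of its fields."""
    # Everything before ';"' is name, file, and search pattern.
    head, _, tail = line.rstrip("\n").partition(';"\t')
    name, path, pattern = head.split("\t", 2)
    # The remaining tab-separated items are 'field:value' pairs.
    fields = dict(f.split(":", 1) for f in tail.split("\t") if ":" in f)
    return {"name": name, "file": path, "pattern": pattern, **fields}


lines = [
    'emails.title\tconfig/locales/emails.pl.yml\t/^    title: PanTracker$/;"'
    "\tkind:key\tline:4\textras:subparser,domainless",
    'pl.emails.title\tconfig/locales/emails.pl.yml\t/^    title: PanTracker$/;"'
    "\tkind:key\tline:4\textras:subparser,domainful",
]

tags = [parse_tag_line(line) for line in lines]
domainless = [t["name"] for t in tags if "domainless" in t["extras"].split(",")]
domainful = [t["name"] for t in tags if "domainful" in t["extras"].split(",")]

print(domainless)  # ['emails.title']
print(domainful)   # ['pl.emails.title']
```

In this output, the entries marked `domainful` keep the locale prefix (`pl.`), while the `domainless` ones drop it.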

@masatake
Member

Very well done. I had to compile it on my own machine, and it worked in almost all cases.
I can't make it index the key name alone, that is, the last part of the path. I have only full i18n paths in the tags file.

Did you expect ctags to emit tags for title, footer, subject, title, message, and button?

But I saw it in the unit tests, so probably I am missing some setting.

No, you were not. The parser doesn't emit the last components as tags.

After getting your reply, I will update the pull request.

Thank you for your feedback.

@akemrir
Author

akemrir commented Dec 23, 2023

Hi, yes.

All "tails" would be welcome.
It would be useful when someone uses a shorter version within the scope of some context.
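
One possible reading of "all tails" is every suffix of the dotted path, from the full path down to the last component alone. A small Python sketch of that reading follows; `tails` is a hypothetical helper, and whether the parser should emit every intermediate suffix or only some of them is not settled by this comment.

```python
def tails(path):
    """Return every suffix of a dotted key path, longest first."""
    parts = path.split(".")
    return [".".join(parts[i:]) for i in range(len(parts))]


for name in tails("pl.emails.registration.title"):
    print(name)
# pl.emails.registration.title
# emails.registration.title
# registration.title
# title
```

Short suffixes such as `title` will collide across files and sections, so an editor would typically offer a list of candidates for them rather than a single jump target.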

masatake added a commit to masatake/ctags that referenced this issue Dec 23, 2023
Close universal-ctags#3523

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Dec 23, 2023
Close universal-ctags#3523

Designed with @akemrir.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake changed the title from "YAML: extract keys (with scope)" to "YAML: extract keys (with scope) for I18n Ruby Gem" Dec 23, 2023
@akemrir
Author

akemrir commented Dec 23, 2023

Just checked the newest changes. Very nice and explanatory documentation.
Could you add info about this?

--langmap=I18nRubyGem:.yml

This worked for me without forcing the language. It may be useful for others. What do you think?

@masatake
Member

--langmap=I18nRubyGem:.yml

That is understandable, but this throws other YAML-based parsers into chaos.

This is the reason I confirmed #3523 (comment) .

I'm not good at English. So, if you know good sentences that balance the limitation, avoiding chaos, and usability, could you write them down here? I will merge them into my pull request.

@akemrir
Author

akemrir commented Dec 24, 2023

No, it's good. But it needs more detail. Not everyone will catch on like we do.
So in general, to make it usable, the user would need to do something like this:

ctags --sort=yes -f tags
ctags --sort=yes --extras=+q --language-force=I18nRubyGem --languages=+I18nRubyGem --fields=+E --exclude=tags -a -f tags

The first one does the generic preparation; the second appends to the tags file.

Am I missing something from the manual about general usage?

Secondly, if a langmap for one YAML-based parser is bad, should I do it this way?
I feel that I am missing something.

--langmap=+I18nRubyGem:.yml
--langmap=+Yaml:.yml
--langmap=+AnsiblePlaybook:.yml

@masatake
Member

Writing about the -a option in the man page looks helpful.
I will add sentences to the man page.

Secondly, if a langmap for one YAML-based parser is bad, should I do it this way?
I feel that I am missing something.

--langmap=+I18nRubyGem:.yml
--langmap=+Yaml:.yml
--langmap=+AnsiblePlaybook:.yml

These don't work at all. If they do work, ctags may have a bug.

@akemrir
Author

akemrir commented Dec 25, 2023

Is there a possibility to enable all parsers without specifying a single one?

@masatake
Member

Is there a possibility to enable all parsers without specifying a single one?

No, there is not.
If you think the limitation is critical, I will withdraw #3895.
I recognized this as a critical limitation. #3523 (comment)

YAML files using .yaml or .yml as file extensions are not self-descriptive. There is no heuristic for recognizing whether a .yaml file is for I18nRubyGem or not.

@masatake
Member

Unlike XML, YAML is not self-descriptive. Only with the user's intervention can ctags do something useful for such input.

@akemrir
Author

akemrir commented Dec 25, 2023

ok, thanks for your time and explanation
very nice addition 👍

@masatake masatake reopened this Dec 25, 2023
@masatake
Member

I found an excellent heuristic; a YAML file may be an I18nRubyGem file if the top-level entries are locale names.
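
The heuristic can be pictured with a few lines of Python. This is a sketch under stated assumptions, not the code in the pull request: `KNOWN_LOCALES` is a tiny illustrative subset, `looks_like_i18n` is a hypothetical name, and requiring every top-level key (rather than just some) to be a locale name is my reading of the sentence above.

```python
# Tiny illustrative subset; a real list would be far longer.
KNOWN_LOCALES = {"en", "pl", "ja", "nn-NO", "zh-CN"}


def looks_like_i18n(top_level_keys):
    """Guess whether a YAML document is a translation file.

    Assumption: the document qualifies only if it has at least one
    top-level key and every top-level key is a known locale name.
    """
    keys = list(top_level_keys)
    return bool(keys) and all(key in KNOWN_LOCALES for key in keys)


print(looks_like_i18n(["en"]))                   # True
print(looks_like_i18n(["pl", "en"]))             # True
print(looks_like_i18n(["version", "services"]))  # False
```

A lookup against a fixed list keeps false positives low, since ordinary configuration files rarely use locale names as their only top-level keys.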

@akemrir
Author

akemrir commented Dec 25, 2023

That sounds good. Do you mean ISO 3166-1 alpha-2?
I would like to point out that they could have the shape nn-NO or zh-CN, for example.
They look like IETF language tags from https://www.venea.net/web/culture_code

Can you validate locale names in the base code this way?

So more like \w{2,}(-\w{2,})?:

@akemrir
Author

akemrir commented Dec 25, 2023

I have also found this form, ff-Latn-SN, but haven't seen it in real applications.
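
A quick Python check of the pattern proposed above against these shapes is shown below. `PROPOSED` is the expression from the previous comment; `EXTENDED` is a hypothetical variant that allows more than one hyphenated subtag, added here only to show what accepting ff-Latn-SN would take.

```python
import re

# Pattern from the previous comment: at most one hyphenated subtag.
PROPOSED = re.compile(r"\w{2,}(-\w{2,})?:")
# Hypothetical variant: any number of hyphenated subtags.
EXTENDED = re.compile(r"\w{2,}(-\w{2,})*:")

for line in ["en:", "nn-NO:", "zh-CN:", "ff-Latn-SN:", "date:"]:
    print(line, bool(PROPOSED.fullmatch(line)), bool(EXTENDED.fullmatch(line)))
# en: True True
# nn-NO: True True
# zh-CN: True True
# ff-Latn-SN: False True
# date: True True
```

The last row is the catch: a shape-based pattern also accepts ordinary keys such as `date:`, so on its own it cannot tell a locale name from any other top-level key.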

masatake added a commit to masatake/ctags that referenced this issue Dec 25, 2023
Close universal-ctags#3523

Designed with @akemrir.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Member

I implemented the heuristic. You don't have to specify --language-force=I18nRubyGem --languages=+I18nRubyGem to use the parser.

I got a locale list from

		/* For generating this list, I did:
		 *
		 *    ls /usr/share/locale/ | xargs -n 1 printf '"%s",\n'
		 *
		 *  on Fedora 39.

masatake added a commit to masatake/ctags that referenced this issue Dec 25, 2023
Close universal-ctags#3523

Designed with @akemrir.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@akemrir
Author

akemrir commented Dec 26, 2023

Hmm, OK.
But they have an underscore instead of a dash?

Maybe?

ls /usr/share/locale/  | sed 's/_/-/g' | xargs -n 1 printf '"%s",\n'

Or both. What do you think?
The two variants, with and without the sed substitution, merged have 442 entries.
Without the second variant there are 360, so the merged list is not that much larger.
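
The merge of both spellings amounts to a set union, which is why the combined list grows by less than a factor of two: names without an underscore are identical in both variants and are counted once. If the numbers above are read that way, 442 - 360 = 82 of the original names would contain an underscore. A small Python sketch of the merge follows; `with_hyphen_variants` and the four sample names are made up for the example.

```python
def with_hyphen_variants(names):
    """Return the names plus a hyphenated copy of each underscored name."""
    names = set(names)
    return names | {name.replace("_", "-") for name in names}


sample = ["en", "pl", "pt_BR", "zh_CN"]
merged = sorted(with_hyphen_variants(sample))

print(merged)
# ['en', 'pl', 'pt-BR', 'pt_BR', 'zh-CN', 'zh_CN']
print(len(sample), len(merged))  # 4 6
```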

@masatake
Member

Thank you. I will use

		 *	{ ls /usr/share/locale/ | xargs -n 1 printf '"%s",\n';
		 *	  ls /usr/share/locale/ | sed 's/_/-/g' | xargs -n 1 printf '"%s",\n'; } \
		 *	  | sort | uniq; echo 'NULL'

masatake added a commit to masatake/ctags that referenced this issue Dec 26, 2023
Close universal-ctags#3523

Designed with @akemrir.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@akemrir
Author

akemrir commented Dec 26, 2023

Works very well :)

After this command:

ctags --options=NONE config/locales/*

I can use the tags just as I did with -a.
[image]

@masatake
Member

Looks nice.

masatake added a commit to masatake/ctags that referenced this issue Dec 26, 2023
Close universal-ctags#3523

Designed with Karol Jakusz-Gostomski <Karol Jakusz-Gostomski>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@akemrir
Author

akemrir commented Dec 26, 2023

OK, I have checked all three variants.

When I move the cursor to a translation key, I can jump directly to the translation in the YAML file.
[image]

In the case of a "shortcut":
[image]

[image]

And for a very short one:
[image]

[image]

masatake added a commit to masatake/ctags that referenced this issue Dec 27, 2023
Close universal-ctags#3523

Designed with Karol Jakusz-Gostomski <Karol Jakusz-Gostomski>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit that referenced this issue Dec 27, 2023
This issue was closed.