Semantic comments #16
Some more thoughts on how this relates to #7: we could introduce our custom @-tags for language-specific meta information which would be ignored by tools like compare-locales:
|
We could also introduce versioning for messages. This would allow making small changes to the original copy without having to change the identifier. The default and implicit revision would be revision 0:
and some time later:
|
In #7 (comment) @Pike suggested that we separate semantic comments and grammatical data due to their having different owners. |
I love that. One possible use case is to define maximum string length (we already use it in .lang files, e.g. for translating promotional tweets). |
I'll bet you 100€ that some developer will forget to version a string and that we'll end up with significant changes in the source language that will be missing from the target languages - while you might still control this in QA somehow for Mozilla's core products, you can't count on every extension developer to be aware of the problem. The only way that works reliably in my book is the way gettext handles this: If the string content has changed, it will become fuzzy. It is annoying for localizers if strings become invalid due to a typo having been fixed in the source language, but it's the lesser of 2 evils. |
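To make the content-based approach described above concrete, here is a minimal Python sketch. It is an illustration only, not how gettext is implemented: the function name `find_fuzzy` and the dictionary-based data model (message ID mapped to text) are assumptions made for the example.

```python
def find_fuzzy(old_source, new_source, translations):
    """Return IDs of translated messages whose source text has changed.

    All three arguments are hypothetical dicts mapping message ID to text.
    """
    fuzzy = set()
    for msg_id, new_text in new_source.items():
        old_text = old_source.get(msg_id)
        if old_text is None:
            continue  # New message: nothing to invalidate yet.
        if msg_id in translations and old_text != new_text:
            # The source changed under the same ID, so flag the translation.
            fuzzy.add(msg_id)
    return fuzzy


old = {"hello": "Hello", "bye": "Goodbye"}
new = {"hello": "Hello, world", "bye": "Goodbye", "new": "New"}
translated = {"hello": "Hallo", "bye": "Tschüss"}
print(find_fuzzy(old, new, translated))
```

The check needs no cooperation from the developer, which is the point being made: a typo fix is flagged just like a real change, but no change goes unnoticed.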
In #59 @zbraniecki asked about the possibility of using semantic comments for tags. I'm pasting my reply below to keep the discussion in one place. My understanding of the scope of semantic comments is that they would be a place to put extra data available to tools rather than the runtime. In fact they wouldn't be parsed on runtime at all. This would make it impossible to use them for tags. |
Even before we move forward with this, can we get a consensus on what we'd like such comments to look like, so that we can start writing them even without them being "semantic" yet? I'm trying to decide between:
and:
or something in between? Thoughts? |
Isn't this the exact goal of this issue? :)
Some prior art:
JSDoc:
# @param {number} $num - The $num value.
Some Python projects use this style:
# :param $num: The $num value.
# :type $num: number
I like simple ideas of the form:
# $num (Number) - Description
But without a
# $num (Number) Description
# @max-length 140 |
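As a rough illustration of how a tool might read the JSDoc-like style quoted above, here is a small Python sketch. The function `parse_comment`, both regular expressions and the returned dictionary shape are assumptions for the example; no syntax has been agreed on in this thread.

```python
import re

# "@tag optional value", e.g. "@max-length 140" (assumed syntax).
TAG_RE = re.compile(r"^@(?P<tag>[\w-]+)(?:\s+(?P<value>.*))?$")
# "{type} $name - description", as in the JSDoc-style example above.
PARAM_RE = re.compile(
    r"^\{(?P<type>[^}]+)\}\s+(?P<name>\$[\w-]+)(?:\s+-\s+(?P<desc>.*))?$"
)


def parse_comment(comment):
    """Split a comment into free text, generic tags and @param entries."""
    text, tags, params = [], {}, []
    for raw in comment.splitlines():
        line = raw.lstrip("#").strip()
        match = TAG_RE.match(line)
        if not match:
            if line:
                text.append(line)  # Plain prose stays human-readable text.
            continue
        tag = match.group("tag")
        value = (match.group("value") or "").strip()
        if tag == "param":
            param = PARAM_RE.match(value)
            if param:
                params.append(param.groupdict())
                continue
        tags[tag] = value
    return {"text": text, "tags": tags, "params": params}


sample = (
    "# Shown in the tab bar.\n"
    "# @param {number} $num - The $num value.\n"
    "# @max-length 140"
)
parsed = parse_comment(sample)
print(parsed)
```

A tool could then compare the length of a translation against `int(parsed["tags"]["max-length"])`, while a runtime parser would keep skipping the comment entirely.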
If the goal is to parse comments, and extract information about parameters, I feel like we should enforce |
One good use of semantic comments would be to instruct the localization tool like Pontoon on what context the string will be used in. The particular case is where a message like this:
The latter Semantic comments could make it trivial:
and to prevent having to place it in front of every string, we could use group comments and resource comments to annotate the whole file. |
Semantic Comments v1 proposal (updated: April 5 2018)
Description
Semantic comments is a concept that brings basic computer-readable structure to comments. The idea is to design a set of codifiable patterns that enable algorithmic interpretation of a comment. The core design goal is to develop rules that humans can interpret and memorize with minimal overhead, while at the same time allowing computers to assign meaning.
Semantic comments may serve several high-level roles:
In principle, the nature of the data stored in the comments is limited: runtime parsers should be able to skip comments without parsing them, and failure to retrieve information from a comment should not result in any serious reduction in usability of the system.
Experience from other programming languages shows that some form of semantic comments is helpful in most languages, from JavaScript, Python and C++ to Rust and CSS.
Below is my initial proposal for the first version of Semantic Comments.
Title Line
It would be useful to be able to capture a short description of the section for UI tools to use when operating on long lists of strings. A great example of such a use case is the current preferences.ftl, with hundreds of messages clustered into sections.
My proposal is to identify the title line of any comment as fitting into one of two conditions:
That means that the following two are title lines:
And the result in Pontoon, for example, may look like this
Meta-information
Meta information should by definition be an open-ended system. That means that while we can specify the syntax around it and define a set of known values, the system should also be open to extension in the future with new keys. For that reason, I believe that a key-value param system would work well.
The initial uses of meta information may provide details like: in which context the message is used, what communication style to use for the message, whether there are any legal requirements associated with it (branding policy etc.), what UI toolkit it will use, the string version, etc.
Another use case is to instruct the tooling about any soft limitations imposed on the translation. For example we may want to instruct the localization tool that a given file/group/message should remain
JSDoc has a nice system of block and inline tags. Copying it, it may look like this:
Since the system is open-ended, the first step is just to define the syntax for it. I'd like it to be:
with all three being optional.
Variables
Variables could either be a particular type of block tags, or a separate thing:
or:
I'm not very opinionated here, and we could start with the former and maybe one day add the latter as a convenience mechanism ("($.*)" becomes a "@arg $1").
Syntax coloring / validation
The last item I'd like us to consider is syntax coloring and augmentation. There are three areas where we may end up placing a syntax from another programming language:
For the DOM Overlay, I believe that block tag
For arguments it's a bit more tricky, and I thought we could annotate it like this:
to allow us to specify that the
Finally, in the comment, I really like the RST way:
This could be introduced gradually - we could now specify "`" as the sigil for code only, and let tooling autoguess the syntax highlighting for it, and one day extend it with
The total outcome might look like this |
Nice work, @zbraniecki! This is very much in line with what I had in mind for this, thanks for adding substance and providing details.
I see your point and I like the Pontoon mockup. I'm not sure this practice needs to be codified as a rule. Pontoon could simply show the first line of the comment truncated to fit the UI and the effect would be the same, I think?
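For illustration, the "first line, truncated" alternative needs very little code. The sketch below is an assumption about what a tool could do, not Pontoon's actual behaviour; `comment_preview` and the default width are hypothetical, while `textwrap.shorten` is part of the Python standard library.

```python
import textwrap


def comment_preview(comment, width=40):
    """Return the first non-empty comment line, shortened to `width` characters."""
    for raw in comment.splitlines():
        line = raw.lstrip("#").strip()
        if line:
            # shorten() cuts at word boundaries and appends the placeholder.
            return textwrap.shorten(line, width=width, placeholder="…")
    return ""


print(comment_preview("# General\n#\n# Strings used in the General pane."))
```

This involves no syntax rule at all, which is the trade-off under discussion: the tool gets a usable label, but authors have no way to mark a line as the title explicitly.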
This looks great and using
I like basing the syntax on JSDoc. One thing that I didn't see in your proposal is the syntax for example values. In my original comment I used JSDoc's syntax for default values of optional params:
I'm not sure this would be a good fit for Fluent. There is no notion of optional parameters/arguments so the braces
Which would give us:
An alternative inspired by TypeScript, Rust and a few others:
IIUC any such derivation will make our comments syntax incompatible with JSDoc. Should we try to maintain the compatibility? Or is that a non-goal?
I recommend sticking to Markdown rather than adding features from RST. As such, I think comment contents should simply be allowed to be valid Markdown. This would make it possible to use backticks for inline code fragments, without any syntax highlighting ( |
Title Line
I don't see a great advantage in having a title for section comments. In that case, I would prefer something more explicit than relying on position and empty lines, i.e.
Which would make it fall into the next group.
Meta-information
I'm trying to imagine how we could practically use this information, but I'm failing. For example, for us I think we need some valid use cases to justify the added complexity of parsing these comments.
Variables
I agree that we should standardize this type of information, and I'm fine with the We could even go as far as failing some tests if a string has placeables but no associated comments.
Syntax coloring / validation
I don't think there's value in highlighting syntax in comments (last part of the proposal). It adds a ton of complexity for little gain. I'm not sure if there should be highlighting in strings either, but I'd be more open about that.
I think this should be something more like
Which could be used to both validate the attribute externally (compare-locales), and highlight strings in Pontoon. |
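The idea raised earlier in this comment, failing tests when a string uses placeables that no comment documents, could be sketched roughly as below. This is a deliberately crude, regex-based illustration: `undocumented_variables` is a hypothetical helper, and a real check would more likely walk the AST produced by a Fluent parser, since this version can misfire on literal text such as `$USD`.

```python
import re

# Fluent variable references look like "$name"; identifiers start with a letter.
VARIABLE_RE = re.compile(r"\$[a-zA-Z][\w-]*")


def undocumented_variables(comment, value):
    """Return variables used in `value` that the comment never mentions."""
    used = set(VARIABLE_RE.findall(value))
    documented = set(VARIABLE_RE.findall(comment))
    return sorted(used - documented)


missing = undocumented_variables(
    "# $num (Number) - Number of open tabs.",
    "{ $user } has { $num } open tabs",
)
print(missing)
```

A test suite could assert that the returned list is empty for every message in a file.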
I think there is a value.
non-goal
I agree about backticks, but I'd be concerned if we tried to say that all markdown syntax is supported in our comments. AFAIK Markdown supports much more and tying us to markdown seems a bit excessive (and adds a strong dependency). |
I'm not opposed to using
If I read your statement correctly it starts with "I don't understand" and finishes with "and thus I believe the proposal is invalid" :) I'm happy to answer your questions and explain further, but I do believe the examples listed are valid.
Hmm, how would you denote the syntax of the value then? |
Uhm, where did I say “I don't understand”? You gave a few examples:
I'm not against them, in fact I suggested to use
On the same subject, I see these should apply only to individual strings, not file wide. |
What's the value that you're seeing? :) In particular, what is the value over what I suggested:
You're right, we should be explicit about only supporting a strict subset of Markdown. |
I think we should split this up. This is way too big to reason about at this point. High-level comments:
Suggestions: I'd recommend to have this issue focus on the |
I've recently seen several conversations on #developers indicating that this is no longer true. I'd like to verify that so I'll seek further confirmation, but in general, it's a per-project policy and a header like that may be useful. Please remember that we're designing syntax not just for Gecko.
That's not always true. We have a lot of branding-related policies in other files, and assuming that all brands will end up in separate FTL files is IMHO not going to hold. Having a parameter to provide policy information seems like low-hanging fruit.
Regarding
For example, while currently a comment may contain contextual information, it would be hard or impossible for Pontoon to work out whether such a comment contains any contextual information and which part of the comment does so. Examples may be screenshots of the UI, or even more semantic information such as whether it is a title, message, button label etc., which the tool could further use to improve the graphical representation of the message and help the localizer understand how to translate it. A particular example here is that knowing the context of the message may help Pontoon prioritize the messages in translation memory which share the same context over ones that have the same English value but a different context. Those are of course just examples.
Agree.
Agree. I'll file an issue per proposal, assuming that we're past the stage where a single issue for all elements of the proposal makes it easier to discuss them. Thanks! |
I think that's one point that I tend to forget in such discussions. And translation memory seems definitely an interesting application, the challenge would be making sure values for these are chosen consistently. |
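The translation-memory use case discussed above can be illustrated with a short Python sketch. Everything here is assumed for the example: the entry shape (`source`, `context`, `target`), the context values, and the function `rank_suggestions`; it does not describe Pontoon's actual translation memory.

```python
def rank_suggestions(entries, source, context):
    """Order translation-memory entries for `source`, same-context matches first."""
    matches = [entry for entry in entries if entry["source"] == source]
    # False sorts before True, and sorted() is stable, so entries keep
    # their original relative order within each group.
    return sorted(matches, key=lambda entry: entry.get("context") != context)


memory = [
    {"source": "Open", "context": "menu", "target": "Öffnen"},
    {"source": "Open", "context": "status", "target": "Geöffnet"},
    {"source": "Close", "context": "menu", "target": "Schließen"},
]
ranked = rank_suggestions(memory, "Open", "status")
print([entry["target"] for entry in ranked])
```

The ranking is only as good as the consistency of the context values, which is exactly the challenge noted in the comment above.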
Based on conversation with Stas I added an example for meta-data about |
We talked about modularization of fluent specs, and I think semantic comments would be a good example. Should we have a repo for just semantic comments ( |
I'd prefer to keep everything in a single repo and use labels and projects. We can add a new file in the |
Separated out into issues. Skipped colors for now. |
I created a GitHub project for tracking the design and implementation of semantic comments: https://github.com/projectfluent/fluent/projects/5. @zbraniecki, should we close this issue given that we now have separate issues for each proposal? |
Having a way to semantically describe a message would benefit tooling. It would allow tools to better inform the user what they can do in the translation, and give hints and suggestions.
Perhaps we could consider using something similar to JSDoc. In particular the @param tag: http://usejsdoc.org/tags-param.html. JSDoc conveniently allows specifying the type, the description and the default value, which could be used by tools to display an example of a formatted translation.
Would it make sense to make this meta-information first-class? Rust differentiates between regular comments (//) and doc comments (///). We could do something similar by making the @ sigil special:
This is possibly related to #7.