-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Internationalization of free-form text #460
Internationalization of free-form text #460
Conversation
9a827d6
to
c737d70
Compare
They might be represented in a different writing system, eg Latin and Cyrillic scrips. |
c737d70
to
661eeab
Compare
I hereby call a vote on this proposal. Voting will be open for 10 full calendar days until 11:59PM UTC on Friday, January 20th. Please vote for or against the proposal, and include the organization for which you are voting in your comment. Please note if you can commit to implementing the proposal. |
+1 from Transit, thanks @testower |
+1 from Entur (not sure my vote counts since its my proposal) |
gbfs.md
Outdated
* `https://www.example.com/data/en/system_information.json` | ||
* `https://www.example.com/data/fr/system_information.json` | ||
* Each supported language MUST be listed in the `languages` field in `system_information.json`. | ||
* Translations SHOULD be provided for each supported language for all translateable fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because supported languages are a MUST but individual translations are a SHOULD, do you have any thoughts on preferred behavior when a string translation is missing for a particular language? Similarly, should data be considered invalid if a translation is provided for an unsupported language?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This should be a MUST as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Similarly, should data be considered invalid if a translation is provided for an unsupported language?" not sure if that is necessary - what purpose would it serve to specify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose would be to make it explicit that there should be 1:1 mapping between values in the languages
field in system_information.json
and individual localized strings in translatable fields. I think changing this to MUST helps, but I'd go further (see comment below)
`languages` | Yes | Array | List of languages used in translated strings. Each element in the list must be of type Language. | ||
`name` | Yes | Array | Name of the system to be displayed to customers. An array with one object per supported language with the following keys: | ||
\- `text` | Yes | String | The translated text | ||
\- `language` | Yes | Language | IETF BCP 47 language code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is kind of a bummer that you have to repeat the same text + language snippet for every translated field. In the spirit of DRY, would you consider introduced an Array<Localized String>
type (or something equivalent) and defined the fields of Localized String
above in the "Field types" section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we went a few rounds on this but ended up going with what seems to be the overall style of the spec. This could always be changed without it being breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to recommend that you also update language
description to say:
An IETF BCP 47 language code that MUST match one of values specified in the
languages
field insystem_information.json
.
However, it seems silly to repeatedly duplicate that guidance for every translatable field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree. I have played around with some varieties of this but all felt clunky to me and kind of collided with the style of the rest of the spec. However, if you have a concrete proposal on how this would look would you mind drafting it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. the notation Array<Type>
is not used elsewhere in the spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example of what the syntax could look like:
https://github.com/MobilityData/gbfs/compare/master...bdferris-v2:gbfs:localized-string?expand=1
https://github.com/bdferris-v2/gbfs/blob/2b98e0280279ce705de746a3366ebf6795209dfc/gbfs.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against this, but it will take a while to update the proposal - and the vote may have to be postponed.
@mplsmitch what do you think about this style? It certainly makes things much less verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand - changing this separately will not be breaking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less verbose is better but maybe we should try to pass this as-is, and then open a separate vote on changing to the localized string format. It seems like it would be an easy sell with some example JSON showing how much lighter it would be.
661eeab
to
d7dcc63
Compare
I have updated the PR with two changes of SHOULD to MUST in the Localization section, as well as corrected a few examples that I missed previously. |
Voting on this PR closes in 2 calendar days. Please vote for or against the proposal, and include the organization for which you are voting in your comment. Please note if you can commit to implementing the proposal. |
It would be a shame to make a new major version without this change in, as it would have to wait for a v4 then, and the state of internationalization now is not great. Can the voting period be extended until we are all happy with this proposal? |
We discussed this proposal at nextbike by TIER and came up with the following pros and cons:
In general we don’t see that the advantages are big enough for a breaking change. However we would implement the new standard. So no vote / abstentation on this proposal. |
@testower as the advocate you can choose to cancel the vote and continue the discussion until you feel the proposal is in a good place and consensus has been reached. Once at that point you can re-open the vote. There have been changes pending for v3 for some time now, waiting an extra week or so for this to be resolved will not be a problem. Additionally it would allow us to open the vote on 474 for v3 as well. |
+1 from Google All my word-smithing aside, I think this is a more natural data representation that avoids duplicating the same semantic data cross multiple localized feeds. To the comments from @rootsixtysix, I think dot-operator syntax would still work for a valid feed. Namely, if a string field was already a required field, then there must be a translation specified (otherwise field is invalid), so you could safely reference the first element for a single-language feed. For multiple language feeds, maybe we could add a "The primary language of the feed should be specified first"? |
`opening_hours`<br/>*(added in v3.0-RC)*| Yes | String | Hours and dates of operation for the system in [OSM opening_hours](https://wiki.openstreetmap.org/wiki/Key:opening_hours) format. *(added in v3.0-RC)* | ||
`short_name` | OPTIONAL | String | OPTIONAL abbreviation for a system. | ||
`operator` | OPTIONAL | String | Name of the system operator. | ||
`languages` | Yes | Array | List of languages used in translated strings. Each element in the list must be of type Language. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is Required (that's good!), but does that imply that the array must have at least one value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very valid point, since "Array" is defined in the spec as "A JSON element consisting of an ordered sequence of zero or more values". I'm curious if that's intentional. Is there any field in the spec that would carry different meanings for an empty array as opposed to omitting the field altogether? If not then maybe the right thing to do is to require (optional?) arrays to have at least one element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could be clarified in the languages
array definition but wouldn't there always be at least one value in the array? For example if you don't offer any translations and only offer feeds in en
then that's the array value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly - there should always be at least one value, but the spec leaves this open. I.e. providing an empty array would still be valid according to the spec.
This looks like a wise proposal without creating too much work for single-language fields, so +1 from Joyride |
+1 from whereto.app |
+1 from Superpedestrian as well. |
This vote has now closed, and it passes! Votes in favor: Abstentions: |
* Add description to vehicle_types We found it useful to show vehicle-type specific texts to our customers * Update to reflect changes in #460 Co-authored-by: Josee Sabourin <66266820+josee-sabourin@users.noreply.github.com>
If you are new to the specification, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!
I am Tom Erik Støwer representing Entur AS.
What problem does your proposal solve? Please begin with the relevant issue number. If there is no existing issue, please also describe alternative solutions you have considered.
This proposal solves several problems with internationalization of free-form text, see #262. In summary, this proposal removes the need to duplicate the feeds once for each supported language, and instead "inlines" translations within the free-form text fields themselves.
What is the proposal?
In essence, the proposal turns free-form text fields (text that is intended for humans to read), from a simple String field, into an array of objects, with one object per supported language. The object will have the keys
text
andlanguage
, wheretext
is the translated string, andlanguage
is the IETF BCP 47 language code for the translation (i.e. theLanguage
).The fieldsmake
andmodel
invehicle_types.json
are up for debate. I have included the fieldcolor
as these seem uncontroversial (e.g. the examples are clearly in the English language), whereas make and model may not be translateable at all.In addition, URLs that link to human readable content are subject to the same change, so that URLs can link to different translations. I have not included URLs that link to content intended for data consumers, and not end users.
The
gbfs.json
is modified so that thedata
field has a single fieldfeeds
, instead of one feed field per supported language (with the language as the key).The
language
field insystem_information.json
is changed tolanguages
- an array of supported languages.The localization section is changed so that it describes the new pattern.
Is this a breaking change?
Which files are affected by this change?
gbfs.json
system_information.json
vehicle_types.json
station_information.json
system_regions.json
system_pricing_plans.json
system_alerts.json
geofencing_zones.json