Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internationalization of free-form text #460

Merged
merged 1 commit into from
Jan 23, 2023

Conversation

testower
Copy link
Contributor

@testower testower commented Oct 28, 2022

If you are new to the specification, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!

I am Tom Erik Støwer representing Entur AS.

What problem does your proposal solve? Please begin with the relevant issue number. If there is no existing issue, please also describe alternative solutions you have considered.

This proposal solves several problems with internationalization of free-form text, see #262. In summary, this proposal removes the need to duplicate the feeds once for each supported language, and instead "inlines" translations within the free-form text fields themselves.

What is the proposal?

In essence, the proposal turns free-form text fields (text that is intended for humans to read), from a simple String field, into an array of objects, with one object per supported language. The object will have the keys text and language, where text is the translated string, and language is the IETF BCP 47 language code for the translation (i.e. the Language).

The fields make and model in vehicle_types.json are up for debate. I have included the field color as these seem uncontroversial (e.g. the examples are clearly in the English language), whereas make and model may not be translateable at all.

In addition, URLs that link to human readable content are subject to the same change, so that URLs can link to different translations. I have not included URLs that link to content intended for data consumers, and not end users.

The gbfs.json is modified so that the data field has a single field feeds, instead of one feed field per supported language (with the language as the key).

The language field in system_information.json is changed to languages - an array of supported languages.

The localization section is changed so that it describes the new pattern.

Is this a breaking change?

  • Yes
  • No
  • Unsure

Which files are affected by this change?

  • gbfs.json
  • system_information.json
  • vehicle_types.json
  • station_information.json
  • system_regions.json
  • system_pricing_plans.json
  • system_alerts.json
  • geofencing_zones.json

@testower testower force-pushed the internationalization branch 10 times, most recently from 9a827d6 to c737d70 Compare October 28, 2022 11:21
@hannesj
Copy link

hannesj commented Oct 28, 2022

The fields make and model in vehicle_types.json are up for debate. I have included the field color as these seem uncontroversial (e.g. the examples are clearly in the English language), whereas make and model may not be translateable at all.

They might be represented in a different writing system, eg Latin and Cyrillic scrips.

@josee-sabourin
Copy link
Contributor

I hereby call a vote on this proposal. Voting will be open for 10 full calendar days until 11:59PM UTC on Friday, January 20th.

Please vote for or against the proposal, and include the organization for which you are voting in your comment.

Please note if you can commit to implementing the proposal.

@josee-sabourin josee-sabourin added Vote open v3.0-RC Candidate change for GBFS 3.0 (Major release) labels Jan 10, 2023
@cmonagle cmonagle mentioned this pull request Jan 11, 2023
3 tasks
@cmonagle
Copy link
Contributor

+1 from Transit, thanks @testower

@testower
Copy link
Contributor Author

+1 from Entur (not sure my vote counts since its my proposal)

gbfs.md Show resolved Hide resolved
gbfs.md Outdated
* `https://www.example.com/data/en/system_information.json`
* `https://www.example.com/data/fr/system_information.json`
* Each supported language MUST be listed in the `languages` field in `system_information.json`.
* Translations SHOULD be provided for each supported language for all translateable fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because supported languages are a MUST but individual translations are a SHOULD, do you have any thoughts on preferred behavior when a string translation is missing for a particular language? Similarly, should data be considered invalid if a translation is provided for an unsupported language?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This should be a MUST as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Similarly, should data be considered invalid if a translation is provided for an unsupported language?" not sure if that is necessary - what purpose would it serve to specify this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose would be to make it explicit that there should be 1:1 mapping between values in the languages field in system_information.json and individual localized strings in translatable fields. I think changing this to MUST helps, but I'd go further (see comment below)

`languages` | Yes | Array | List of languages used in translated strings. Each element in the list must be of type Language.
`name` | Yes | Array | Name of the system to be displayed to customers. An array with one object per supported language with the following keys:
\-  `text` | Yes | String | The translated text
\-  `language` | Yes | Language | IETF BCP 47 language code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is kind of a bummer that you have to repeat the same text + language snippet for every translated field. In the spirit of DRY, would you consider introduced an Array<Localized String> type (or something equivalent) and defined the fields of Localized String above in the "Field types" section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we went a few rounds on this but ended up going with what seems to be the overall style of the spec. This could always be changed without it being breaking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to recommend that you also update language description to say:

An IETF BCP 47 language code that MUST match one of values specified in the languages field in system_information.json.

However, it seems silly to repeatedly duplicate that guidance for every translatable field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree. I have played around with some varieties of this but all felt clunky to me and kind of collided with the style of the rest of the spec. However, if you have a concrete proposal on how this would look would you mind drafting it up?

Copy link
Contributor Author

@testower testower Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. the notation Array<Type> is not used elsewhere in the spec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against this, but it will take a while to update the proposal - and the vote may have to be postponed.

@mplsmitch what do you think about this style? It certainly makes things much less verbose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand - changing this separately will not be breaking...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less verbose is better but maybe we should try to pass this as-is, and then open a separate vote on changing to the localized string format. It seems like it would be an easy sell with some example JSON showing how much lighter it would be.

@testower
Copy link
Contributor Author

I have updated the PR with two changes of SHOULD to MUST in the Localization section, as well as corrected a few examples that I missed previously.

@josee-sabourin
Copy link
Contributor

Voting on this PR closes in 2 calendar days. Please vote for or against the proposal, and include the organization for which you are voting in your comment. Please note if you can commit to implementing the proposal.

@testower
Copy link
Contributor Author

It would be a shame to make a new major version without this change in, as it would have to wait for a v4 then, and the state of internationalization now is not great. Can the voting period be extended until we are all happy with this proposal?

@rootsixtysix
Copy link

We discussed this proposal at nextbike by TIER and came up with the following pros and cons:

  • (+) this would reduce some backend-to-backend-traffic for big aggregators/consumers
  • (-) small consumers might prefer a single feed/dataset in one language
  • (-) the feed would become harder to consume. You already need some sort of programming / processing (e.g it wouldn’t work anymore to simply use the dot-operator as data.vehicle_types[0].name)

In general we don’t see that the advantages are big enough for a breaking change. However we would implement the new standard. So no vote / abstentation on this proposal.

@josee-sabourin
Copy link
Contributor

@testower as the advocate you can choose to cancel the vote and continue the discussion until you feel the proposal is in a good place and consensus has been reached. Once at that point you can re-open the vote. There have been changes pending for v3 for some time now, waiting an extra week or so for this to be resolved will not be a problem. Additionally it would allow us to open the vote on 474 for v3 as well.

@bdferris-v2
Copy link
Contributor

+1 from Google

All my word-smithing aside, I think this is a more natural data representation that avoids duplicating the same semantic data cross multiple localized feeds.

To the comments from @rootsixtysix, I think dot-operator syntax would still work for a valid feed. Namely, if a string field was already a required field, then there must be a translation specified (otherwise field is invalid), so you could safely reference the first element for a single-language feed. For multiple language feeds, maybe we could add a "The primary language of the feed should be specified first"?

`opening_hours`<br/>*(added in v3.0-RC)*| Yes | String | Hours and dates of operation for the system in [OSM opening_hours](https://wiki.openstreetmap.org/wiki/Key:opening_hours) format. *(added in v3.0-RC)*
`short_name` | OPTIONAL | String | OPTIONAL abbreviation for a system.
`operator` | OPTIONAL | String | Name of the system operator.
`languages` | Yes | Array | List of languages used in translated strings. Each element in the list must be of type Language.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is Required (that's good!), but does that imply that the array must have at least one value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very valid point, since "Array" is defined in the spec as "A JSON element consisting of an ordered sequence of zero or more values". I'm curious if that's intentional. Is there any field in the spec that would carry different meanings for an empty array as opposed to omitting the field altogether? If not then maybe the right thing to do is to require (optional?) arrays to have at least one element?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be clarified in the languages array definition but wouldn't there always be at least one value in the array? For example if you don't offer any translations and only offer feeds in en then that's the array value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly - there should always be at least one value, but the spec leaves this open. I.e. providing an empty array would still be valid according to the spec.

@benwedge
Copy link

This looks like a wise proposal without creating too much work for single-language fields, so +1 from Joyride

@futuretap
Copy link
Contributor

+1 from whereto.app

@ezmckinn
Copy link
Contributor

+1 from Superpedestrian as well.

@josee-sabourin
Copy link
Contributor

This vote has now closed, and it passes!

Votes in favor:
Transit (consumer)
Google Maps (consumer)
Superpedestrian (producer)
Joyride (producer)

Abstentions:
Nextbike (producer)

There were no votes against.
Thank you to everyone who took the time to review and to vote on this! We incorporate it into 3.0-RC, which should be ready to go in the coming week.

@josee-sabourin josee-sabourin merged commit d9b416f into MobilityData:master Jan 23, 2023
josee-sabourin added a commit to j0kan/gbfs that referenced this pull request Jan 23, 2023
josee-sabourin added a commit that referenced this pull request Jan 23, 2023
* Add description to vehicle_types 

We found it useful to show vehicle-type specific texts to our customers

* Update to reflect changes in #460

Co-authored-by: Josee Sabourin <66266820+josee-sabourin@users.noreply.github.com>
josee-sabourin added a commit that referenced this pull request Jan 23, 2023
Adds in-line annotations to changes made in v3.0-RC (#460, #454, #457, #462, #470)
Removes "RC" from annotations in v2.3-RC and RC2
josee-sabourin added a commit that referenced this pull request Jan 23, 2023
* Add versions to new and updated fields

Adds in-line annotations to changes made in v3.0-RC (#460, #454, #457, #462, #470)
Removes "RC" from annotations in v2.3-RC and RC2
Editorial changes
@richfab richfab added the gbfs.md label Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gbfs.md v3.0-RC Candidate change for GBFS 3.0 (Major release) Vote Passed
Projects
None yet
Development

Successfully merging this pull request may close these issues.