Jupyter Notebook Translation and Localization #10

Closed
@@ -0,0 +1,106 @@
# Jupyter Notebook Translation and Localization

## Problem

There is currently no standard approach for translating the GUI of Jupyter notebook. This has driven some people to do a [single language translation for Jupyter 4.1](https://twitter.com/Mbussonn/status/685870031247400960).

For reference, previous attempts and related issues:

- ipython/ipython#6718
- ipython/ipython#5922
- jupyter/notebook#870

## Proposed Enhancement

Use Tornado's [translation capabilities](http://www.tornadoweb.org/en/stable/locale.html) to translate the GUI's templates. This covers translating the words and sentences in the GUI as well as localized styles (such as right-to-left languages).

## Detailed Explanation

The language of the GUI is mostly hard-coded in [HTML template files](https://github.com/jupyter/notebook/tree/master/notebook/templates), with some exceptions where text is written in [JavaScript files](https://github.com/jupyter/notebook/blob/master/notebook/static/notebook/js/about.js#L12) and even a few words in [Python code](https://github.com/jupyter/notebook/blob/4578c34b0f999735ee49e1492be3dd5941951551/notebook/base/handlers.py#L332).

### HTML Templates

Tornado [exposes](http://www.tornadoweb.org/en/stable/guide/templates.html#template-syntax) its `translate()` function to template rendering using `_` as a shorthand (a convention shared with other web frameworks such as Django). As an example, take this entry from the [menu of a notebook](https://github.com/jupyter/notebook/blob/4.x/notebook/templates/notebook.html#L80):

```HTML
<a href="#">New Notebook</a>
```

Marked for translation, it becomes:

```HTML
<a href="#">{{ _("New Notebook") }}</a>
```

### JavaScript files

For JavaScript we will use the same approach as for HTML, but a few more changes are needed to make sure JavaScript files are translated before they are sent to the browser. The approach is as follows:

1. Subclass `web.StaticFileHandler` and call it `JupyterStaticFileHandler`.
2. Override the `get()` function so that it renders static files whose names end with .js.
3. Use `JupyterStaticFileHandler` instead of `web.StaticFileHandler` in the RequestHandler for static files.
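
As a rough illustration of step 2, the sketch below passes JavaScript source through a rendering step that replaces `{{ _('...') }}` markers with translated text. It is a simplified stand-in that uses a regular expression rather than Tornado's template engine; `render_js`, `MARKER`, and the one-entry French `catalog` are hypothetical names and data invented for this example.

```python
import re

# Matches {{ _('...') }} or {{ _("...") }} markers (hypothetical helper pattern).
MARKER = re.compile(r"""\{\{\s*_\(\s*(['"])(.*?)\1\s*\)\s*\}\}""")


def render_js(source, translate):
    """Replace translation markers in JavaScript source with translated text."""

    def substitute(match):
        translated = translate(match.group(2))
        # Escape backslashes and double quotes so the text stays valid
        # inside a double-quoted JavaScript string literal.
        return translated.replace("\\", "\\\\").replace('"', '\\"')

    return MARKER.sub(substitute, source)


# Illustrative catalog only; a real one would come from compiled .mo files.
catalog = {
    "You are using Jupyter notebook.<br/><br/>":
        "Vous utilisez Jupyter notebook.<br/><br/>",
}

js_source = '''var text = "{{ _('You are using Jupyter notebook.<br/><br/>') }}";'''
rendered = render_js(js_source, lambda message: catalog.get(message, message))
```

In the proposal, a step like this would run inside the overridden `get()` before the file is written to the response, so the rendered output differs for each language served.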

To demonstrate translation in a [JavaScript file](https://github.com/jupyter/notebook/blob/4.x/notebook/static/notebook/js/about.js#L12), take the following line:

```javascript
var text = 'You are using Jupyter notebook.<br/><br/>';
```

Marked for translation, it becomes:

```javascript
var text = "{{ _('You are using Jupyter notebook.<br/><br/>') }}";

```

> Review comment: This would definitely be very problematic for caching, since now you have one variant of JS for each language supported, and places that utilize caching will need appropriate purging / vary for those as well. This also is mixing Python (Jinja?) and JS which makes it feel bad/sad. This also makes things very messy for pure JS generated messages.
>
> There's a wide variety of JS localization helper libraries + quick ways of loading them (such as https://github.com/wikimedia/jquery.i18n which wikimedia uses), so I'd rather we use them.

> Review comment (Member): I doubt Tornado translation templates will work for messages in JavaScript files. I think this is going to have to be done with a JavaScript translation utility.
>
> In fact, as we move forward, I think significantly more of this UI will be produced by JavaScript and not Tornado templates, so perhaps an exclusively JS solution is the way to go.
>
> It seems like i18n would be able to address messages that come from HTML templates as well as JS messages.

### Python files

We can use `translate(message, plural_message=None, count=None)` (or its shorthand `_`) in a Tornado RequestHandler, or anywhere else that text is sent to the GUI.

To demonstrate this, I'll use [existing text in Python](https://github.com/jupyter/notebook/blob/4578c34b0f999735ee49e1492be3dd5941951551/notebook/base/handlers.py#L332) that needs translation:

```python
raise web.HTTPError(400, u'Invalid JSON in body of request')
```

Marked for translation, it becomes:

```python
raise web.HTTPError(400, _(u'Invalid JSON in body of request'))
```

## Translation Files

All languages will be treated as translations including English. All translation files will be located inside the extensions folder and will be treated as extensions. This will allow Jupyter to be shipped with one translation (English) and allows people to get other translations as an nb-extension.

Each language will have a .po (Portable Object) file, which will be compiled to a .mo (Machine Object) file for use with gettext, which the `translate()` function in Tornado supports.

The original PO file can be created using [xgettext](http://www.gnu.org/software/gettext/manual/gettext.html#xgettext-Invocation):

```bash
xgettext [option] [inputfile]
```

For the translation itself, we can edit the PO files in a text editor. However, I would recommend a crowd-sourced solution in which people translate words or sentences in a web application such as [POEditor](https://poeditor.com/features/).
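
Once a catalog is compiled, it has to be located and loaded at runtime. The sketch below uses Python's standard `gettext` module and the conventional `<localedir>/<language>/LC_MESSAGES/<domain>.mo` layout. The domain name `notebook` and the helper `load_translations` are assumptions made for illustration; the proposal does not fix either. With `fallback=True`, a missing catalog yields the original (English) strings instead of an error.

```python
import gettext
import tempfile


def load_translations(locale_dir, language, domain="notebook"):
    """Load <locale_dir>/<language>/LC_MESSAGES/<domain>.mo if it exists.

    Falls back to an untranslated catalog when no .mo file is found.
    """
    return gettext.translation(
        domain,
        localedir=locale_dir,
        languages=[language],
        fallback=True,
    )


# An empty directory stands in for an installation with no Arabic catalog.
with tempfile.TemporaryDirectory() as locale_dir:
    translations = load_translations(locale_dir, "ar")
    message = translations.gettext("New Notebook")
```

Because the fallback returns the source string, shipping without a given language degrades to English rather than failing.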

## Which translation to use?

The default configuration file can be used to add a new configuration variable for the default language.

    c.gui_language = 'en_US'

We can also set it to "auto" to have Tornado detect the end-user's language from the `Accept-Language` header. Tornado finds the best match among the available translations, or falls back to the default language if no matching translation exists.
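
To make the "auto" behaviour concrete, here is a minimal pure-Python sketch of choosing a language from an `Accept-Language` header. It approximates the kind of matching described above and is not Tornado's code; `parse_accept_language`, `pick_language`, and `resolve_gui_language` are hypothetical names.

```python
def parse_accept_language(header):
    """Return the language codes in an Accept-Language header, best first."""
    scored = []
    for part in header.split(","):
        pieces = part.strip().split(";")
        code = pieces[0].strip()
        if not code:
            continue
        quality = 1.0  # default weight when no q= parameter is given
        for param in pieces[1:]:
            name, _sep, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            scored.append((quality, code))
    # Stable sort: codes with equal weight keep their header order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [code for _quality, code in scored]


def normalize(code):
    """Turn 'en-US' into 'en_us' so header and catalog codes compare equal."""
    return code.replace("-", "_").lower()


def pick_language(header, supported, default="en_US"):
    """Pick the best supported language, trying exact then base-language match."""
    by_normalized = {normalize(code): code for code in supported}
    for code in parse_accept_language(header):
        wanted = normalize(code)
        if wanted in by_normalized:
            return by_normalized[wanted]
        base = wanted.split("_")[0]
        if base in by_normalized:
            return by_normalized[base]
    return default


def resolve_gui_language(setting, header, supported, default="en_US"):
    """Honour an explicit gui_language setting; detect only when it is 'auto'."""
    if setting == "auto":
        return pick_language(header, supported, default)
    return setting


supported = ["en_US", "fr", "ar"]
choice = resolve_gui_language("auto", "fr-CA,fr;q=0.8,en;q=0.5", supported)
```

Here `choice` is `"fr"`: the exact code `fr-CA` has no catalog, so the base language `fr` is used before lower-weighted entries are considered.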

## Pros and Cons

Pros associated with this implementation include:
* No extra dependencies
* Using a well known standard that can be extended for any number of languages
* Can be used later with JupyterHub to set multiple languages for multilingual teams.

Cons associated with this implementation include:
* JavaScript strings and HTML files will have `{{ _(XXX) }}` in the source code.
* The development guidelines will need to change to require the use of translation.
* Rendering JavaScript files means you cannot use `{{XXX}}` or `{% X %}` inside any JavaScript file. This rules out [mustache](https://mustache.github.io/) (it is not used now, but it could not be adopted in the future).

## Interested Contributors
@twistedhardware @rgbkrk @captainsafia