CLDR-17566 converting language specific (#3834)
chpy04 authored Jul 11, 2024
1 parent 74f018a commit ed02c7a
Showing 4 changed files with 121 additions and 0 deletions.
13 changes: 13 additions & 0 deletions docs/site/translation/language-specific.md
@@ -0,0 +1,13 @@
---
title: Δ - Language Specific Guidance
---

# Δ - Language Specific Guidance

The following pages have guidance for specific languages:

- [Lakota](https://cldr.unicode.org/translation/language-specific/lakota)
- [Odia](https://cldr.unicode.org/translation/language-specific/odia)
- [Persian](https://cldr.unicode.org/translation/language-specific/persian)

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
19 changes: 19 additions & 0 deletions docs/site/translation/language-specific/lakota.md
@@ -0,0 +1,19 @@
---
title: Lakota
---
# Lakota

## Special characters for Lakota

Please use the following forms of the non-A-Z letters when entering data for Lakota (you can copy and paste from here):

| Category | Characters |
|---|---|
| Glottal stop (please use this instead of the right curly quote produced by standard key layouts) | ʼ |
| Consonants, lowercase | č ǧ ȟ ŋ š ž |
| Consonants, uppercase | Č Ǧ Ȟ Ŋ Š Ž |
| Stressed vowels | á é í ó ú |
| Standard digraphs using special letters (you can use these or combine the single letters above) | aŋ čh čʼ iŋ kȟ kʼ pȟ pʼ tȟ tʼ uŋ |
| Additional digraphs (you can use these or combine the single letters above) | ȟʼ sʼ šʼ |
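
The glottal stop substitution in the table can be sketched as a small clean-up step. This is a minimal illustration, not part of any CLDR tooling: the function name is hypothetical, and it assumes the glottal stop shown in the table is U+02BC MODIFIER LETTER APOSTROPHE and that the input contains no genuine closing quotation marks.

```python
# Hypothetical helper for illustration only; not part of CLDR tooling.
RIGHT_CURLY_QUOTE = "\u2019"    # ’ as produced by standard key layouts
MODIFIER_APOSTROPHE = "\u02bc"  # ʼ assumed to be the glottal stop in the table


def fix_lakota_glottal_stop(text: str) -> str:
    """Replace every right curly quote with the modifier letter apostrophe.

    Assumption: the text holds Lakota words only, so any U+2019 is a
    mistyped glottal stop rather than a real quotation mark.
    """
    return text.replace(RIGHT_CURLY_QUOTE, MODIFIER_APOSTROPHE)
```

Text that mixes Lakota words with quoted material would need a more careful check than this blanket replacement.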

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
27 changes: 27 additions & 0 deletions docs/site/translation/language-specific/odia.md
@@ -0,0 +1,27 @@
---
title: Odia
---

# Odia

**Translation approach - transliteration and diacritics in Odia**

New agreement for the Odia translation guide:

1. Avoid diacritics when transliteration is required in Odia. Users well versed in Odia understand diacritics easily, but plain transliteration (without diacritics) is more common and preferred.

Follow the General Guidelines for Country/region names:

1. Use the most neutral grammatical form for the country/region that is natural for these two usages above. If there is no single form that can accomplish that, favor the usage within UI menus.
2. Use the capitalization that would be appropriate in the middle of a sentence; the \<contextTransforms> data can specify the capitalization for other contexts. For more information, see Capitalization.
3. Each of the names must be unique.
4. Don't use commas and don't invert the name (e.g., use "South Korea", not "Korean, South").
5. Don't use the characters "(" and ")", since they will be confusing in complex language names. If you have to use brackets, use square ones: [ and ].
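
Rules 3 to 5 above are mechanical enough to check automatically. The following is a rough sketch of such a check, not an existing CLDR test: the function name and the shape of the input (a mapping from region code to translated name) are assumptions made for illustration.

```python
from collections import Counter


def check_region_names(names):
    """Return a list of problems for a {region code: name} mapping.

    Hypothetical checker covering only the mechanical rules:
    uniqueness, no commas, no round brackets.
    """
    problems = []
    counts = Counter(names.values())
    for code, name in names.items():
        if counts[name] > 1:
            problems.append(f"{code}: name {name!r} is not unique")
        if "," in name:
            problems.append(f"{code}: name {name!r} contains a comma")
        if "(" in name or ")" in name:
            problems.append(f"{code}: name {name!r} uses round brackets")
    return problems
```

Rules 1 and 2 (neutral grammatical form, mid-sentence capitalization) depend on linguistic judgment and are deliberately left out of the sketch.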

**Helpful examples**

1. Generally speaking, using diacritics when transliterating (e.g., geographic names, especially of lesser-known places such as Gabon or the Isle of Man) should be acceptable, or even preferred, for well-versed users.
2. That said, because diacritics change pronunciation in Odia (for example, ଲଣ୍ଡନ୍ will be pronounced as London but ଲଣ୍ଡନ will be pronounced as Londonaw), a transliteration approach that regularly adopts diacritics could confuse users who are not well versed.
3. With these considerations in mind, and with the goal of achieving consistency across categories and companies, Google linguists are open to introducing a general translation guideline in favor of transliteration without diacritics.

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
62 changes: 62 additions & 0 deletions docs/site/translation/language-specific/persian.md
@@ -0,0 +1,62 @@
---
title: Persian
---

# Persian

## Persian style guide and Common issues

### Orthography

Please follow the orthography published by the Persian Academy (دستور خط فارسی). Since the rules are sometimes complicated and hard to decipher, refer to فرهنگ املایی خط فارسی, by Ali-Ashraf Sadeghi and Zahra Zandi-Moghaddam. (We have PDF versions [here](https://drive.google.com/file/d/1R2_7PMMxNzu_rYZvUQgWEGsFBA549Z1K/view?usp=sharing) and [here](https://drive.google.com/file/d/1bDIQ2XWGsahQbg9yZ3DqLKaFuh41RBxx/view?usp=sharing), but we don’t know if these PDFs are the latest editions or not. Refer to the latest printed versions, if you can.)

- Always write the *ezafe* over *he*, if it’s pronounced. For example, use مقدونیهٔ شمالی for North Macedonia.
- For names of continents and their derived forms that could start with either *aa-ye baa-kolaah* (آ) or *alef* (ا), use *alef*: Africa should be افریقا and North America should be امریکای شمالی.

### Characters to use

It may appear that there is a choice among which characters to use for certain Persian letters, but the Unicode Standard and the Iranian National Standard ISIRI 6219 are strict about what to use for different letters or marks:

- For *kaaf*, use U+06A9 ک (and not U+0643 ك).
- For *ye*, use U+06CC ی (and not U+0649 ى or U+064A ي).
- For digits, use U+06F0..U+06F9 ۰۱۲۳۴۵۶۷۸۹ (and not U+0660..U+0669)
- For decimal separator, use U+066B ٫ (and not /)
- For thousands separator, use U+066C ٬ (and not any of ,،`’ etc.)
- For *ezafe* over *he*, use \<U+0647, U+0654> هٔ (and not U+06C0)
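
The letter and digit rules in the list above amount to a fixed code point mapping, which could be applied as in the sketch below. The function name is hypothetical and this is not an existing CLDR tool. The separators are deliberately not converted, because characters such as `/` and `,` are ambiguous outside of numbers.

```python
# Code points to avoid, mapped to the preferred Persian code points.
CHAR_MAP = {
    "\u0643": "\u06a9",        # ARABIC LETTER KAF -> KEHEH (kaaf)
    "\u0649": "\u06cc",        # ALEF MAKSURA -> FARSI YEH (ye)
    "\u064a": "\u06cc",        # ARABIC LETTER YEH -> FARSI YEH (ye)
    "\u06c0": "\u0647\u0654",  # HEH WITH YEH ABOVE -> HEH + HAMZA ABOVE
}

# Arabic-Indic digits U+0660..U+0669 -> Persian digits U+06F0..U+06F9.
for offset in range(10):
    CHAR_MAP[chr(0x0660 + offset)] = chr(0x06F0 + offset)

_TABLE = str.maketrans(CHAR_MAP)


def to_persian_code_points(text: str) -> str:
    """Hypothetical helper: swap the code points listed above."""
    return text.translate(_TABLE)
```

A check like this only catches code point choices; it says nothing about whether the orthography itself follows the Persian Academy rules.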

Locale patterns: Most of the existing CLDR locale data for Persian is based on the [FarsiWeb publication “نیازهای شرایط محلی برای فارسی ایران”](https://drive.google.com/file/d/1yDoUbXnV_q6mrzzaRZK_AvsOLaU-O9Qy/view?usp=sharing), which is in turn based on extensive research in Persian standards and reference material. Follow that document where it covers an issue, and try to remain consistent with it if it doesn’t.

### Language, script, region, and location names

Please do not rely on the Persian Wikipedia for translation of these. You can consult the Persian Wikipedia as a start, but never use it as a primary reference; instead look at its references.

Alternatively, find a good Persian reference book about languages and scripts (such as Razi Hirmandi’s translation of Kenneth Katzner’s *The Languages of the World*, published as زبانهای جهان by Markaz-e Nashr-e Daneshgahi, which is the source of most Persian language names in CLDR), or a good atlas, and use names from those instead.

Even better, find multiple references and compare. If there exists a consensus Persian name, it will become clear after consulting multiple references.

Try to use references published before the Persian Wikipedia started in 2003, to minimize potential influences. If you are in doubt, or can’t find a reference, it may be better to avoid voting for a value than to use something potentially made up by a Persian Wikipedia editor.

- For names that start with Southern, Western, etc., use the pattern where the compass point comes before the region name. For example, Southern Africa would be جنوب افریقا, while South Africa would be افریقای جنوبی.

### Currencies

The pattern we follow is name of currency, followed by an *ezafe* (written if the Persian name of the currency ends in most vowels), followed by the name of the region. For example, Canadian dollar is دلار کانادا, while Indian rupee is روپیهٔ هند.

### Dates and time

For date formats when a year follows a month, in some calendar systems such as Gregorian and Islamic, the *ezafe* form of month names should be used. For example, while January 12 would be ‏۱۲ ژانویه, January 2019 would be ژانویهٔ ۲۰۱۹. To make this distinction, stand-alone patterns (LLLL etc) are localized without *ezafe*, while formatting patterns (MMMM etc) are localized with *ezafe*. When localizing patterns, pay attention to this distinction and use the correct pattern. For example, “MMMM d, y” should be translated as “d MMMM y” (since the Persian version would need the *ezafe*), while “MMMM d” should be translated as “d LLLL” (since the Persian version doesn’t use the *ezafe*).
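
The distinction between formatting (MMMM) and stand-alone (LLLL) month names can be illustrated with a toy formatter. This is only a sketch under stated assumptions: it is not how CLDR or ICU formats dates, the function and table names are hypothetical, only January is filled in, and the *ezafe* form is produced by appending U+0654, which is only valid for month names ending in *he*.

```python
import re

# ASCII digits -> Persian digits U+06F0..U+06F9.
_PERSIAN_DIGITS = str.maketrans(
    "0123456789",
    "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9",
)

EZAFE_MARK = "\u0654"  # HAMZA ABOVE, written over he

# Illustrative data only: stand-alone (LLLL) names carry no ezafe.
MONTH_STANDALONE = {1: "ژانویه"}
# Formatting (MMMM) names carry the ezafe; valid here because the name ends in he.
MONTH_FORMAT = {m: name + EZAFE_MARK for m, name in MONTH_STANDALONE.items()}


def format_fa(pattern, day=None, month=None, year=None):
    """Fill a tiny subset of pattern fields: MMMM, LLLL, d and y."""
    def fill(match):
        field = match.group(0)
        if field == "MMMM":
            return MONTH_FORMAT[month]
        if field == "LLLL":
            return MONTH_STANDALONE[month]
        value = day if field == "d" else year
        return str(value).translate(_PERSIAN_DIGITS)

    return re.sub(r"MMMM|LLLL|d|y", fill, pattern)
```

With this data, `format_fa("d LLLL", day=12, month=1)` uses the month name without *ezafe*, while `format_fa("d MMMM y", day=12, month=1, year=2019)` uses the form with *ezafe*, matching the two translated patterns described above.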

### Units

TBD

### Time zones

TBD

### Characters

TBD


![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
