Showing 5 changed files with 152 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
--- | ||
authority_id: bgnpcgn | ||
id: 2000 | ||
language: iso-639-2:deu | ||
source_script: Latn | ||
destination_script: Latn | ||
name: BGN/PCGN German 2000 Roman-Script Spelling Convention Agreement | ||
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf | ||
creation_date: 2000 | ||
|
||
notes: | ||
- | | ||
The special letter ß, known as eszett, [Unicode: 00DF] is a standard letter of the German alphabet and occurs | ||
only in word-medial and word-final positions. It is a lowercase letter only, and when a word and/or a name is | ||
written entirely in uppercase letters, it is always rendered as SS. As a result of the orthographic reform of | ||
German, implemented in August 1998, the ß is now rendered ss if it follows a short vowel, but it is still used | ||
if it follows a long vowel or a diphthong. In those instances where ß cannot be reproduced, the digraph ss | ||
may be substituted for it. For alphabetization and sorting purposes, ß should be treated as ss. | ||
- | | ||
In those instances when the vowel letters ä [Unicode: 00E4], ö [Unicode: 00F6], and ü [Unicode: 00FC] | ||
cannot be reproduced, the alternate spellings ae, oe, and ue may be substituted. | ||
tests: | ||
- source: Dein weißes Fleisch erregt mich so | ||
expected: Dein weisses Fleisch erregt mich so | ||
- source: GROßSTÄDTE | ||
expected: GROSSSTAEDTE | ||
- source: Göttingen | ||
expected: Goettingen | ||
- source: Gütersloh | ||
expected: Guetersloh | ||
- source: Mährisch-Ostrau | ||
expected: Maehrisch-Ostrau | ||
|
||
map: | ||
rules: | ||
- pattern: "(?<=[[:upper:]])\u00DF(?=[[:upper:]])?" | ||
result: "SS" | ||
- pattern: "(?<=[[:upper:]])\u00C4(?=[[:upper:]])?" | ||
result: "AE" | ||
- pattern: "(?<=[[:upper:]])\u00D6(?=[[:upper:]])?" | ||
result: "OE" | ||
- pattern: "(?<=[[:upper:]])\u00DC(?=[[:upper:]])?" | ||
result: "UE" | ||
|
||
characters: | ||
"\u00C4": Ae # Ä | ||
"\u00D6": Oe # Ö | ||
"\u00DC": Ue # Ü | ||
|
||
"\u00E4": ae # ä | ||
"\u00F6": oe # ö | ||
"\u00FC": ue # ü | ||
|
||
"\u00DF": ss # ß |
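One way to read the German map above: an uppercase replacement applies when the preceding letter is uppercase, and otherwise the per-character table applies. The Python sketch below illustrates that reading only; it is not the project's implementation. The function name `transliterate_de` is hypothetical, the eszett is taken to be ß (U+00DF), and `str.isupper()` stands in for the `[[:upper:]]` lookbehind.

```python
# Per-character fallbacks, mirroring the `characters` table.
CHARACTERS = {
    "\u00C4": "Ae", "\u00D6": "Oe", "\u00DC": "Ue",  # Ä Ö Ü
    "\u00E4": "ae", "\u00F6": "oe", "\u00FC": "ue",  # ä ö ü
    "\u00DF": "ss",                                  # ß
}

# Replacements used when the preceding letter is uppercase,
# mirroring the `rules` list.
UPPER_RULES = {
    "\u00DF": "SS", "\u00C4": "AE", "\u00D6": "OE", "\u00DC": "UE",
}


def transliterate_de(text: str) -> str:
    """Hypothetical helper: apply the uppercase rules, then the fallbacks."""
    out = []
    for i, ch in enumerate(text):
        # Assumption: "uppercase context" means the previous character
        # is an uppercase letter, as in the lookbehind of each pattern.
        prev_upper = i > 0 and text[i - 1].isupper()
        if prev_upper and ch in UPPER_RULES:
            out.append(UPPER_RULES[ch])
        else:
            out.append(CHARACTERS.get(ch, ch))
    return "".join(out)


print(transliterate_de("GRO\u00DFST\u00C4DTE"))
```

Because only the preceding letter is inspected, this sketch turns a word-initial umlaut in an all-caps word such as ÄRZTE into `AeRZTE`; whether that is the intended behaviour is not stated in the file.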
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
--- | ||
authority_id: bgnpcgn | ||
id: 1968 | ||
language: iso-639-2:fao | ||
source_script: Latn | ||
destination_script: Latn | ||
name: BGN/PCGN Faroese 1968 Roman-Script Spelling Convention Agreement | ||
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf | ||
creation_date: 1968 | ||
|
||
tests: | ||
- source: Fyrirgefðu | ||
expected: Fyrirgefdhu | ||
- source: Þakka | ||
expected: Þakka | ||
|
||
map: | ||
inherit: bgnpcgn-fao-Latn-Latn-1964 |
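The Faroese file defines no characters of its own and relies on `inherit`. The Python sketch below shows one plausible way such inheritance could be resolved, with the child's entries layered over the parent's. Everything here is an assumption for illustration: the parent map's contents are not shown in this diff and are inferred from the two tests (ð becomes dh, Þ is left alone), the child key `bgnpcgn-fao-Latn-Latn-1968` is guessed from the metadata, and `resolve_characters` and `transliterate` are hypothetical helpers.

```python
# Hypothetical registry of maps. The parent's contents are an assumption
# inferred from the tests above, not taken from the real 1964 file.
MAPS = {
    "bgnpcgn-fao-Latn-Latn-1964": {
        "characters": {"\u00D0": "Dh", "\u00F0": "dh"},  # Ð ð
    },
    "bgnpcgn-fao-Latn-Latn-1968": {
        "inherit": "bgnpcgn-fao-Latn-Latn-1964",
        "characters": {},
    },
}


def resolve_characters(name: str, maps: dict) -> dict:
    """Merge a map's character table over its parent's (child wins)."""
    spec = maps[name]
    merged = {}
    parent = spec.get("inherit")
    if parent is not None:
        merged.update(resolve_characters(parent, maps))
    merged.update(spec.get("characters", {}))
    return merged


def transliterate(text: str, name: str, maps: dict = MAPS) -> str:
    """Replace each character found in the resolved table."""
    table = resolve_characters(name, maps)
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("Fyrirgef\u00F0u", "bgnpcgn-fao-Latn-Latn-1968"))
```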
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
authority_id: bgnpcgn | ||
id: 1968 | ||
language: iso-639-2:isl | ||
source_script: Latn | ||
destination_script: Latn | ||
name: BGN/PCGN Icelandic 1968 Roman-Script Spelling Convention Agreement | ||
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf | ||
creation_date: 1968 | ||
|
||
notes: | ||
- | | ||
The special letter Ð ð, known as edh, [Unicode: 00D0, 00F0] and the special letter Þ þ, known as thorn, | ||
[Unicode: 00DE, 00FE] should be reproduced in those forms whenever encountered. In those instances | ||
when they cannot be reproduced, however, the digraph Dh dh may be substituted for Ð ð and the digraph | ||
Th th may be substituted for Þ þ. | ||
tests: | ||
- source: Fyrirgefðu | ||
expected: Fyrirgefdhu | ||
- source: þu ert velkominn | ||
expected: thu ert velkominn | ||
- source: GOÐAN DAGINN | ||
expected: GODHAN DAGINN | ||
- source: Þakka | ||
expected: Thakka | ||
|
||
map: | ||
inherit: bgnpcgn-isl-Latn-Latn-1964 |
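The Icelandic note and tests describe a digraph fallback whose casing follows its surroundings (`GOÐAN` becomes `GODHAN`, `Þakka` becomes `Thakka`). The inherited 1964 map is not shown here, so the Python sketch below is only a guess at equivalent behaviour. The function name `transliterate_is` is hypothetical, and so is the rule that an uppercase letter next to another uppercase letter yields an all-caps digraph.

```python
# Digraph fallbacks from the note: Ð ð -> Dh dh, Þ þ -> Th th.
DIGRAPHS = {
    "\u00D0": "Dh", "\u00F0": "dh",  # Ð ð (edh)
    "\u00DE": "Th", "\u00FE": "th",  # Þ þ (thorn)
}


def transliterate_is(text: str) -> str:
    """Hypothetical helper: substitute digraphs for edh and thorn."""
    out = []
    for i, ch in enumerate(text):
        repl = DIGRAPHS.get(ch)
        if repl is None:
            out.append(ch)
            continue
        # Assumption: an uppercase edh/thorn adjacent to another
        # uppercase letter sits in an all-caps word, so emit DH/TH.
        prev_upper = i > 0 and text[i - 1].isupper()
        next_upper = i + 1 < len(text) and text[i + 1].isupper()
        if ch.isupper() and (prev_upper or next_upper):
            repl = repl.upper()
        out.append(repl)
    return "".join(out)


print(transliterate_is("GO\u00D0AN DAGINN"))
```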
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
--- | ||
authority_id: bgnpcgn | ||
id: 1984 | ||
language: iso-639-2:sme | ||
source_script: Latn | ||
destination_script: Latn | ||
name: BGN/PCGN Northern Sami (North Lappish) 1984 Roman-Script Spelling Convention Agreement | ||
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf | ||
creation_date: 1984 | ||
|
||
notes: | ||
- | | ||
The special letter Ŋ ŋ, known as eng, [Unicode: 014A, 014B] should be reproduced in that form whenever | ||
encountered. In those instances when it cannot be reproduced, however, the letter Ń ń [Unicode: 0143, 0144] | ||
may be substituted for it. | ||
- | | ||
In a further note additional to the 1984 agreement, other special letters should be retained as found: | ||
Á á [Unicode: 00C1, 00E1], | ||
Č č [Unicode: 010C, 010D], | ||
Đ đ [Unicode: 0110, 0111], | ||
Š š [Unicode: 0160, 0161], | ||
Ŧ ŧ [Unicode: 0166, 0167], | ||
Ž ž [Unicode: 017D, 017E]. | ||
tests: # https://web.archive.org/web/20120918094122/http://www.uta.fi/~km56049/same/svocab.html | ||
- source: adjágas | ||
expected: adjágas | ||
- source: agálaš | ||
expected: agálaš | ||
- source: ÁEL | ||
expected: ÁEL | ||
- source: hčagastinárpu | ||
expected: hčagastinárpu | ||
- source: algŋa | ||
expected: algńa | ||
- source: Šveica | ||
expected: Šveica | ||
- source: MAŊŊIL | ||
expected: MAŃŃIL | ||
- source: giđa | ||
expected: giđa | ||
- source: ruoŧŧelaš | ||
expected: ruoŧŧelaš | ||
- source: skálžu | ||
expected: skálžu | ||
|
||
map: | ||
characters: | ||
"\u014A": "\u0143" # Ŋ => Ń | ||
"\u014B": "\u0144" # ŋ => ń |
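The Northern Sami map is a plain one-to-one substitution, so it can be sketched with `str.translate` in Python. This is an illustration rather than the project's code, and the name `transliterate_sme` is hypothetical. Only eng is replaced; the other special letters listed in the notes pass through unchanged.

```python
# One-to-one fallback from the `characters` table: Ŋ -> Ń, ŋ -> ń.
ENG_FALLBACK = str.maketrans({
    "\u014A": "\u0143",  # Ŋ => Ń
    "\u014B": "\u0144",  # ŋ => ń
})


def transliterate_sme(text: str) -> str:
    """Hypothetical helper: replace eng and leave everything else as is."""
    return text.translate(ENG_FALLBACK)


print(transliterate_sme("MA\u014A\u014AIL"))
```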