#86 #93 #106 #154 add bgnpcgn mappings for deu, fao, isl, sme
CAMOBAP committed Jan 31, 2021
1 parent 7815028 commit d003d22
Showing 5 changed files with 152 additions and 0 deletions.
55 changes: 55 additions & 0 deletions maps/bgnpcgn-deu-Latn-Latn-2000.yaml
@@ -0,0 +1,55 @@
---
authority_id: bgnpcgn
id: 2000
language: iso-639-2:deu
source_script: Latn
destination_script: Latn
name: BGN/PCGN German 2000 Roman-Script Spelling Convention Agreement
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf
creation_date: 2000

notes:
  - |
    The special letter ß, known as eszett, [Unicode: 00DF] is a standard letter of the German alphabet and occurs
    only in word-medial and word-final positions. It is a lowercase letter only, and when a word and/or a name is
    written entirely in uppercase letters, it is always rendered as SS. As a result of the orthographic reform of
    German, implemented in August 1998, the ß is now rendered ss if it follows a short vowel, but it is still used
    if it follows a long vowel or a diphthong. In those instances where ß cannot be reproduced, the digraph ss
    may be substituted for it. For alphabetization and sorting purposes, ß should be treated as ss.
  - |
    In those instances when the vowel letters ä [Unicode: 00E4], ö [Unicode: 00F6], and ü [Unicode: 00FC]
    cannot be reproduced, the alternate spellings ae, oe, and ue may be substituted.
tests:
  - source: Dein weißes Fleisch erregt mich so
    expected: Dein weisses Fleisch erregt mich so
  - source: GROßSTÄDTE
    expected: GROSSSTAEDTE
  - source: Göttingen
    expected: Goettingen
  - source: Gütersloh
    expected: Guetersloh
  - source: Mährisch-Ostrau
    expected: Maehrisch-Ostrau

map:
  rules:
    - pattern: "(?<=[[:upper:]])\u00DF(?=[[:upper:]])?"
      result: "SS"
    - pattern: "(?<=[[:upper:]])\u00C4(?=[[:upper:]])?"
      result: "AE"
    - pattern: "(?<=[[:upper:]])\u00D6(?=[[:upper:]])?"
      result: "OE"
    - pattern: "(?<=[[:upper:]])\u00DC(?=[[:upper:]])?"
      result: "UE"

  characters:
    "\u00C4": Ae # Ä
    "\u00D6": Oe # Ö
    "\u00DC": Ue # Ü

    "\u00E4": ae # ä
    "\u00F6": oe # ö
    "\u00FC": ue # ü

    "\u00DF": ss # ß
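
The German map combines context rules with a plain character table. A minimal Python sketch of that behaviour follows, for illustration only. It is not how the library applies the map: `romanize_deu` and both dictionaries are hypothetical names, and the sketch assumes the eszett is ß (U+00DF).

```python
# Hypothetical sketch of the German fallback spelling; not the library's code.

# Context rules: an uppercase umlaut or an eszett that directly follows an
# uppercase letter is written as an all-uppercase digraph.
UPPER_CONTEXT = {"\u00DF": "SS", "\u00C4": "AE", "\u00D6": "OE", "\u00DC": "UE"}

# Plain character table, used when no context rule applies.
CHARACTERS = {
    "\u00C4": "Ae", "\u00D6": "Oe", "\u00DC": "Ue",  # Ä Ö Ü
    "\u00E4": "ae", "\u00F6": "oe", "\u00FC": "ue",  # ä ö ü
    "\u00DF": "ss",                                  # ß
}


def romanize_deu(text: str) -> str:
    """Replace umlauts and eszett with their digraph spellings."""
    out = []
    for i, ch in enumerate(text):
        follows_upper = i > 0 and text[i - 1].isupper()
        if follows_upper and ch in UPPER_CONTEXT:
            out.append(UPPER_CONTEXT[ch])
        else:
            # Characters outside the table pass through unchanged.
            out.append(CHARACTERS.get(ch, ch))
    return "".join(out)


print(romanize_deu("GRO\u00DFST\u00C4DTE"))  # GROSSSTAEDTE
print(romanize_deu("G\u00F6ttingen"))        # Goettingen
```

Checking only the preceding character mirrors the lookbehind in the rules, so a word-initial uppercase umlaut always gets the title-case digraph.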
File renamed without changes.
18 changes: 18 additions & 0 deletions maps/bgnpcgn-fao-Latn-Latn-1968.yaml
@@ -0,0 +1,18 @@
---
authority_id: bgnpcgn
id: 1968
language: iso-639-2:fao
source_script: Latn
destination_script: Latn
name: BGN/PCGN Faroese 1968 Roman-Script Spelling Convention Agreement
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf
creation_date: 1968

tests:
  - source: Fyrirgefðu
    expected: Fyrirgefdhu
  - source: Þakka
    expected: Þakka

map:
  inherit: bgnpcgn-fao-Latn-Latn-1964
29 changes: 29 additions & 0 deletions maps/bgnpcgn-isl-Latn-Latn-1968.yaml
@@ -0,0 +1,29 @@
---
authority_id: bgnpcgn
id: 1968
language: iso-639-2:isl
source_script: Latn
destination_script: Latn
name: BGN/PCGN Icelandic 1968 Roman-Script Spelling Convention Agreement
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf
creation_date: 1968

notes:
  - |
    The special letter Ð ð, known as edh, [Unicode: 00D0, 00F0] and the special letter Þ þ, known as thorn,
    [Unicode: 00DE, 00FE] should be reproduced in those forms whenever encountered. In those instances
    when they cannot be reproduced, however, the digraph Dh dh may be substituted for Ð ð and the digraph
    Th th may be substituted for Þ þ.
tests:
  - source: Fyrirgefðu
    expected: Fyrirgefdhu
  - source: þu ert velkominn
    expected: thu ert velkominn
  - source: GOÐAN DAGINN
    expected: GODHAN DAGINN
  - source: Þakka
    expected: Thakka

map:
  inherit: bgnpcgn-isl-Latn-Latn-1964
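
The Icelandic file only inherits from the 1964 map, which this commit does not show. The Python sketch below is therefore a guess at the behaviour the tests imply, not a copy of the inherited rules. `romanize_isl` and `DIGRAPHS` are hypothetical names, and the all-caps rule (uppercase digraph when a neighbouring letter is uppercase) is an assumption made so that the `GOÐAN DAGINN` test passes.

```python
# Hypothetical sketch of the edh/thorn fallback; the real 1964 map may differ.

DIGRAPHS = {
    "\u00D0": "Dh", "\u00F0": "dh",  # Ð ð (edh)
    "\u00DE": "Th", "\u00FE": "th",  # Þ þ (thorn)
}


def romanize_isl(text: str) -> str:
    """Replace edh and thorn with the digraphs dh and th."""
    out = []
    for i, ch in enumerate(text):
        repl = DIGRAPHS.get(ch)
        if repl is None:
            out.append(ch)  # every other character is kept as found
            continue
        prev_upper = i > 0 and text[i - 1].isupper()
        next_upper = i + 1 < len(text) and text[i + 1].isupper()
        # Assumption: inside an all-caps word the digraph is fully uppercase.
        if ch.isupper() and (prev_upper or next_upper):
            repl = repl.upper()
        out.append(repl)
    return "".join(out)


print(romanize_isl("GO\u00D0AN DAGINN"))  # GODHAN DAGINN
print(romanize_isl("\u00DEakka"))         # Thakka
```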
50 changes: 50 additions & 0 deletions maps/bgnpcgn-sme-Latn-Latn-1984.yaml
@@ -0,0 +1,50 @@
---
authority_id: bgnpcgn
id: 1984
language: iso-639-2:sme
source_script: Latn
destination_script: Latn
name: BGN/PCGN Northern Sami (North Lappish) 1984 Roman-Script Spelling Convention Agreement
url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693794/ROMAN-SCRIPT_SPELLING_CONVENTIONS.pdf
creation_date: 1984

notes:
  - |
    The special letter Ŋ ŋ, known as eng, [Unicode: 014A, 014B] should be reproduced in that form whenever
    encountered. In those instances when it cannot be reproduced, however, the letter Ń ń [Unicode: 0143, 0144]
    may be substituted for it.
  - |
    In a further note additional to the 1984 agreement, other special letters should be retained as found:
    Á á [Unicode: 00C1, 00E1],
    Č č [Unicode: 010C, 010D],
    Đ đ [Unicode: 0110, 0111],
    Š š [Unicode: 0160, 0161],
    Ŧ ŧ [Unicode: 0166, 0167],
    Ž ž [Unicode: 017D, 017E].
tests: # https://web.archive.org/web/20120918094122/http://www.uta.fi/~km56049/same/svocab.html
  - source: adjágas
    expected: adjágas
  - source: agálaš
    expected: agálaš
  - source: ÁEL
    expected: ÁEL
  - source: hčagastinárpu
    expected: hčagastinárpu
  - source: algŋa
    expected: algńa
  - source: Šveica
    expected: Šveica
  - source: MAŊŊIL
    expected: MAŃŃIL
  - source: giđa
    expected: giđa
  - source: ruoŧŧelaš
    expected: ruoŧŧelaš
  - source: skálžu
    expected: skálžu

map:
  characters:
    "\u014A": "\u0143" # Ŋ => Ń
    "\u014B": "\u0144" # ŋ => ń
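
The Northern Sami map is a one-to-one substitution, so it can be sketched in Python with a translation table. The names `ENG_FALLBACK` and `romanize_sme` are hypothetical and are not part of the library.

```python
# Hypothetical sketch of the Northern Sami fallback: only eng is replaced.

ENG_FALLBACK = str.maketrans({
    "\u014A": "\u0143",  # Ŋ => Ń
    "\u014B": "\u0144",  # ŋ => ń
})


def romanize_sme(text: str) -> str:
    """Substitute Ń ń for Ŋ ŋ; all other letters are retained as found."""
    return text.translate(ENG_FALLBACK)


print(romanize_sme("MA\u014A\u014AIL"))  # MAŃŃIL
print(romanize_sme("\u0160veica"))       # Šveica
```

Because each eng maps to a single character, case is preserved without any context rule, unlike the digraph substitutions used for German.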
