open-dsl-dict/wikidict-dsl-en

wikidict-dsl-en - Wikidata Bilingual DSL Dictionaries (English)

This repository makes available a collection of bilingual English dictionaries in DSL format derived from interwiki links (links between article titles in different languages) in Wikipedia. The data has been extracted from Wikidata.

Format

ABBYY Lingvo DSL is a flexible dictionary format that can be read by dictionary applications such as GoldenDict and converted to other formats using tools such as PyGlossary. A number of tools for creating DSL format dictionaries are also available in the dsl-tools project.

DSL files must be saved as UTF-16 to be usable by dictionary programs. The raw source files in this repository are saved as UTF-8 instead, which makes them significantly smaller and lets git read and diff them. Fully encoded and compressed .dsl.dz dictionaries, ready for use, are available in the Releases section.

You can also use the rezip_dsl.rb and unzip_dsl.rb scripts provided by the dsl-tools repo to encode and compress, or decompress and decode, the dictionaries either individually or as a group.
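If you only need the re-encoding step and would rather not run the Ruby scripts, the following is a minimal Python sketch of converting one UTF-8 source file to UTF-16. It is not the dsl-tools implementation: the function name is hypothetical, little-endian byte order with a byte order mark is an assumption, and it does not produce the compressed .dsl.dz form.

```python
import codecs
from pathlib import Path


def utf8_to_utf16(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 text file as UTF-16 LE with a byte order mark.

    Hypothetical helper for illustration; byte order is an assumption.
    """
    # "utf-8-sig" also accepts input that starts with a UTF-8 BOM.
    text = src.read_text(encoding="utf-8-sig")
    with open(dst, "wb") as out:
        out.write(codecs.BOM_UTF16_LE)       # explicit BOM: FF FE
        out.write(text.encode("utf-16-le"))  # payload without a second BOM


# Example call (paths are placeholders, not files guaranteed to exist):
# utf8_to_utf16(Path("data/fr-en_wikidict.dsl"), Path("fr-en_wikidict.utf16.dsl"))
```

Writing the byte order mark explicitly keeps the output independent of the platform's native byte order, which the generic "utf-16" codec would otherwise follow.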

Data

The data directory contains the bilingual dictionaries, one file per language pair, identified by ISO language code.

The basic filename pattern is [ISO]-en_wikidict.dsl, with [ISO] being the source language ISO code. A list of all language pairs is below.
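The filename pattern can be handled programmatically, for example when scripting over the data directory. The sketch below builds a filename from a source language code and recovers the code from a filename; the function names and the regular expression are illustrative assumptions based only on the pattern stated above.

```python
import re
from typing import Optional

# Assumed shape of a source code: letters and underscores (e.g. "fr", "zh_min_nan").
FILENAME_RE = re.compile(r"^(?P<code>[A-Za-z_]+)-en_wikidict\.dsl$")


def dictionary_filename(code: str) -> str:
    """Return the filename for a source language code, following [ISO]-en_wikidict.dsl."""
    return f"{code}-en_wikidict.dsl"


def source_code(filename: str) -> Optional[str]:
    """Return the source language code from a filename, or None if it does not match."""
    match = FILENAME_RE.match(filename)
    return match.group("code") if match else None


print(dictionary_filename("fr"))                  # fr-en_wikidict.dsl
print(source_code("zh_min_nan-en_wikidict.dsl"))  # zh_min_nan
```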

Available language pairs

Language codes Language names
af-en Afrikaans => English
am-en Amharic => English
ang-en Anglo-Saxon => English
ar-en Arabic => English
arc-en Aramaic => English
bg-en Bulgarian => English
bi-en Bislama => English
bn-en Bengali => English
bo-en Tibetan => English
br-en Breton => English
bs-en Bosnian => English
ca-en Catalan => English
cdo-en Min Dong => English
chr-en Cherokee => English
chy-en Cheyenne => English
cr-en Cree => English
cs-en Czech => English
cy-en Welsh => English
da-en Danish => English
de-en German => English
el-en Greek => English
eo-en Esperanto => English
es-en Spanish => English
et-en Estonian => English
eu-en Basque => English
fa-en Persian => English
ff-en Fula => English
fi-en Finnish => English
fr-en French => English
ga-en Irish => English
gan-en Gan => English
gd-en Scottish Gaelic => English
gu-en Gujarati => English
gv-en Manx => English
ha-en Hausa => English
hak-en Hakka => English
haw-en Hawaiian => English
he-en Hebrew => English
hi-en Hindi => English
hr-en Croatian => English
ht-en Haitian => English
hu-en Hungarian => English
hy-en Armenian => English
id-en Indonesian => English
ig-en Igbo => English
is-en Icelandic => English
it-en Italian => English
iu-en Inuktitut => English
ja-en Japanese => English
jbo-en Lojban => English
jv-en Javanese => English
ka-en Georgian => English
kg-en Kongo => English
ki-en Kikuyu => English
kl-en Greenlandic => English
km-en Khmer => English
ko-en Korean => English
la-en Latin => English
lg-en Luganda => English
lo-en Lao => English
lt-en Lithuanian => English
lv-en Latvian => English
mg-en Malagasy => English
mi-en Maori => English
mn-en Mongolian => English
ms-en Malay => English
mt-en Maltese => English
nah-en Nahuatl => English
ne-en Nepali => English
nl-en Dutch => English
nn-en Norwegian (Nynorsk) => English
no-en Norwegian => English
nv-en Navajo => English
ny-en Chichewa => English
oc-en Occitan => English
pa-en Punjabi => English
pi-en Pali => English
pl-en Polish => English
ps-en Pashto => English
pt-en Portuguese => English
qu-en Quechua => English
ro-en Romanian => English
ru-en Russian => English
sa-en Sanskrit => English
se-en Northern Sami => English
sh-en Serbo-Croatian => English
sk-en Slovak => English
sl-en Slovenian => English
sn-en Shona => English
so-en Somali => English
sq-en Albanian => English
sr-en Serbian => English
sv-en Swedish => English
sw-en Kiswahili => English
ta-en Tamil => English
te-en Telugu => English
th-en Thai => English
tl-en Tagalog => English
tpi-en Tok Pisin => English
tr-en Turkish => English
ug-en Uyghur => English
uk-en Ukrainian => English
ur-en Urdu => English
vi-en Vietnamese => English
wo-en Wolof => English
wuu-en Wu => English
xh-en Xhosa => English
yi-en Yiddish => English
yo-en Yoruba => English
za-en Zhuang => English
zh-en Chinese (Mandarin) => English
zh_classical-en Classical Chinese => English
zh_min_nan-en Min Nan => English
zh_yue-en Cantonese => English
zu-en Zulu => English

Statistics

Dictionary size

Language pair # of entries
af-en 29951
am-en 6308
ang-en 2615
ar-en 214043
arc-en 1378
bg-en 140193
bi-en 479
bn-en 30566
bo-en 2856
br-en 44870
bs-en 33163
ca-en 309248
cdo-en 2217
chr-en 486
chy-en 705
cr-en 111
cs-en 209111
cy-en 46719
da-en 134272
de-en 907990
el-en 75213
eo-en 159061
es-en 742633
et-en 79349
eu-en 152904
fa-en 348197
ff-en 208
fi-en 258347
fr-en 1010365
ga-en 29978
gan-en 5087
gd-en 13833
gu-en 5445
gv-en 4592
ha-en 511
hak-en 3349
haw-en 1931
he-en 128257
hi-en 40066
hr-en 98143
ht-en 30983
hu-en 192031
hy-en 67842
id-en 161262
ig-en 816
is-en 26241
it-en 795898
iu-en 366
ja-en 420717
jbo-en 1170
jv-en 20532
ka-en 63784
kg-en 840
ki-en 309
kl-en 1605
km-en 2361
ko-en 193308
la-en 102756
lg-en 178
lo-en 1220
lt-en 94850
lv-en 45109
mg-en 68386
mi-en 2551
mn-en 12167
ms-en 187732
mt-en 2803
nah-en 7809
ne-en 11448
nl-en 715263
nn-en 94129
no-en 275488
nv-en 2156
ny-en 167
oc-en 80831
pa-en 11694
pi-en 2643
pl-en 691110
ps-en 3741
pt-en 588641
qu-en 15580
ro-en 204776
ru-en 628146
sa-en 5939
se-en 6059
sh-en 189316
sk-en 143133
sl-en 90227
sn-en 1644
so-en 2698
sq-en 31252
sr-en 210431
sv-en 539706
sw-en 23612
ta-en 45860
te-en 14193
th-en 66679
tl-en 48164
tpi-en 1331
tr-en 156837
ug-en 2320
uk-en 314535
ur-en 60176
vi-en 397221
wo-en 956
wuu-en 2850
xh-en 305
yi-en 8508
yo-en 28888
za-en 666
zh-en 435714
zh_classical-en 7165
zh_min_nan-en 11617
zh_yue-en 24066
zu-en 666

Top ten dictionaries by number of entries

Language pair # of entries
fr-en 1010365
de-en 907990
it-en 795898
es-en 742633
nl-en 715263
pl-en 691110
ru-en 628146
pt-en 588641
sv-en 539706
zh-en 435714
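A ranking like the one above can be reproduced from the full size table. The following sketch assumes the table is available as plain text lines of the form "pair count"; the function names and the sample lines are illustrative, and the sample is only a small excerpt of the table.

```python
from typing import Dict, Iterable, List, Tuple


def parse_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'pair count' lines into a dict, skipping headers and blank lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def top_pairs(counts: Dict[str, int], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n language pairs with the most entries, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


sample = [
    "Language pair # of entries",
    "fr-en 1010365",
    "de-en 907990",
    "cr-en 111",
    "it-en 795898",
]
print(top_pairs(parse_counts(sample), n=3))
# [('fr-en', 1010365), ('de-en', 907990), ('it-en', 795898)]
```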

License

According to the Wikidata website:

All structured data from the main and property namespace is available under the Creative Commons CC0 License

The data in this repository is therefore made available under the same Creative Commons CC0 License as that used by the Wikidata project. All of the data has been derived from the Wikidata JSON format database dumps.