This repository makes available a collection of bilingual English dictionaries in DSL format derived from interwiki links (links between article titles in different languages) in Wikipedia. The data has been extracted from Wikidata.
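As an illustration of what an interwiki link looks like in the source data, here is a minimal sketch that pulls a title pair out of a single Wikidata entity. It assumes the entity has already been parsed from a JSON dump and relies on the `sitelinks` field of the Wikidata JSON format. The function name `title_pair` and the sample fragment are hypothetical, and this is not necessarily how the dictionaries in this repository were produced.

```python
import json


def title_pair(entity, source_wiki, target_wiki="enwiki"):
    """Return (source_title, target_title), or None if either sitelink is missing."""
    sitelinks = entity.get("sitelinks", {})
    source = sitelinks.get(source_wiki)
    target = sitelinks.get(target_wiki)
    if source is None or target is None:
        return None
    return source["title"], target["title"]


# Hand-written fragment in the shape of a Wikidata entity (most fields omitted).
sample = json.loads("""
{
  "id": "Q183",
  "sitelinks": {
    "enwiki": {"site": "enwiki", "title": "Germany"},
    "dewiki": {"site": "dewiki", "title": "Deutschland"},
    "frwiki": {"site": "frwiki", "title": "Allemagne"}
  }
}
""")

print(title_pair(sample, "dewiki"))
print(title_pair(sample, "jawiki"))
```

The second call returns `None` because the sample fragment has no `jawiki` sitelink; an entity without both links cannot contribute a dictionary entry.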
ABBYY Lingvo DSL is a flexible dictionary format that can be read by dictionary applications such as GoldenDict and converted to other formats with tools such as pyglossary. The dsl-tools project also provides a number of tools for creating DSL-format dictionaries.
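For readers unfamiliar with the format, the following rough sketch writes title pairs as DSL text: a few `#` header lines, then one card per pair with the headword at the start of a line and the translation on an indented line. The helper names, the dictionary name, and the exact set of escaped characters are assumptions made for illustration; check them against the DSL format documentation before relying on them.

```python
# Characters assumed to have special meaning in DSL; each is backslash-escaped.
DSL_SPECIAL = set("\\[](){}~@#^")


def escape_dsl(text):
    """Backslash-escape characters that DSL would otherwise interpret."""
    return "".join("\\" + ch if ch in DSL_SPECIAL else ch for ch in text)


def dsl_card(headword, translation):
    """One card: headword at column 0, body indented with a tab."""
    return f"{escape_dsl(headword)}\n\t{escape_dsl(translation)}\n"


def dsl_document(name, index_language, contents_language, pairs):
    """Build the full DSL text: header block, blank line, then the cards."""
    header = (
        f'#NAME "{name}"\n'
        f'#INDEX_LANGUAGE "{index_language}"\n'
        f'#CONTENTS_LANGUAGE "{contents_language}"\n\n'
    )
    return header + "".join(dsl_card(h, t) for h, t in pairs)


# Illustrative pairs only; the dictionary name is made up.
pairs = [("Deutschland", "Germany"), ("Merkur (Planet)", "Mercury (planet)")]
print(dsl_document("Example de-en", "German", "English", pairs))
```

Escaping matters here because many Wikipedia titles contain parentheses, which DSL would otherwise treat as markup in a headword.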
DSL files must be saved as UTF-16 to be usable by dictionary programs. The raw source files in this repository are saved as UTF-8, which makes them significantly smaller and also readable (and diffable) by git. Fully encoded and compressed `.dsl.dz` dictionaries, ready for use, are available in the Releases section.
You can also use the `rezip_dsl.rb` and `unzip_dsl.rb` scripts provided by the dsl-tools repo to encode/compress and decode/uncompress the dictionaries, either individually or as a group.
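If you only need the encoding step, it can be sketched in a few lines of Python. This is not the dsl-tools implementation: the function name `utf8_to_utf16` is hypothetical, and the sketch assumes that little-endian UTF-16 with a byte order mark is what your dictionary program expects. Compression to `.dsl.dz` is left to the scripts mentioned above or to a dictzip tool.

```python
def utf8_to_utf16(src_path, dst_path):
    """Re-encode a UTF-8 DSL source file as UTF-16 LE with a byte order mark."""
    # "utf-8-sig" also accepts input that starts with a UTF-8 BOM and strips it.
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-16-le", newline="") as dst:
        # Write the BOM explicitly so the byte order is unambiguous.
        dst.write("\ufeff" + text)
```

For example, `utf8_to_utf16("data/fr-en_wikidict.dsl", "fr-en_wikidict.utf16.dsl")` would write a converted copy next to the working directory; both paths here are illustrative.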
The data directory contains the bilingual dictionaries in pairs according to ISO language code.
The basic filename pattern is `[ISO]-en_wikidict.dsl`, with `[ISO]` being the source language ISO code. A list of all language pairs is below.
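The filename pattern can be expressed as a pair of small helpers, one to build a filename from a language code and one to recover the code from a filename. This is only a sketch: the function names are hypothetical, and the regular expression assumes codes consist of lowercase letters and underscores (as in `zh_min_nan`).

```python
import re

# Assumed shape of a source-language code: lowercase letters and underscores.
FILENAME_RE = re.compile(r"^(?P<code>[a-z_]+)-en_wikidict\.dsl$")


def dictionary_filename(code):
    """Build the DSL filename for a given source-language code."""
    return f"{code}-en_wikidict.dsl"


def source_code(filename):
    """Return the source-language code for a matching filename, else None."""
    match = FILENAME_RE.match(filename)
    return match.group("code") if match else None


print(dictionary_filename("fr"))
print(source_code("zh_min_nan-en_wikidict.dsl"))
```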
| Language codes | Language names |
|---|---|
| af-en | Afrikaans => English |
| am-en | Amharic => English |
| ang-en | Anglo-Saxon => English |
| ar-en | Arabic => English |
| arc-en | Aramaic => English |
| bg-en | Bulgarian => English |
| bi-en | Bislama => English |
| bn-en | Bengali => English |
| bo-en | Tibetan => English |
| br-en | Breton => English |
| bs-en | Bosnian => English |
| ca-en | Catalan => English |
| cdo-en | Min Dong => English |
| chr-en | Cherokee => English |
| chy-en | Cheyenne => English |
| cr-en | Cree => English |
| cs-en | Czech => English |
| cy-en | Welsh => English |
| da-en | Danish => English |
| de-en | German => English |
| el-en | Greek => English |
| eo-en | Esperanto => English |
| es-en | Spanish => English |
| et-en | Estonian => English |
| eu-en | Basque => English |
| fa-en | Persian => English |
| ff-en | Fula => English |
| fi-en | Finnish => English |
| fr-en | French => English |
| ga-en | Irish => English |
| gan-en | Gan => English |
| gd-en | Scottish Gaelic => English |
| gu-en | Gujarati => English |
| gv-en | Manx => English |
| ha-en | Hausa => English |
| hak-en | Hakka => English |
| haw-en | Hawaiian => English |
| he-en | Hebrew => English |
| hi-en | Hindi => English |
| hr-en | Croatian => English |
| ht-en | Haitian => English |
| hu-en | Hungarian => English |
| hy-en | Armenian => English |
| id-en | Indonesian => English |
| ig-en | Igbo => English |
| is-en | Icelandic => English |
| it-en | Italian => English |
| iu-en | Inuktitut => English |
| ja-en | Japanese => English |
| jbo-en | Lojban => English |
| jv-en | Javanese => English |
| ka-en | Georgian => English |
| kg-en | Kongo => English |
| ki-en | Kikuyu => English |
| kl-en | Greenlandic => English |
| km-en | Khmer => English |
| ko-en | Korean => English |
| la-en | Latin => English |
| lg-en | Luganda => English |
| lo-en | Lao => English |
| lt-en | Lithuanian => English |
| lv-en | Latvian => English |
| mg-en | Malagasy => English |
| mi-en | Maori => English |
| mn-en | Mongolian => English |
| ms-en | Malay => English |
| mt-en | Maltese => English |
| nah-en | Nahuatl => English |
| ne-en | Nepali => English |
| nl-en | Dutch => English |
| nn-en | Norwegian (Nynorsk) => English |
| no-en | Norwegian => English |
| nv-en | Navajo => English |
| ny-en | Chichewa => English |
| oc-en | Occitan => English |
| pa-en | Punjabi => English |
| pi-en | Pali => English |
| pl-en | Polish => English |
| ps-en | Pashto => English |
| pt-en | Portuguese => English |
| qu-en | Quechua => English |
| ro-en | Romanian => English |
| ru-en | Russian => English |
| sa-en | Sanskrit => English |
| se-en | Northern Sami => English |
| sh-en | Serbo-Croatian => English |
| sk-en | Slovak => English |
| sl-en | Slovenian => English |
| sn-en | Shona => English |
| so-en | Somali => English |
| sq-en | Albanian => English |
| sr-en | Serbian => English |
| sv-en | Swedish => English |
| sw-en | Kiswahili => English |
| ta-en | Tamil => English |
| te-en | Telugu => English |
| th-en | Thai => English |
| tl-en | Tagalog => English |
| tpi-en | Tok Pisin => English |
| tr-en | Turkish => English |
| ug-en | Uyghur => English |
| uk-en | Ukrainian => English |
| ur-en | Urdu => English |
| vi-en | Vietnamese => English |
| wo-en | Wolof => English |
| wuu-en | Wu => English |
| xh-en | Xhosa => English |
| yi-en | Yiddish => English |
| yo-en | Yoruba => English |
| za-en | Zhuang => English |
| zh-en | Chinese (Mandarin) => English |
| zh_classical-en | Classical Chinese => English |
| zh_min_nan-en | Min Nan => English |
| zh_yue-en | Cantonese => English |
| zu-en | Zulu => English |
| Language pair | # of entries |
|---|---|
| af-en | 29951 |
| am-en | 6308 |
| ang-en | 2615 |
| ar-en | 214043 |
| arc-en | 1378 |
| bg-en | 140193 |
| bi-en | 479 |
| bn-en | 30566 |
| bo-en | 2856 |
| br-en | 44870 |
| bs-en | 33163 |
| ca-en | 309248 |
| cdo-en | 2217 |
| chr-en | 486 |
| chy-en | 705 |
| cr-en | 111 |
| cs-en | 209111 |
| cy-en | 46719 |
| da-en | 134272 |
| de-en | 907990 |
| el-en | 75213 |
| eo-en | 159061 |
| es-en | 742633 |
| et-en | 79349 |
| eu-en | 152904 |
| fa-en | 348197 |
| ff-en | 208 |
| fi-en | 258347 |
| fr-en | 1010365 |
| ga-en | 29978 |
| gan-en | 5087 |
| gd-en | 13833 |
| gu-en | 5445 |
| gv-en | 4592 |
| ha-en | 511 |
| hak-en | 3349 |
| haw-en | 1931 |
| he-en | 128257 |
| hi-en | 40066 |
| hr-en | 98143 |
| ht-en | 30983 |
| hu-en | 192031 |
| hy-en | 67842 |
| id-en | 161262 |
| ig-en | 816 |
| is-en | 26241 |
| it-en | 795898 |
| iu-en | 366 |
| ja-en | 420717 |
| jbo-en | 1170 |
| jv-en | 20532 |
| ka-en | 63784 |
| kg-en | 840 |
| ki-en | 309 |
| kl-en | 1605 |
| km-en | 2361 |
| ko-en | 193308 |
| la-en | 102756 |
| lg-en | 178 |
| lo-en | 1220 |
| lt-en | 94850 |
| lv-en | 45109 |
| mg-en | 68386 |
| mi-en | 2551 |
| mn-en | 12167 |
| ms-en | 187732 |
| mt-en | 2803 |
| nah-en | 7809 |
| ne-en | 11448 |
| nl-en | 715263 |
| nn-en | 94129 |
| no-en | 275488 |
| nv-en | 2156 |
| ny-en | 167 |
| oc-en | 80831 |
| pa-en | 11694 |
| pi-en | 2643 |
| pl-en | 691110 |
| ps-en | 3741 |
| pt-en | 588641 |
| qu-en | 15580 |
| ro-en | 204776 |
| ru-en | 628146 |
| sa-en | 5939 |
| se-en | 6059 |
| sh-en | 189316 |
| sk-en | 143133 |
| sl-en | 90227 |
| sn-en | 1644 |
| so-en | 2698 |
| sq-en | 31252 |
| sr-en | 210431 |
| sv-en | 539706 |
| sw-en | 23612 |
| ta-en | 45860 |
| te-en | 14193 |
| th-en | 66679 |
| tl-en | 48164 |
| tpi-en | 1331 |
| tr-en | 156837 |
| ug-en | 2320 |
| uk-en | 314535 |
| ur-en | 60176 |
| vi-en | 397221 |
| wo-en | 956 |
| wuu-en | 2850 |
| xh-en | 305 |
| yi-en | 8508 |
| yo-en | 28888 |
| za-en | 666 |
| zh-en | 435714 |
| zh_classical-en | 7165 |
| zh_min_nan-en | 11617 |
| zh_yue-en | 24066 |
| zu-en | 666 |
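To get a rough entry count for one of the dictionaries yourself, you could count headword lines in the UTF-8 source. The sketch below is an approximation, not the method used to produce the published figures: it assumes every card has exactly one headword line, that card bodies are indented with a space or tab, and that header lines start with `#`. The function name `count_headwords` is hypothetical.

```python
def count_headwords(dsl_text):
    """Count lines that look like DSL headwords (unindented, non-header, non-blank)."""
    count = 0
    # Drop a leading byte order mark so the first header line is recognized.
    for line in dsl_text.lstrip("\ufeff").splitlines():
        if not line.strip():
            continue  # blank separator line
        if line[0] in " \t":
            continue  # indented card body
        if line.startswith("#"):
            continue  # header directive such as #NAME
        count += 1
    return count


# Small illustrative sample with two cards.
sample = (
    '#NAME "Example"\n'
    '#INDEX_LANGUAGE "German"\n'
    '#CONTENTS_LANGUAGE "English"\n'
    "\n"
    "Deutschland\n"
    "\tGermany\n"
    "Berlin\n"
    "\tBerlin\n"
)
print(count_headwords(sample))
```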
The ten largest dictionaries by number of entries:

| Language pair | # of entries |
|---|---|
| fr-en | 1010365 |
| de-en | 907990 |
| it-en | 795898 |
| es-en | 742633 |
| nl-en | 715263 |
| pl-en | 691110 |
| ru-en | 628146 |
| pt-en | 588641 |
| sv-en | 539706 |
| zh-en | 435714 |
According to the Wikidata website:

> All structured data from the main and property namespace is available under the Creative Commons CC0 License

The data in this repository is therefore made available under the same Creative Commons CC0 License as that used by the Wikidata project. All of the data has been derived from the Wikidata JSON format database dumps.