A modified digital edition based on Arabic-English Dictionary for the Use of Students by J.G. Hava.
The modification is generally in formatting, sometimes in orthography, and occasionally in the text as well.
Work in progress output: https://adamiturabi.github.io/hava-dig/
- Install Python3 as
/usr/bin/python3
- From a terminal in the root dir:
./run.sh
There are the following differences (mostly in typesetting) from Hava's original text.
We add Roman numerals (I, II, III, IV, etc.) in the beginning of each verb form.
We indicate the middle vowel for the imperfect verb indicated using Arabic vowel diacritics instead of the Latin characters: a, i, o. Example:
Original: بَثَرَ i ، بَثِرَ a
Modified: بَثَرَ ◌ِ ، بَثِرَ ◌َ
Verbs' masdars in the original text are in the accusative in the original separated by و. Instead, we enclose them in parentheses without any case, and separated by Arabic comma ،. Example:
Original: بَثَرَ i ، بَثِرَ a بَثْرًا وبَثَرًا وبُثُورًا
Modified: بَثَرَ ◌ِ ، بَثِرَ ◌َ (بَثْر ، بَثَر ، بُثُور)
An OCR digital text of Hava's dictonary is available from archive.com. However, only the English text has been digitized reliably. Evidently, the OCR did not recognize the Arabic text and output it as garbled Roman characters. Also, due to the hybrid left-to-right/right-to-left direction of the columns and pages in the printed volume, the OCR'ed English text needs some internal page reordering.
So, our methodology is to purge the garbled characters which is the output of the OCR of Arabic characters, and to re-order the pages of English text correctly.
Then we manually enter the Arabic text in a separate file. Arabic text may be entered either in Unicode Arabic, Buckwalter transliteration, or using word codes (described below).
These text files are to be found in the txt
subdirectory.
We use the Buckwalter transliteration system with the following additions:
c = $ = ش
C = * = ذ
% = U+0653 (maddah above)
^ = U+0654 (hamza above)
/ = U+0655 (hamza below)
In order to expedite the manual text input, we make use of word codes. Hamza orthography is automatically derived.
Capital Roman numerals are for verb forms. Besides I
for a triliteral root in active voice, these Roman numerals will output the preterite (فعل ماضي) of the current root.
To suppress the preterite, append the numeral by x
. For example,
VIII
for root fEl
will output VIII AifotaEala
VIIIx
for root fEl
will output VIII
To get the passive voice append the numeral by p
. For example,
VIIIp
for root bCl
will output VIII AubotuCila
For form I triliteral roots A
, !
, and U
will output the active voice preterite with the vowels a
, i
, and u
respectively.
To indicate the vowel for the aorist (فعل مضارع) use a
, i
, and either u
or o
.
Arabic numerals 1 through 70 will output nouns based on the coding devised by Salmone in his dictionary, with some additions:
For example numeric code 1003
for current-root=Eml
will output AisotiEomaAl
The intention is that changes due to weak letters should be automatically be taken care of, although this is a work in progress. For example numeric code 403
for current-root=qwm
should output <iqaAmap
, not <iqowaAm
.
Prefixes and suffixes may be added to the numeric code. Suffixes, if they are integral to the word (like ة) are appended directly to the numeric code. If not integral (like dual and plural suffixes and attached pronouns), they are separated by a "
double-quote character. This is for correct hamza orthography.
For example:
for current root=bd'
the numeric code 1ap
will output bado>ap
for current root=bd'
the numeric code 4"aAt
will output bada'aAt
For the definite article prefix (prefix that ends with l
), sun and moon letters are automatically derived. For example
Alo1
for root $ms
will output Al$~amos
ll1
for root $ms
will output ll$~amos
To override the current root for any verb or noun append the new root to the numeric code separated by .
. For example, if the current root is bdr
, then I.bydr
will output I bayodara
In overriding the current root, you may use hyphens for the radicals you don't want to override. For example:
With root bdw
numeric code 22.--'
will output badaA'
Hava uses abreviations like P, T, etc. to indicate foreign borrowings. Use the following format: [Ps]{.fbr}
.
Roots are entered preceded by the @
character on a new line by itself, and only in the Arabic text. For example,
@bd'
For foreign borrowings, Hava does not show a new root but we do if the root letters are different than the preceding root. Root letters for foreign words are usually triliteral, unless they conform to Arabic word patterns in which case they may have more than 3 letters.
Sometimes Hava will add a new root unalphabetically. For example, bydr
directly after bdr
. In this case we won't enter a new root. Instead use the .<new-root>
format if you need to use a numeric word code.
Hava marks colloquialisms with two characters: a four-pointed star, and a square. Use the characters +
and =
respectively in the raw text input and the processing scripts will substitute them with similar unicode characters.
Use the HTML tags <i>some text</i>
around text that you want to italicize.
If there is some Arabic inline text in the English definition then enclose it thus:
To say to a.o.: ([bi>abiy >anota]{.buckwalter}) I will ransom thee with my father.
See the root b'b'
for the above example.