This is a build tool/repo/thing using pandoc meant to help typeset LuaLaTeX articles in various South Asian languages.
Here is a list of things you will need to have installed before using this project.
- babashka (to run
make.clj
) - TeXLive (to actually typeset the document)
- Pandoc (to convert the Markdown to LuaLaTeX)
Once all that is downloaded, run ./setup.clj
, as that will set up the local environment for babel
.
Once you've figured out what language and script you want to write in, copy the appropriate file from examples/
into the project root.
For example, if you're writing in Meitei, copy examples/meetei.md
to the project root as document.md
and modify it with your text content. Then run
./make.clj :lang meetei :in document.md :out document.pdf
Because the script uses babashka, you have the option of doing this:
./make.clj :lang <lang> :in <in-file> :out <out-file>
or this:
./make.clj --lang <lang> --in <in-file> --out <out-file>
./make.clj -L <lang> -i <in-file> -o <out-file>
Here <lang>
refers to any of the supported languages.
This project is also meant to help people create professional works in their own native language simply by using Pandoc-compatible Markdown rather than using LaTeX or Word.
Another specific goal with this is to help speakers of minority languages, namely indigenous languages in South Asia, create professional literary works in their language with ease.
- XDG File Directory support for Unix
- This would mean making the executable standalone
- Adding support for other pandoc supported markup file structures
- Adding ConTeXt support and templates.
- Make a markdown/pandoc editor
The language tag format usually goes by <language>/<script>
but there can and probably will be exceptions.
The following are lists of language tags you can pass into --lang
:
ahom
(Tai Ahom)aiton
(Tai Aiton)assamese
bangla
bhojpuri/kaithi
bhojpuri/devanagari
chakma
dhivehi
dogri
gondi/gunjala
gondi/masaram
gujarati
hindi
ho
kannada
kashmiri/nastaliq
kashmiri/sharada
lepcha
limbu
maithili/mithilakshar
maithili/devanagari
malayalam
marathi
meitei
(Manipuri)mru
nepali
newari/devanagari
newari/newa
odia
pashto
punjabi/gurmukhi
punjabi/shahmukhi
santali
saraiki
saurashtra
sindhi/khudawadi
sindhi/naskh
sinhala
sora
syloti
tamil
tangsa
telugu
tibetan
toto
urdu
western-pahari
zomi
You may notice that not everything is translated (e.g. the word "Contents" may show up as "?contents?"), but I would appreciate anyone who could help provide translations and localizations for the .ini
files in locale/
.
Currently not yet supported in no particular order (but would be
tulu
language andtulu-tigalari
script (no Unicode support)tolong siki
script (no Unicode support)kurukh banna
script (no Unicode support)achik-tokbirim
script- ...and many more
Professional typesetting for non-latin based scripts has been relatively inaccessible for a lot of writers in South Asia, and this was born from another project where I wanted to be able to generate professional PDFs in the languages of the Indian subcontinent with minimal configuration.
Another reason is that Adivasi/indigenous and other minority languages of South Asia have often been overlooked and often have a lack of reading material and resources, and so hopefully this makes it easier to at least publish/write material for more people.
Unfortunately, right now not everything is translated, so that is something that is bound to show up at some point. In fact, some of the words that have already been translated in the babel repository may be incorrect. See
Let's take Sora Sompeng as an example. In langs/sora/prelude.md
we have:
---
lang: srb
babel-otherlang: sora
babeloptions:
- import=srb
- maparabic
mainfont: Noto Sans Sora Sompeng
mainfontoptions:
- Extension=.ttf
- Path=scripts/sora-sompeng/fonts/
- UprightFont=*-Regular
- BoldFont=*-Bold
- Renderer=HarfBuzz
- Script=Sora Sompeng
---
Simply comment out either the line with either maparabic
or mapdigits
. Note that several languages like Ahom, Tamil, Malayalam, Kannada, Telugu, Sinhala, and Tibetan have this featured off by default.
---
lang: srb
babel-otherlang: sora
babeloptions:
- import=srb
# - maparabic
mainfont: Noto Sans Sora Sompeng
mainfontoptions:
- Extension=.ttf
- Path=scripts/sora-sompeng/fonts/
- UprightFont=*-Regular
- BoldFont=*-Bold
- Renderer=HarfBuzz
- Script=Sora Sompeng
---
Create an issue and I'll probably do a walk through on how to "create" your own language using this mechanism. But I am working to include as many of the subcontinental languages, so bear with me.
Place your fonts in the corresponding scripts
folder. For example, Tamil fonts should go in scripts/tamil/fonts/
. Then, in your document at the top among the various other settings:
mainfont: <font_name>
mainfontoptions:
- Extension= <extension>
- Path=scripts/tamil/fonts/
- UprightFont= <upright-font> pattern
- ItalicFont= <italic-font> pattern
- Renderer=HarfBuzz
- Script=Tamil
A couple of notes here:
<extension>
refers to the actual file extensionUprightFont
andItalicFont
match the file names of the various upgright and other fonts- So if your regular font name for example is named
TiroTamil-Regular.ttf
, then put*-Regular
, and if your italic font name isTiroTamil-Italic.ttf
, add*-Italic
to ItalicFont
- So if your regular font name for example is named
Script
should match the script name if possible
We would really appreciate fonts that are tailored for the various indigenous scripts like Sora, Santali, Takri, Mundari Bani, Gunjala and Masaram Gondi.
We would really appreciate translation and localization help. It would go a long way towards helping further localize documents published in these languages.
If you figure you can streamline this even more, share your ideas and PRs.
If you want to help in anyway, you can create an issue and I will very likely look at it.