[non-production-ready] GitHub Action for HXLTM (Multilingual Terminology in Humanitarian Language Exchange). TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, and more.
What is HXLTM? Reference tooling? The HXLTM Action?
What is HXLTM?
The HXLTM documented conventions (ontologia) explain how to store terminology and translation memories in HXL. This makes storage very compact while still allowing collaborative human editing of complex cases, even without advanced frontends.
Reference tooling
Public domain reference tooling enables direct conversion from HXLTM both to templated files (in short: more-than-string-replace placeholders filled with content from HXLTM) and to user-customizable and industry-standard formats for linguistic content.
- TBX (TermBase eXchange)
- TMX (Translation Memory eXchange)
- XLIFF (XML Localization Interchange File Format)
- UTX (Universal Terminology eXchange) export only
- HXLTM itself, in some container, either on local disk or on a remote server:
  - CSV
  - Google Sheets read only
  - Microsoft Excel read only
- ...and much, much more. See https://hdp.etica.ai/hxltm/archivum/
The HXLTM Action
This GitHub Action abstracts part of what is possible with the underlying HXLTM cli tools. It also allows use of the fantastic command line tools shipped with libhxl-python, selected with the bin parameter.
Source code for the underlying applications:
- HXL Standard tools: https://github.com/HXLStandard/libhxl-python
- HXLTM cli tools: https://github.com/EticaAI/HXL-Data-Science-file-formats
Table of Contents
- Example usage
- Documentation
- To do
- License
Are you new to GitHub Actions? PROTIP!
PROTIP: if you are new to GitHub Actions, think of each action published with 💖 by others (TL;DR: the "- uses:" part, as in "- uses: actions/checkout@v2") as a building block that runs (TL;DR: the "runs-on: ubuntu-latest" part) on powerful virtual machines with 8GB to 14GB of RAM, which are 100% free and unlimited (*) for public open source projects.
(*): even with good intent, avoid too-frequent unauthenticated requests to external services, like Google Sheets, without a strong reason. Take special care with scheduled jobs for datasets that someone else already shares as a cached version hosted on GitHub Pages or some other site.
on: [push]
jobs:
  HXLTM-export:
    name: Converts HXLTM to multilingual data formats
    runs-on: ubuntu-latest
    steps:
      - name: Checkout the git repository to the actions temporary host runner
        uses: actions/checkout@v2
      - name: "HXLTM to TBX (TermBase eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: 'hxltmcli'
          # https://hdp.etica.ai/hxltm/archivum/#TBX-Basim
          args: "--objectivum-TBX-Basim"
          infile: 'fontem.tm.hxl.csv'
          outfile: 'objectivum.tbx'
      - name: "HXLTM to TMX (Translation Memory eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: 'hxltmcli'
          args: "--objectivum-TMX"
          infile: 'fontem.tm.hxl.csv'
          outfile: 'objectivum.tmx'
      - name: "HXLTM to UTX (Universal Terminology eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: 'hxltmcli'
          args: "--objectivum-UTX"
          infile: 'fontem.tm.hxl.csv'
          outfile: 'objectivum.utx'
Examples of repositories using this action
The hxltm-action-example repository is used to test the latest version of hxltm-action.
It's recommended to specify a version (or a strict hash), like @v0.4.0 instead of @main, so
- uses: fititnt/hxltm-action@main
would become
- uses: fititnt/hxltm-action@v0.4.0
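As a minimal sketch of the two pinning options mentioned above: the first step pins to a release tag, the second to a full commit hash. The 40-character hash is a placeholder made of zeros, not a real commit of this repository; replace it with a commit you have reviewed.

```yaml
steps:
  # Pinned to a release tag (preferred over @main)
  - uses: fititnt/hxltm-action@v0.4.0
    with:
      bin: "hxltmcli"
      args: "--help"

  # Pinned to a full commit hash (strictest option).
  # PLACEHOLDER: the zeros below are not a real commit, replace them.
  - uses: fititnt/hxltm-action@0000000000000000000000000000000000000000
    with:
      bin: "hxltmcli"
      args: "--help"
```

A tag can be moved by the repository owner, while a commit hash cannot, which is why the hash is the stricter choice.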
This documentation explains the action.yml and entrypoint.sh strategy to abstract the command line usage described at https://hdp.etica.ai/hxltm/archivum/.
Baseline inputs, together with Environment variables, are enough to abstract how to use the underlying command line tools. The syntactic sugar inputs offer some level of abstraction.
# TODO: explain this snippet a bit better
- # name: "Some description here"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli" # hxltmcli, hxltmdexml
    args: "" #
    infile: path/to/fontem.tm.hxl.csv
    outfile: path/to/objectivum
Required. The executable to run.
Parameter examples:
hxltmcli (or .github/hxltm/hxltmcli.py) (*)
hxltmdexml (or .github/hxltm/hxltmdexml.py) (*)
(*): If necessary, a locally customized fork of the reference HXLTM tools can be stored near where the data is processed. The suggested location is .github/hxltm/(file).py. This can be useful both for testing purposes and for immediate hotfixes during an urgent response, when you as implementer cannot wait.
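A sketch of the local-fork case described in the note above, assuming a customized copy of hxltmcli.py has been committed at the suggested path. The checkout step is there so the file exists in the workspace before the action runs; the input and output file names are reused from the earlier examples.

```yaml
- name: "Checkout, so the local fork under .github/hxltm/ is available"
  uses: actions/checkout@v2
- name: "HXLTM to TMX using a local fork of hxltmcli"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    # Path relative to the repository root (ASSUMPTION: the fork was
    # committed to this exact location)
    bin: ".github/hxltm/hxltmcli.py"
    args: "--objectivum-TMX"
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.tmx"
```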
Arguments passed to the program defined by the bin parameter.
Parameter examples:
--help
-v
--sheet 7 (selects a sheet from an Excel workbook; 1 is the first sheet)
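As an illustration of how args combines with the other baseline inputs, the step below reads the seventh sheet of an Excel workbook and exports TMX. This is a sketch under assumptions: the workbook file name is hypothetical, and passing both flags in a single args string is assumed to work the same way as on the command line.

```yaml
- name: "HXLTM (7th sheet of an Excel workbook) to TMX"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    # ASSUMPTION: several flags can be combined in one args string
    args: "--sheet 7 --objectivum-TMX"
    infile: "fontem.tm.hxl.xlsx"   # hypothetical file name
    outfile: "objectivum.tmx"
```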
The input file for the program defined by the bin parameter.
See: Note on non use of pipelines.
Default "fontem.ext".
Parameter examples:
fontem.hxl.csv
fontem.tbx
The output file for the program defined by the bin parameter.
See: Note on non use of pipelines.
Default "objectivum.ext".
Parameter examples:
objectivum.tbx
objectivum.hxl.csv
Because of the way GitHub Actions steps work, environment variables can be passed either at the entire job level or at specific steps. One implication of action.yml and entrypoint.sh is that environment variables at job level can be used to create defaults for potentially repetitive values, like working_languages.
TODO: test this potential implication and document it.
This section shows some syntactic sugar (or intentional syntactic saccharin) for what could be done in other ways, often with the args parameter. Some of these only use English for what the hxltm cli tools name in Latin.
Syntactic sugar to invoke the bin program with --help and exit without raising an error. Default false.
Just copy and paste the following.
- name: "hxltmcli --help"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    args: "--help"
- name: "hxltmdexml --help"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmdexml"
    args: "--help"
Extra: HXLStandard cli tools
Since libhxl-python is a requirement of hxltm, you can reuse this action to pre-process already HXLated datasets (if not HXLated yet, use hxltag and map manually).
# Bonus: HXLStandard cli tools ___________________________________________
# @see https://github.com/HXLStandard/libhxl-python/wiki/Command-line-tools
- name: "hxlspec --help"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxlspec"
    args: "--help"
- name: "hxltag --help"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltag"
    args: "--help"
- name: "hxldedup --help"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxldedup"
    args: "--help"
### Full list (as of 2021-11-07)
# compgen -c | grep hxl
# hxlreplace
# hxlexplode
# hxlselect
# hxladd
# hxlspec
# hxlcount
# hxltag
# hxlcut
# hxlsort
# hxlexpand
# hxlmerge
# hxldedup
# hxlfill
# hxlrename
# hxlclean
# hxlappend
# hxlimplode
# hxlvalidate
# hxlhash
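Beyond --help, a pre-processing step could be chained before the conversion. The sketch below removes duplicated rows with hxldedup and then converts the result to TBX. It assumes that the action passes infile and outfile to libhxl-python tools the same way it does for hxltmcli, and that files written by one step remain in the workspace for the next one. The intermediate file name is hypothetical.

```yaml
- name: "Remove duplicated rows before conversion"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxldedup"
    args: ""
    infile: "fontem.tm.hxl.csv"
    outfile: "fontem-dedup.tm.hxl.csv"   # hypothetical intermediate file
- name: "Deduplicated HXLTM to TBX (TermBase eXchange)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    args: "--objectivum-TBX-Basim"
    infile: "fontem-dedup.tm.hxl.csv"
    outfile: "objectivum.tbx"
```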
- Syntactic sugar for HXLTM: --agendum-linguam
- Concept:
  - 'Working language' on Wikipedia
  - More context here: #3 (comment)
List of one or more working languages. See: Note on language options.
Use new lines or a comma (,) as separator.
Parameter examples:
- TODO: add example parameters for IATE and UN working languages here
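Until official examples are added, here is a hedged sketch of how the list could be written with one language per line. It assumes the input is named working_languages (the name used elsewhere in this document) and that language codes follow a form like lat-Latn@la; neither the codes nor their format are confirmed by this README.

```yaml
- name: "HXLTM to TBX with explicit working languages"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    args: "--objectivum-TBX-Basim"
    # One language per line; a comma-separated string is the other
    # documented separator.
    # ASSUMPTION: the codes below are illustrative examples only.
    working_languages: |
      lat-Latn@la
      por-Latn@pt
      eng-Latn@en
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.tbx"
```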
Opposite of working_languages. See: Note on language options.
- Syntactic sugar for HXLTM: --auxilium-linguam
- Concept:
  - 'Auxiliary language' on Wikipedia
  - More context here: #3 (comment)
List of one or more auxiliary languages (order is important). See: Note on language options.
Use new lines or a comma (,) as separator.
Parameter examples:
- TODO: add example parameters for IATE and UN working languages here
- Syntactic sugar for HXLTM: --fontem-linguam
- Concept:
  - 'Translation / Source and target languages' on Wikipedia
  - More context here: #3 (comment)
Source language. Single item. See: Note on language options.
- Syntactic sugar for HXLTM: --objectivum-linguam
- Concept:
  - 'Translation / Source and target languages' on Wikipedia
  - More context here: #3 (comment)
Target language. Single item. See: Note on language options.
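Source and target language matter most for bilingual formats such as XLIFF. The two steps below sketch the same request twice: once with the English-named sugar inputs and once with the Latin flags in args. The input names source_language and target_language are taken from the list in the note on language options; the language codes, their format, and the output file name are assumptions.

```yaml
- name: "HXLTM to XLIFF (sugar inputs)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    args: "--objectivum-XLIFF"
    source_language: "por-Latn@pt"   # ASSUMPTION: illustrative code
    target_language: "spa-Latn@es"   # ASSUMPTION: illustrative code
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.xlf"        # hypothetical file name
- name: "HXLTM to XLIFF (same request through args only)"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    args: "--objectivum-XLIFF --fontem-linguam por-Latn@pt --objectivum-linguam spa-Latn@es"
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.xlf"
```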
- Syntactic sugar for HXLTM: --objectivum-formulam
Export custom template (HXLTM Ad Hoc Fōrmulam). Path to a single file on local disk.
Parameter examples:
data/README.🗣️.md
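A sketch of a custom template export, written with the underlying flag in args rather than with a sugar input. The template path is the example given above; the output path is hypothetical, and the template file is assumed to already exist in the repository.

```yaml
- name: "HXLTM to a custom templated README"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    # The template is read from local disk, so a checkout step must
    # run before this one.
    args: "--objectivum-formulam data/README.🗣️.md"
    infile: "fontem.tm.hxl.csv"
    outfile: "data/README.md"   # hypothetical output path
```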
- Syntactic sugar for HXLTM: --objectivum-<VALUE>
Export to data standard documented on the HXLTM ontologia.
Parameter examples:
TMX
XLIFF
- Syntactic sugar for HXLTM: --expertum-HXLTM-ASA <VALUE>
Specify a file to dump the HXLTM Abstractum Syntaxim Arborem. See: Note on HXLTM-ASA.
Parameter examples:
.asa.hxltm.yml
.asa.hxltm.json
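A sketch of dumping the abstract syntax tree next to a regular export, using the underlying flag in args. Combining the dump flag with an export flag in one call is an assumption, and the dump is assumed to be written relative to the working directory of the step.

```yaml
- name: "HXLTM to TMX, also dumping the HXLTM-ASA"
  uses: fititnt/hxltm-action@v0.4.0
  with:
    bin: "hxltmcli"
    # ASSUMPTION: the dump flag can be combined with an export flag
    args: "--objectivum-TMX --expertum-HXLTM-ASA .asa.hxltm.yml"
    infile: "fontem.tm.hxl.csv"
    outfile: "objectivum.tmx"
```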
TODO: explain better the outputs.
Piping from stdin and to stdout, which the underlying cli tools offer as an efficient way of working, is not available in this action. If you're working with gigabyte-sized datasets that would exhaust the free disk space on GitHub Actions, consider using actions-python and installing all dependencies manually.
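A rough sketch of that manual alternative follows. It assumes that "actions-python" refers to the actions/setup-python action, that the tools are installable from PyPI under the package names shown, and that hxltmcli reads stdin and writes stdout when no file paths are given. Verify each of these before relying on it.

```yaml
on: [push]
jobs:
  HXLTM-pipe:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.x"
      # ASSUMPTION: package names as published on PyPI
      - run: pip install libhxl hxltm-eticaai
      # ASSUMPTION: stdin/stdout are used when no file paths are passed
      - run: cat fontem.tm.hxl.csv | hxltmcli --objectivum-TMX > objectivum.tmx
```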
The hxltm-action documentation on these options is more conceptual mainly because the HXLTM reference implementation tooling lets users specify them and exposes their values to anyone who documents custom standards in their own ontologia, even when the original data exchange standards don't use them.
TODO: give even more context
- See working_languages, non_working_languages, auxiliary_languages, source_language, target_language, dump_abstract_syntax_tree
TODO: explain what is special about the way the reference implementation of HXLTM uses HXLTM-ASA.
- Even though the @v0.*.* releases are already usable (users are advised to specify an exact version), eventually release a @v1.0.0 so users can follow the GitHub Actions convention of referencing @v1 / @v2 / @v3 (...) as their version.
- Potential new Action: Translate Toolkit
  - Does https://github.com/translate/translate have a GitHub Action? If not, we may be interested, since it can be used to create PO files and some other bilingual formats.
To the extent possible under law, Emerson Rocha and non anonymous collaborators have waived all copyright and related or neighboring rights to this work to Public Domain.
Optionally, the BSD Zero Clause License is also one explicit alternative to the Unlicense as an older license approved by the OSI:
SPDX-License-Identifier: Unlicense OR 0BSD