[non-production-ready] Multilingual Terminology in Humanitarian Language Exchange. TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)

fititnt/hxltm-action

Actions with HXLTM: terminology, translation & localization

[non-production-ready] GitHub Action for HXLTM (Multilingual Terminology in Humanitarian Language Exchange). TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, and more.

Preface

What is HXLTM? Reference tooling? The HXLTM Action?

What is HXLTM?

The HXLTM documented conventions (ontologia) explain how to store terminology and translation memories in HXL. This makes storage very compact while still allowing collaborative human editing of complex cases, even without advanced frontends.

Reference tooling

Public domain reference tooling enables direct conversion from HXLTM to templated files (in short: more-than-string-replace placeholders filled with content from HXLTM) and to formats for linguistic content, both user-customizable ones and industry standards.

The HXLTM Action

This GitHub Action abstracts part of what is possible with the underlying HXLTM CLI tools. It also allows use of the command line tools shipped with libhxl-python, selectable with the bin parameter.

Source code for the underlying applications:


Example usage

Are you new to GitHub Actions? PROTIP!

PROTIP: if you are new to GitHub Actions, think of each action published with 💖 by others (TL;DR: the uses: part of - uses: actions/checkout@v2) as a building block that runs (TL;DR: the runs-on: ubuntu-latest part) on powerful virtual machines with 8GB to 14GB of RAM, 100% free and unlimited (*) for public open source projects.

(*): but even with good intent, avoid making unauthenticated requests to external services, like Google Sheets, too often without a strong reason. Take special care with scheduled jobs for datasets that someone else already shares as a cached version hosted on GitHub Pages or some other site.
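As a rough sketch of the advice above, a workflow that pulls from an external source could run on a slow schedule plus manual dispatch instead of on every push. The cron expression (weekly, Mondays at 03:00 UTC) is only an illustrative assumption; pick whatever frequency your data source can reasonably bear.

    on:
      # Weekly, Mondays at 03:00 UTC (illustrative value only)
      schedule:
        - cron: '0 3 * * 1'
      # Also allow manual runs from the Actions tab
      workflow_dispatch: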

Quickstart

on: [push]

jobs:
  HXLTM-export:
    name: Converts HXLTM to multilingual data formats
    runs-on: ubuntu-latest
    steps:

      - name: Checkout the git repository to the actions temporary host runner
        uses: actions/checkout@v2

      - name: "HXLTM to TBX (TermBase eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
            bin: 'hxltmcli'
            # https://hdp.etica.ai/hxltm/archivum/#TBX-Basim
            args: "--objectivum-TBX-Basim"
            infile: 'fontem.tm.hxl.csv'
            outfile: 'objectivum.tbx'

      - name: "HXLTM to TMX (Translation Memory eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
            bin: 'hxltmcli'
            args: "--objectivum-TMX"
            infile: 'fontem.tm.hxl.csv'
            outfile: 'objectivum.tmx'

      - name: "HXLTM to UTX (Universal Terminology eXchange)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
            bin: 'hxltmcli'
            args: "--objectivum-UTX"
            infile: 'fontem.tm.hxl.csv'
            outfile: 'objectivum.utx'

Full example usages

Examples of repositories using this action

hxltm-action-example

The hxltm-action-example repository is used to test the latest version of hxltm-action. It is recommended to pin a version (or an exact commit hash), like @v0.4.0, instead of @main, so - uses: fititnt/hxltm-action@main would become - uses: fititnt/hxltm-action@v0.4.0.

Documentation

This documentation explains the action.yml and entrypoint.sh strategy to abstract the command line usage described at https://hdp.etica.ai/hxltm/archivum/.

Baseline inputs

Baseline inputs, together with Environment variables, are enough to abstract how to use the underlying command line tools. The syntactic sugar inputs offer some level of abstraction.


      # TODO: explain this snippet a bit better
      - # name: "Some description here"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli" # hxltmcli, hxltmdexml
          args: ""
          infile: path/to/fontem.tm.hxl.csv
          outfile: path/to/objectivum

bin

Required. The executable to run.

Parameter examples:

  • hxltmcli (or .github/hxltm/hxltmcli.py) (*)
  • hxltmdexml (or .github/hxltm/hxltmdexml.py) (*)

(*): If necessary, a locally customized fork of the reference HXLTM tools can be stored near where the data is processed. The suggested location is .github/hxltm/(file).py. This can be useful both for testing purposes and for immediate hotfixes during an urgent response, when you as the implementer cannot wait.
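A minimal sketch of the local-fork option described above, assuming a customized copy of hxltmcli has been committed to the repository at .github/hxltm/hxltmcli.py (the suggested location). The file names and the --objectivum-TMX argument are reused from the Quickstart; the checkout step is there so the script exists on the runner.

      - name: Checkout the repository (makes the local fork available)
        uses: actions/checkout@v2

      - name: "HXLTM to TMX using a locally patched hxltmcli"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          # Path to the customized script instead of the bundled 'hxltmcli'
          bin: ".github/hxltm/hxltmcli.py"
          args: "--objectivum-TMX"
          infile: "fontem.tm.hxl.csv"
          outfile: "objectivum.tmx"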

args

Arguments passed to the program defined by the bin parameter.

Parameter examples:

  • --help
  • -v
  • --sheet 7 (select a sheet from an Excel workbook; 1 is the first sheet)
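For illustration, the args value is a single string, so several options can presumably be combined in it. The sketch below assumes a hypothetical Excel workbook fontem.xlsx whose seventh sheet holds the HXLTM data, and assumes --sheet can be combined with an export option such as --objectivum-TMX; verify against hxltmcli --help before relying on it.

      - name: "Seventh sheet of an Excel workbook to TMX (sketch)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          # Assumption: both options are handed to hxltmcli unchanged
          args: "--sheet 7 --objectivum-TMX"
          infile: "fontem.xlsx"   # hypothetical file name
          outfile: "objectivum.tmx"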

infile

The input file for the program defined by the bin parameter (see Note on non use of pipelines). Default: "fontem.ext".

Parameter examples:

  • fontem.hxl.csv
  • fontem.tbx

outfile

The output file for the program defined by the bin parameter (see Note on non use of pipelines). Default: "objectivum.ext".

Parameter examples:

  • objectivum.tbx
  • objectivum.hxl.csv
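The parameter examples above suggest the reverse direction as well: reading an XML-based format and writing HXLTM. A minimal sketch, assuming hxltmdexml accepts a TBX file without extra arguments (not confirmed here; check hxltmdexml --help):

      - name: "TBX to HXLTM (sketch)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmdexml"
          args: ""  # assumption: no extra arguments needed
          infile: "fontem.tbx"
          outfile: "objectivum.hxl.csv"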

Environment variables

Reusable defaults

Because of the way GitHub Actions steps work, environment variables can be passed either at the level of the entire job or at specific steps. One implication of the action.yml and entrypoint.sh design is that job-level environment variables can be used to set defaults for potentially repetitive values, like working_languages.

TODO: test this potential implication and document it.
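Until that is tested, the general GitHub Actions mechanics look like the sketch below: an env block at job level applies to every step, and a step-level env block overrides it for that step. EXEMPLUM_VALOR is a made-up variable name used only to show the scoping; which variable names hxltm-action actually reads is not documented here.

    jobs:
      exemplum:
        runs-on: ubuntu-latest
        # Job level: visible to every step in this job
        env:
          EXEMPLUM_VALOR: "job-default"   # hypothetical name and value
        steps:
          - name: Uses the job-level default
            run: echo "$EXEMPLUM_VALOR"

          - name: Overrides the value for this step only
            env:
              EXEMPLUM_VALOR: "step-override"
            run: echo "$EXEMPLUM_VALOR"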

Syntactic sugar inputs

This section shows some syntactic sugar (or intentional syntactic saccharin) for what could be done in other ways, often with the args parameter. Some of these inputs use English where the hxltm CLI tools use Latin.

help

Syntactic sugar to invoke the bin program with --help and exit without raising an error. Default: false.

Just copy and paste the following.

      - name: "hxltmcli --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          args: "--help"

      - name: "hxltmdexml --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmdexml"
          args: "--help"
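The same result should be achievable with the help input itself rather than args. A minimal sketch, assuming the input accepts a boolean-like true (its documented default is false):

      - name: "hxltmcli --help (via the help input)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          help: true

According to the description of this input, this form exits without raising an error, so the step should not fail the job.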

Extra: HXLStandard cli tools

Since libhxl-python is a requirement of hxltm, you can reuse this action to pre-process already HXLated datasets (if a dataset is not HXLated yet, use hxltag and map the tags manually).

      # Bonus: HXLStandard cli tools ___________________________________________
      # @see https://github.com/HXLStandard/libhxl-python/wiki/Command-line-tools
      - name: "hxlspec --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxlspec"
          args: "--help"

      - name: "hxltag --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltag"
          args: "--help"

      - name: "hxldedup --help"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxldedup"
          args: "--help"

      ### Full list (as of 2021-11-07)
      # compgen -c | grep hxl
      # hxlreplace
      # hxlexplode
      # hxlselect
      # hxladd
      # hxlspec
      # hxlcount
      # hxltag
      # hxlcut
      # hxlsort
      # hxlexpand
      # hxlmerge
      # hxldedup
      # hxlfill
      # hxlrename
      # hxlclean
      # hxlappend
      # hxlimplode
      # hxlvalidate
      # hxlhash
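Beyond --help, a pre-processing step could look like the sketch below, which removes duplicate rows with hxldedup before converting the result with hxltmcli. It assumes the action hands infile and outfile to the program as its input and output paths, and that steps in one job share the workspace; the intermediate file name is hypothetical.

      - name: "Remove duplicate rows (libhxl-python)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxldedup"
          args: ""  # no options: whole rows are compared
          infile: "fontem.tm.hxl.csv"
          outfile: "fontem.dedup.tm.hxl.csv"   # hypothetical intermediate file

      - name: "Deduplicated HXLTM to TMX"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          args: "--objectivum-TMX"
          infile: "fontem.dedup.tm.hxl.csv"
          outfile: "objectivum.tmx"

Chaining through an intermediate file like this is the substitute for a shell pipe, which the action does not offer.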

working_languages

List of one or more working languages (see Note on language options). Use new lines or , as separator.

Parameter examples:

  • TODO: add example parameters for IATE and UN working languages here
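Pending real examples, the two separators can be sketched as follows. The language identifiers are placeholders only, written in a form assumed from the HXLTM documentation and not verified here; substitute the codes your dataset actually uses.

      - name: "Working languages, one per line (sketch)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          args: "--objectivum-TMX"
          infile: "fontem.tm.hxl.csv"
          outfile: "objectivum.tmx"
          # YAML block scalar: each line is one language (placeholder codes)
          working_languages: |
            por-Latn@pt
            eng-Latn@en
          # Comma-separated alternative for the same list:
          # working_languages: "por-Latn@pt,eng-Latn@en"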

non_working_languages

Opposite of working_languages (see Note on language options).

auxiliary_languages

List of one or more auxiliary languages; order is important (see Note on language options). Use new lines or , as separator.

Parameter examples:

  • TODO: add example parameters for IATE and UN working languages here

source_language

Source language (see Note on language options). Single item.

target_language

Target language (see Note on language options). Single item.

export_ad_hoc_template

  • Syntactic sugar for HXLTM: --objectivum-formulam

Export custom template (HXLTM Ad Hoc Fōrmulam). Path to a single file on local disk.

Parameter examples:

  • data/README.🗣️.md
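A minimal sketch of this input, assuming a template committed at the path from the example above and a hypothetical output path data/README.md:

      - name: "Render an ad hoc template with content from HXLTM"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          # Sugar for the --objectivum-formulam option of hxltmcli
          export_ad_hoc_template: "data/README.🗣️.md"
          infile: "fontem.tm.hxl.csv"
          outfile: "data/README.md"   # hypothetical output path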

export_data_exchange_standard

  • Syntactic sugar for HXLTM: --objectivum-<VALUE>

Export to a data standard documented in the HXLTM ontologia.

Parameter examples:

  • TMX
  • XLIFF
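A sketch combining this input with source_language and target_language, which a bilingual format such as XLIFF would presumably need. The language codes are placeholders in a form assumed from the HXLTM documentation, not verified here, and the output file name is hypothetical.

      - name: "HXLTM to XLIFF (sketch)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          # Stands in for args: "--objectivum-XLIFF"
          export_data_exchange_standard: "XLIFF"
          source_language: "por-Latn@pt"   # placeholder code
          target_language: "eng-Latn@en"   # placeholder code
          infile: "fontem.tm.hxl.csv"
          outfile: "objectivum.xlf"        # hypothetical file name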

dump_abstract_syntax_tree

Specify a file to which the HXLTM Abstractum Syntaxim Arborem is dumped (see Note on HXLTM-ASA).

Parameter examples:

  • .asa.hxltm.yml
  • .asa.hxltm.json
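A minimal sketch, assuming the dump can be requested alongside a normal export and that the file extension selects YAML or JSON (as the two examples above suggest):

      - name: "HXLTM to TMX, also dumping the HXLTM-ASA (sketch)"
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          args: "--objectivum-TMX"
          infile: "fontem.tm.hxl.csv"
          outfile: "objectivum.tmx"
          # Extra file holding the abstract syntax tree, here as YAML
          dump_abstract_syntax_tree: ".asa.hxltm.yml"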

Outputs

resultatum

TODO: explain better the outputs.
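Whatever resultatum ends up containing, a later step can read it through the standard GitHub Actions outputs syntax. In the sketch below the step id conversio is an arbitrary name chosen for illustration.

      - name: "HXLTM to TBX"
        id: conversio   # arbitrary id, needed to reference the outputs
        uses: fititnt/hxltm-action@v0.4.0
        with:
          bin: "hxltmcli"
          args: "--objectivum-TBX-Basim"
          infile: "fontem.tm.hxl.csv"
          outfile: "objectivum.tbx"

      - name: "Show the action output"
        # Passed through env rather than interpolated into the script
        env:
          RESULTATUM: ${{ steps.conversio.outputs.resultatum }}
        run: echo "$RESULTATUM"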

Annotations

Note on non use of pipelines

Piping through stdin and stdout, which the underlying CLI tools support as an efficient way of working, is not available in this action. If you're working with gigabyte-sized datasets that would exhaust the free disk space on GitHub Actions, consider using actions-python and installing all dependencies manually.

Note on language options

The hxltm-action documentation of these options is more conceptual for two reasons: the HXLTM reference implementation tooling allows users to specify them, and it exposes their values to whoever documents custom standards in their own ontologia, even when the original data exchange standards do not use them.

TODO: give even more context

Note on HXLTM-ASA

TODO: explain what is special about the way the reference implementation of HXLTM uses HXLTM-ASA.

To do

  • Even though the @v0.*.* releases are already usable (users are advised to specify an exact version), eventually release a @v1.0.0 so users can follow the GitHub Actions convention of referring to @v1 / @v2 / @v3 (...) as their version.
  • Potential new Action Translate Toolkit

License

Public Domain

To the extent possible under law, Emerson Rocha and non anonymous collaborators have waived all copyright and related or neighboring rights to this work to Public Domain.

Optionally, the BSD Zero Clause License is also one explicit alternative to the Unlicense as an older license approved by the OSI:

SPDX-License-Identifier: Unlicense OR 0BSD
