texd is a TeXaS (TeX as (a) service) solution, designed for your internal document generation, i.e. on your own servers.
It features:
- ad-hoc compilation,
- a simple HTTP API access,
- pluggable TeX distributions,
- ...
The idea behind texd is to provide a single, network-reachable compiler; sending a .tex
file via
HTTP POST should then simply generate a PDF document. You won't need a TeX distribution on the
client system, just an HTTP client.
Several technologies make scaling in any dimension relatively easy:
- the TeX distribution is provided through Docker containers (this also allows using multiple distributions simultaneously)
- using HTTP enables redundancy and/or load balancing without much effort
texd is designed to be run/deployed in 2½ different ways:
This is primarily for (local) testing and development. You download and start texd locally, provide a TeX distribution, and texd will compile documents on your host machine.
To start texd in this mode, execute:
$ texd
Here, you still download and run texd locally, but document rendering will happen in an short-lived
Docker container, using a specific Docker image (registry.gitlab.com/islandoftex/images/texlive:latest
will do just fine, but you could easily build a smaller one using e.g. a Debian base image).
To run in container mode, run:
$ texd registry.gitlab.com/islandoftex/images/texlive:latest
This will pull the specified image, if it doesn't exist yet. Note that you need to give texd
access to /var/run/docker.sock
, in order to allow it to pull the image and create containers.
You may provide multiple image names and switch on a per-request basis (see HTTP API below). In this case, the first image is used as default image:
$ texd \
registry.gitlab.com/islandoftex/images/texlive:latest \
registry.gitlab.com/islandoftex/images/texlive:TL2014-historic \
ghcr.io/yourcompany/texlive-prod
This runs texd within a Docker container, and is primarily targeted for CI pipelines, but can be a viable alternative to the local mode. In fact, this mode is functionally equivalent to the local mode, with the one exception (texd being packaged and started in a container).
To run texd as Docker service, use this command:
$ docker run --rm -t -p localhost:2201:2201 ghcr.io/digineo/texd:latest
The image ghcr.io/digineo/texd:latest
is based on Debian Bookworm with
some texlive packages installed from the Debian repositories (see this
Dockerfile
for the current list). This also
means that the contained TeX distribution is TeXlive 2022.
Note:
If you want/need to run the container with non-root user ID (e.g. when started with sth. like
docker run --user=$(id -un):$(id -gn)
), make sure to also setup aHOME
directory. Otherwise you'll likely encounter FontConfig caching errors.Example
$ mkdir -p texd/{home,jobs} $ docker run --rm -t \ -p localhost:2201:2201 \ --user $(id -un):$(id -gn) \ -e HOME=/texd/home \ -v $(pwd)/texd/home:/texd/home \ -v $(pwd)/texd/jobs:/texd/jobs \ ghcr.io/digineo/texd:latest \ --job-directory /texd/jobs
When using Gitlab CI, you can add this snippet to your .gitlab-ci.yml
:
services:
- name: ghcr.io/digineo/texd:latest
alias: texd
variables:
# reconfigure test application to use this endpoint
# (this is specific to your application!)
TEXD_ENDPOINT: http://texd:2201/render
Calling texd with options works in any mode; these commands are equivalent:
$ texd -h
$ texd tregistry.gitlab.com/islandoftex/images/texlive:latestest -h
$ docker run --rm -t ghcr.io/digineo/texd:latest -h
-
--help
,-h
Prints a short option listing and exits.
-
--version
,-v
Prints version information and exits.
-
--listen-address=ADDR
,-b ADDR
(Default::2201
)Specifies host address (optional) and port number for the HTTP API to bind to. Valid values are, among others:
:2201
(bind to all addresses on port 2201)localhost:2201
(bind only to localhost on port 2201)[fe80::dead:c0ff:fe42:beef%eth0]:2201
(bind to a link-local IPv6 address on a specific interface)
-
--tex-engine=ENGINE
,-X ENGINE
(Default:xelatex
)TeX engine used to compile documents. Can be overridden on a per-request basis (see HTTP API below). Supported engines are
xelatex
,lualatex
, andpdflatex
. -
--compile-timeout=DURATION
,-t DURATION
(Default:1m
)Maximum duration for a document rendering process before it is killed by texd. The value must be acceptable by Go's
ParseDuruation
function. -
--parallel-jobs=NUM
,-P NUM
(Default: number of cores)Concurrency level. PDF rendering is inherently single threaded, so limiting the document processing to the number of cores is a good start.
-
--queue-wait=DURATION
,-w DURATION
(Default:10s
)Time to wait in queue before aborting. When <= 0, clients will immediately receive a "full queue" response.
-
--job-directory=PATH
,-D PATH
(Default: OS temp directory)Place to put job sub directories in. The path must exist and it must be writable.
-
--pull
(Default: omitted)Always pulls Docker images. By default, images are only pulled when they don't exist locally.
This has no effect when no image tags are given to the command line.
Note: This option listing might be outdated. Run
texd --help
to get the up-to-date listing.
To create a PDF document from an input .tex
file, send a HTTP POST to the /render
endpoint.
You may encode the payload as multipart/form-data
or application/x-www-form-encoded
, however
the latter is not recommended.
Assuming, you have a input.tex
in the current directory, you can issue the following command
to send that file to your texd instance, and save the result in a file named output.pdf
:
$ curl -X POST \
-F "input.tex=<input.tex" \
-o "output.pdf" \
"http://localhost:2201/render"
You can send multiple files (even in sub directories) as well:
$ curl -X POST \
-F "cv.tex=<cv.tex" \
-F "chapters/introduction.tex=<chapters/introduction.tex" \
-F "logo.pdf=<logo.pdf" \
-o "vita.pdf" \
"http://localhost:2201/render?input=cv.tex"
When sending multiple files, you should specify which one is the main input file (usually the one
containing \documentclass
), using the input=
query parameter. If you omit this parameter, texd
will try to guess the input file.
Please note that file names will be normalized, and files pointing outside the root directory
will be discarded entirely (i.e. ../input.tex
is NOT a valid file name). You can't do this:
$ curl -X POST \
-F "../input.tex=<input.tex" \
-o "output.pdf" \
"http://localhost:2201/render"
However, this is perfectly fine:
$ curl -X POST \
-F "input.tex=<../input.tex" \
-o "output.pdf" \
"http://localhost:2201/render"
Guessing the input file (click to show details)
- only filenames starting with alphanumeric character and ending in
.tex
are considered (foo.tex
,00-intro.tex
will be considered, but not_appendix.tex
,figure.png
) - files in sub directories are ignored (e.g.
chapters/a.tex
) - if only one file in the root directory remains, it is taken as main input
- otherwise search for a file containing a line starting with:
- either
%!texd
at the beginning of the file - or
\documentclass
somewhere in the first KiB
- either
- if no match, consider (in order):
input.tex
main.tex
document.tex
- otherwise search for a file containing a line starting with:
If no main input file can be determined, texd will abort with an error.
-
input=<filename>
- instructs texd to skip guessing main input file and use the specified one. The filename must be present in the body. -
engine=<value>
- specifies which TeX engine to run. Supported engines are:xelatex
(default)lualatex
pdflatex
Note that the default can be changed with a CLI option (e.g.
--tex-engine=lualatex
). -
image=<imagename>
- selects Docker image for document processing.This is only available in ephemeral container mode. The image name must match the ones listed in the texd command invocation, i.e. you can't select arbitrary images.
If you provide an unknown image name, you will receive a 404 Not Found response. In local and CI service mode, this parameter only logged, but will otherwise be ignored.
-
errors=<detail level>
- tries to retrieve the compilation log, in case of compilation errors. Acceptable detail levels are:- empty (or
errors
completely absent), to return a JSON description (default) condensed
, to return only the TeX error message from the log filefull
, to return the full log file astext/plain
response
The "condensed" form extracts only the lines from the error log which start with a
!
. Due to the way TeX works, these lines may not paint the full picture, as TeX's log lines generally don't exceed a certain line length, and wrapped lines won't get another!
prefix.Note that this parameter changes the response content to a plain text file if you select
full
orcondensed
, and not a JSON response as in all other cases. - empty (or
If compilation succeeds, you'll receive a status 200 OK, with content type application/pdf
, and
the PDF file as response body.
HTTP/1.1 200 OK
Content-Type: application/pdf
Content-Length: 1234
%PDF/1.5...
If the request was accepted, but could not complete due to errors, you will by default receive a 422
Unprocessable Entity response with content type application/json
, and an error description in
JSON format:
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json
Content-Length: 154
{
"error": "latexmk call failed with status 1",
"category": "compilation",
"output": "[truncated output log]"
}
The fields error
and category
represent a short error description and an error category,
respectively.
Possible, known error categories are currently:
-
input - one or more files are invalid (e.g. file was discarded after path normalization), or the main input file could not be determined.
-
compilation -
latexmk
exited with an error (likely due to invalid or missing input files). -
queue - texd won't accept new render jobs, if its internal queue is at capacity. In this case wait for a few moments to give texd a chance to catch up and then try again.
-
reference - texd could not find the provided reference store entries. The missing references are listed in the response; you need to repeat the request with those files included.
Additional fields, like log
for compilation failures, might be present.
Note: The JSON response is pretty-printed only for this README. Expect the actual response to be minified.
If you set errors=full
, you may receive a plain text file with the compilation log:
Show response (click to open)
HTTP/1.1 422 Unprocessable Entity
Content-Type: text/plain
Content-Length: 3156
This is XeTeX, Version 3.141592653-2.6-0.999993 (TeX Live 2021) (preloaded format=xelatex 2022.3.6) 12 MAR 2022 13:57
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
... omitting some lines ...
! LaTeX Error: File `missing.tex' not found.
Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: tex)
Enter file name:
! Emergency stop.
<read *>
l.3 \input{missing.tex}
^^M
*** (cannot \read from terminal in nonstop modes)
For errors=condensed
, you'll only receive the lines starting with !
(with this prefix removed):
Show response (click to open)
HTTP/1.1 422 Unprocessable Entity
Content-Type: text/plain
Content-Length: 59
LaTeX Error: File `missing.tex' not found.
Emergency stop.
texd has a number of runtime configuration knobs and internal state variables, which may or may not
of interest for API consumers. To receive a current snapshot, query /status
:
$ curl -i http://localhost:2201/status
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 287
{
"version": "0.0.0",
"mode": "container",
"images": ["registry.gitlab.com/islandoftex/images/texlive:latest"],
"timeout": 60,
"engines": ["xelatex","pdflatex","lualatex"],
"default_engine": "xelatex",
"queue": {
"length": 0,
"capacity": 16
}
}
For monitoring, texd provides a Prometheus endpoint at /metrics
:
$ curl -i http://localhost:2201/metrics
Content-Type: text/plain; version=0.0.4; charset=utf-8
...
The metrics include Go runtime information, as well as texd specific metrics:
Metric name | Type | Description |
---|---|---|
texd_processed_total{status="success"} |
counter | Number of documents processed. |
texd_processed_total{status="failure"} |
counter | Number of rendering errors, including timeouts. |
texd_processed_total{status="rejected"} |
counter | Number of rejected requests, due to full job queue. |
texd_processed_total{status="aborted"} |
counter | Number of aborted requests, usually due to timeouts. |
texd_processing_duration_seconds |
histogram | Overview of processing time per document. |
texd_input_file_size_bytes{type=?} |
histogram | Overview of input file sizes. Type is either "tex" (for .tex, .cls, .sty, and similar files), "asset" (for images and fonts), "data" (for CSV files), or "other" (for unknown files) |
texd_output_file_size_bytes |
histogram | Overview of output file sizes. |
texd_job_queue_length |
gauge | Length of rendering queue, i.e. how many documents are waiting for processing. |
texd_job_queue_usage_ratio |
gauge | Queue capacity indicator (0.0 = empty, 1.0 = full). |
texd_info{version="0.0.0", mode="local", ...} |
constant | Various version and configuration information. |
Metrics related to processing also have an engine=?
label indicating the TeX engine ("xelatex",
"lualatex", or "pdflatex"), and an image=?
label indicating the Docker image.
You can try compiling TeX documents directly in your browser: Visit http://localhost:2201, and you'll be greeted with a very basic, but functional UI.
Please note, that this UI is not built to work in every browser. It intentionally does not use fancy build tools. It's just a simple HTML file, built by hand, using Bootstrap 5 for aesthetics and Vue 3 for interaction. Both Bootstrap and Vue are bundled with texd, so you won't need internet access for this to work.
If your browser does not support modern features like ES2022 proxies, Object.entries
, fetch
,
and <object type="application/pdf" />
elements, you're out of luck. (Maybe upgrade your browser?)
Anyway, consider the UI only as demonstrator for the API.
texd has the ability to reuse previously sent material. This allows you to reduce the amount of data you need to transmit with each render request. Following a back-of-the-envelope calculation:
- If you want to generate 1000 documents, each including a font with 400 kB in size, and a logo file with 100 kB in size, you will need to transmit 500 MB of the same two files in total.
- If you can reuse those two assets, you would only need to transmit them once, and use a reference hash for each subsequent request. The total then reduces 1×500 kB (complete assets for the first request) + 999×100 Byte (50 Byte per reference hash for subsequent requests) = 599.9 kB.
The feature in texd parlance is called "reference store", and you may think of it as a cache. It saves files server-side (e.g. on disk) and retrieves them on-demand, if you request such a file reference.
A reference hash is simply the Base64-encoded SHA256 checksum of the file contents, prefixed with "sha256:". (Canonically, we use the URL-safe alphabet without padding for the Base64 encoder, but texd also accepts the standard alphabet, and padding characters are ignored in both cases.)
To use a file reference, you need to set a special content type in the request, and include the
reference hash instead of the file contents. The content type must be application/x.texd; ref=use
.
The resulting HTTP request should then look something like this:
POST /render HTTP/1.1
Content-Type: multipart/form-data; boundary=boundary
--boundary
Content-Disposition: form-data; name=input.tex; filename=input.tex
Content-Type: application/octet-stream
[content of input.tex omitted]
--boundary
Content-Disposition: form-data; name=logo.pdf; filename=logo.pdf
Content-Type: application/x.texd; ref=use
sha256:p5w-x0VQUh2kXyYbbv1ubkc-oZ0z7aZYNjSKVVzaZuo=
--boundary--
For unknown reference hashes, texd will respond with an error, and list all unknown references:
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json
{
"category": "reference",
"error": "unknown file references",
"reference": [
"sha256:p5w-x0VQUh2kXyYbbv1ubkc-oZ0z7aZYNjSKVVzaZuo="
]
}
In such a case, you can repeat you HTTP request, and change the ref=use
to ref=store
for
matching documents:
POST /render HTTP/1.1
Content-Type: multipart/form-data; boundary=boundary
--boundary
Content-Disposition: form-data; name=input.tex; filename=input.tex
Content-Type: application/octet-stream
[content of input.tex omitted]
--boundary
Content-Disposition: form-data; name=logo.pdf; filename=logo.pdf
Content-Type: application/x.texd; ref=store
[content of logo.pdf omitted]
--boundary--
By default, the reference store is not enabled. You must enable it explicitly, by providing
a command line flag. Assuming you have a local directory ./refs
, you instruct texd to use
this directory for references:
$ texd --reference-store=dir://./refs
The actual syntax is --reference-store=DSN
, where storage adapters are identified through and
configured with a DSN (data source name, a URL). Currently there are only handful implementations:
-
The
dir://
adapter (docs), which stores reference files on disk in a specified directory. Coincidentally, this adapter also provides an in-memory adapter (memory://
), courtesy of the spf13/afero package. -
The
memcached://
adapter (docs), which stores, you may have guessed it, reference files in a Memcached instance or cluster. -
The
nop://
adapter (docs), which―for the sake of completeness sake―implements a no-op store (i.e. attempts to store reference file into is, or load files from it fail silently). This adapter is used as fallback if you don't configure any other adapter.
It is not unfeasible to imagine further adapters being available in the future, such as additional
key/value stores (redis://
), object storages (s3://
, minio://
), or even RDBMS (postgresql://
,
mariadb://
).
texd supports three different retention policies:
keep
(ornone
) will keep all file references forever. This is the default setting.purge-on-start
(or justpurge
) will delete file references once on startup.access
will keep an access list with LRU semantics, and delete file references, either if a max. number of items is reached, or if the total size of items exceeds a threshold, or both.
To select a specific retention policy, use the --retention-policy
CLI flag:
$ texd --reference-store=dir://./refs --retention-policy=purge
To configure the access list (--retention-policy=access
), you can adopt the quota to your needs:
$ texd --reference-store=dir://./refs \
--retention-policy=access \
--rp-access-items=1000 \
--rp-access-size=100MB
Notes:
- The default quota for the max. number of items (
--rp-access-items
) is 1000. - The default quota for the max. total file size (
--rp-access-size
) is 100MB. - Total file size is measured in bytes, common suffixes (100KB, 2MiB, 1.3GB) work as expected.
- To disable either limit, set the value to 0 (e.g.
--rp-access-items=0
). - It is an error to disable both limits (in this case just use
--retention-policy=keep
). - Currently, only the
dir://
(andmemory://
) adapter support a retention policy; thememcached://
adapter delegates this responsibility to the Memcached server.
texd came to life because I've build dozens of Rails applications, which all needed to build PDF documents in one form or another (from recipes, to invoices, order confirmations, reports and technical documentation). Each server basically needed a local TeX installation (weighing in at several 100 MB, up to several GB). Compiling many LaTeX documents also became a bottleneck for applications running on otherwise modest hardware (or cloud VMs), as this process is also computationally expensive.
Over time I've considered using alternatives for PDF generation (Prawn, HexaPDF, gofpdf, SILE, iText PDF, to name but a few), and found that the quality of the rendered PDF is far inferior to the ones generated by LaTeX. Other times, the licensing costs are astronomical, or the library doesn't support some layouting feature, or the library in an early alpha stage or already abandoned...
I'll admit that writing TeX templates for commercial settings is a special kind of pain-inducing form of art. But looking back at using LaTeX for now over a decade, I still feel it's worth it.
One wishlist's item is asynchronous rendering: Consider rendering monthly invoices on the first of each month; depending on the amount of customers/contracts/invoice positions, this can easily mean you need to render a few thousand PDF documents.
Usually, the PDF generation is not time critical, i.e. they should finish in a reasonable amount of time (say, within the next 6h to ensure timely delivery to the customer via email). For this to work, the client could provide a callback URL to which texd sends the PDF via HTTP POST when the rendering is finished.
Of course, this will also increase complexity on both sides: The client must be network-reachable itself, an keep track of rendering request in order to associate the PDF to the correct invoice; texd on the other hand would need a priority queue (processing async documents only if no sync documents are enqueued), and it would need to store the callback URL somewhere.
Of course, this project was not created in a void, other solutions exist as well:
-
latexcgi, MIT license, GitHub project, Website
Project description:
The TeXLive.net server (formally known as (LaTeX CGI server) (currently running at texlive.net) accepts LaTeX documents via an HTTP POST request and returns a PDF document or log file in the case of error.
It is written as a perl script accepting the post requests via cgi-bin access in an apache HTTP server.
-
Overleaf, AGPL-3.0 license, GitHub project, Website
Project description:
Overleaf is an open-source online real-time collaborative LaTeX editor. We run a hosted version at www.overleaf.com, but you can also run your own local version, and contribute to the development of Overleaf.
-
overleaf/clsi, AGPL-3.0 license, GitHub project
Project description:
A web api for compiling LaTeX documents in the cloud
The Common LaTeX Service Interface (CLSI) provides a RESTful interface to traditional LaTeX tools (or, more generally, any command line tool for composing marked-up documents into a display format such as PDF or HTML).
Please report bugs and feature request to https://github.com/digineo/texd/issues.
Pull requests are welcome, even minor ones for typo fixes. Before you start on a larger feature, please create a proposal (in form of an issue) first.
MIT, © 2022, Dominik Menke, see file LICENSE