Analysis of the Trino project CI workflows.
This repo is an example of using Trino to perform ETL (extract, transform, and load) and generate basic reports. The complete workflow is:
- Use a custom connector to read Github's API data and save it into an S3 bucket.
- Run some queries to analyze the data stored in the S3 bucket, save results to a file and publish it to Github Pages.
All of the above is repeatable and executed on a schedule using Github Actions.
Queries could be executed directly against the tables in the Github connector (see the sketch below), but that approach has a few downsides:
- it would perform many API requests, possibly hitting rate limits
- the range of available data is limited - for example, Github Workflow runs are only available for the last 2 months
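A direct query like the following sketch would trigger paginated API calls every time it runs; the `github.default.runs` table and its columns follow typical trino-rest naming, but are assumptions here, not verified against this repo.

```sql
-- Hypothetical direct query against the API-backed tables; the schema,
-- table, and column names are assumptions, not taken from this repo.
SELECT id, status, conclusion, created_at
FROM github.default.runs
WHERE owner = 'trinodb' AND repo = 'trino'
ORDER BY created_at DESC
LIMIT 10;
```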
Data is saved to an S3 bucket, since it's cheap and easy to set up. There's no database server running, so no maintenance is required.
Materialized views are not used, because incremental updates are tricky and differ between tables (Github API endpoints).
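To illustrate why a single refresh strategy doesn't fit, an incremental load for just one table might look like the sketch below; every endpoint would need its own watermark column, and all catalog, table, and column names here are assumptions.

```sql
-- Rough sketch of an incremental load for a single table (the initial load
-- would be a plain INSERT ... SELECT without the watermark predicate).
-- Each endpoint needs its own watermark column (created_at, updated_at,
-- started_at, ...), so one generic refresh strategy does not fit.
-- All catalog, table, and column names are assumptions.
INSERT INTO hive.default.runs
SELECT *
FROM github.default.runs
WHERE owner = 'trinodb'
  AND repo = 'trino'
  AND created_at > (SELECT max(created_at) FROM hive.default.runs);
```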
- Downloads and extracts a zip archive with the custom trino-rest connector.
- Starts Trino in a container with both this additional connector (plugin) and its configuration mounted as volumes. The configuration also includes a built-in connector that can read and write to an S3 bucket. The required credentials are passed as environment variables, populated from Github Secrets.
- Runs a series of queries similar to `INSERT INTO hive.<table> SELECT * FROM github.<table>` to save the data. See the Sync class from trino-rest for more details.
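For example, a one-time table setup plus the recurring sync statement could look like the sketch below; the catalog layout and the owner/repo predicate are assumptions, and the real statements are generated by the Sync class.

```sql
-- Hypothetical setup and sync pair; catalog, schema, and predicate
-- columns are assumptions - the actual statements come from Sync.
CREATE TABLE IF NOT EXISTS hive.default.runs AS
SELECT * FROM github.default.runs WHERE owner = 'trinodb' AND repo = 'trino'
WITH NO DATA;

-- Each scheduled run then appends the current data:
INSERT INTO hive.default.runs
SELECT * FROM github.default.runs
WHERE owner = 'trinodb' AND repo = 'trino';
```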
- Starts Trino in a container with only the connector required to read data from the S3 bucket.
- Executes multiple queries, saving the results as a simple text table in a file (an example query is sketched after this list).
- Commits the updated results file to this repository, which triggers publishing it to Github Pages.
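For illustration, one such report query might aggregate the synced workflow runs per week; this is only a sketch with assumed table and column names, not one of the queries actually used here.

```sql
-- Hypothetical report query over the synced data; the table, the column
-- names, and the use of updated_at as a run's end time are all assumptions.
SELECT
    date_trunc('week', created_at) AS week,
    count(*) AS runs,
    count_if(conclusion = 'failure') AS failed_runs,
    round(avg(date_diff('second', created_at, updated_at)) / 60.0, 1) AS avg_duration_minutes
FROM hive.default.runs
GROUP BY 1
ORDER BY 1 DESC;
```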