The goal of this project is to assess which query engines can realistically run inside cloud functions (in particular AWS Lambda) and to get a first impression of their performance in this highly constrained environment.
We want to provide an accurate and interactive representation of our experimental results, which we believe is best achieved through open interactive dashboards. This is still a work in progress, so feel free to play with it and give us your feedback!
- NYC Taxi Parquet GROUP BY duration of various engines in AWS Lambda
- AWS Lambda scale up duration by payload and function size
The `l12n-shell` provides a way to run all commands in an isolated Docker environment. It is not strictly necessary, but it simplifies collaboration on the project. To set it up:
- you must have a recent version (v20+) of Docker installed; it is the only dependency
- clone this repository:

  ```bash
  git clone https://github.com/cloudfuse-io/lambdatization
  ```
- add the `l12n-shell` to your path (optional):

  ```bash
  sudo ln -s $(pwd)/lambdatization/l12n-shell /usr/local/bin/l12n-shell
  ```
- run:

  ```bash
  L12N_BUILD=1 l12n-shell
  ```
- the `L12N_BUILD` environment variable indicates to the `l12n-shell` script that it needs to build the image
- `l12n-shell` operates in the current directory to:
  - look for a `.env` file to source configurations from (see configuration section below)
  - store the Terraform state if the local backend is used
  - store the Terraform data, i.e. the cache data generated by `terraform init`
- `l12n-shell` without any argument runs an interactive bash terminal in the CLI container. Note that the `.env` file is loaded only once, when the `l12n-shell` is started.
- `l12n-shell cmd` and `echo "cmd" | l12n-shell` both run `cmd` in the `l12n-shell`.
Note:

- `l12n-shell` only supports amd64 for now
- it is actively tested on Linux only
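For illustration, here are the invocation modes described above, using `l12n -h` (listed later in this README) as the example command:

```bash
# open an interactive bash session in the CLI container
l12n-shell

# run a single command in the container...
l12n-shell l12n -h

# ...or pipe it through stdin
echo "l12n -h" | l12n-shell
```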
`l12n-shell` can be configured through environment variables or a `.env` file in the current directory:
- `L12N_PLUGINS` is a comma separated list of plugins to activate
- `L12N_AWS_REGION` is the region where the stack should run
You can also provide the usual AWS variables:

- `AWS_PROFILE`
- `AWS_SHARED_CREDENTIALS_FILE`
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
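Putting these together, a minimal `.env` could look like the following sketch (all values are placeholders to adapt to your setup):

```bash
# .env — read once when l12n-shell starts
L12N_PLUGINS=tfcloud      # comma separated list of plugins
L12N_AWS_REGION=us-east-1 # placeholder region
AWS_PROFILE=default       # or set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
```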
You might also want to verify your "Concurrent executions" quota for Lambda in your AWS account and ask for an increase if required.
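If you use the AWS CLI, one way to inspect that quota is through Service Quotas (a sketch; to our knowledge `L-B99A9384` is the quota code for Lambda concurrent executions):

```bash
# check the current "Concurrent executions" quota for Lambda
aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384
```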
If you want to use Terraform Cloud as a backend instead of the local one, set `TF_STATE_BACKEND=cloud`. You should then also configure:

- `TF_ORGANIZATION`, the name of an existing organization in your Terraform Cloud account
- `TF_API_TOKEN`, a Terraform Cloud user token
- `TF_WORKSPACE_PREFIX`, a prefix shared by all workspaces. It should contain only alphanumeric or `-` characters (e.g. `TF_WORKSPACE_PREFIX=l12n-dev-`)
- Add the `tfcloud` plugin to the `L12N_PLUGINS` list to enable the `l12n tfcloud.config` command. This will help you automatically configure the workspaces for all your active plugins with the right settings and credentials.
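For reference, the corresponding `.env` entries might look like this sketch (organization, token, and prefix values are placeholders):

```bash
TF_STATE_BACKEND=cloud
TF_ORGANIZATION=my-org        # an existing Terraform Cloud organization
TF_API_TOKEN=my-user-token    # a Terraform Cloud user token
TF_WORKSPACE_PREFIX=l12n-dev- # alphanumeric or '-' characters only
L12N_PLUGINS=tfcloud          # enables the l12n tfcloud.config command
```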
Note: Environment variables will take precedence over the `.env` file.
For better analysis of the proxying components, you can set up any observability backend compatible with the OpenTelemetry Protocol (OTLP) over HTTP. We recommend in particular Grafana Cloud, which has a generous free tier and a nice interface.
```bash
L12N_CHAPPY_OPENTELEMETRY_URL=https://otlp-gateway-${grafana_region}.grafana.net/otlp/v1/traces
L12N_CHAPPY_OPENTELEMETRY_AUTHORIZATION="Basic $(echo -n "$instance_id:$api_key" | base64)"
```
Where:

- `grafana_region` is the region of your Grafana Cloud instance, e.g. `prod-us-east-0`
- `instance_id` can be obtained from the detail page of your Grafana Cloud instance
- `api_key` is a Grafana Cloud API key with the MetricsPublisher role
- `base64(instance_id:api_key)` is the base64 encoding of the two variables above separated by `:`
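As a concrete sketch with made-up credentials (both values below are placeholders), the authorization value can be computed as follows:

```bash
instance_id=123456 # placeholder Grafana Cloud instance id
api_key=replace-me # placeholder Grafana Cloud API key
echo "Basic $(echo -n "$instance_id:$api_key" | base64)"
# prints the value to use for L12N_CHAPPY_OPENTELEMETRY_AUTHORIZATION
```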
You can also try out [Aspecto](https://www.aspecto.io/), which has pretty similar capabilities and a very easy setup.
```bash
L12N_CHAPPY_OPENTELEMETRY_URL=https://otelcol.aspecto.io/v1/traces
L12N_CHAPPY_OPENTELEMETRY_AUTHORIZATION=aspecto_key
```
Inside the `l12n-shell`, you can use the following commands:

- `l12n -h` to see all the available commands
- `l12n deploy -a` will run the terraform scripts and deploy the necessary resources (buckets, functions, roles...)
- `l12n destroy -a` to tear down the infrastructure and clean up your AWS account
- `l12n dockerized -e engine_name` runs a preconfigured query in the dockerized version of the specified engine locally. It requires the core module to be deployed to have access to the data
- `l12n run-lambda -e engine_name -c sql_query` runs the specified SQL query on the given engine
- you can also run pre-configured queries using the examples; run `l12n -h` to see the list of examples (a typical session is sketched below)
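For illustration, an end-to-end session might look like this (the engine name and query are placeholders; check `l12n -h` for the actual options):

```bash
# deploy the necessary resources (buckets, functions, roles...)
l12n deploy -a

# run a toy SQL query on a given engine inside Lambda
l12n run-lambda -e engine_name -c "SELECT 1"

# tear everything down when done
l12n destroy -a
```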
Infrastructure is managed by Terraform.
We use Terragrunt to:
- DRY the Terraform config
- Manage dependencies between modules and allow a plugin-based structure.
We are actively monitoring CDK for Terraform and plan to migrate the infrastructure scripts once the tool becomes sufficiently mature (e.g. reaches v1).
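To give an idea of how this looks in practice, dependency-aware operations in Terragrunt are typically driven with its `run-all` commands (a sketch; the `l12n deploy`/`destroy` commands wrap this kind of invocation, and the exact flags may differ):

```bash
# plan/apply every module, letting Terragrunt resolve the
# dependency graph between modules before calling Terraform
terragrunt run-all plan
terragrunt run-all apply
```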
- We follow the conventional commits standard with this list of types.
- We use the following linters:
- black for Python
- isort for Python imports
- yamllint
- markdownlint
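If you want to run the linters locally before pushing, here is a sketch assuming they are installed on your machine (the CI setup may pin versions or pass extra flags):

```bash
black .        # format Python code
isort .        # sort Python imports
yamllint .     # lint YAML files
markdownlint . # lint markdown files
```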