This is a pipeline for Covid data ingestion and integration

toema/G-covid-pipeline

trino-minio-docker

Minimal example to run Trino with Minio and the Hive standalone metastore on Docker. The data used in this tutorial is from Kaggle.

Installation and Setup

Install s3cmd and a Java runtime (needed for the Trino CLI) with:

sudo apt update
sudo apt install -y \
    s3cmd \
    openjdk-11-jre-headless  # Needed for trino-cli

Pull and run all services with:

docker-compose up
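
The containers can take a while to come up, so later steps may fail if they run too early. As a minimal sketch (the `wait_for` helper is hypothetical and not part of this repository), a small retry loop can block until a given check command succeeds:

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <max_attempts> <command> [args...]
wait_for() {
  _wf_attempts="$1"
  shift
  _wf_i=1
  while [ "$_wf_i" -le "$_wf_attempts" ]; do
    # Run the check quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 2
    _wf_i=$((_wf_i + 1))
  done
  return 1
}
```

For example, `wait_for 30 curl -sf http://localhost:8080/v1/info` would poll Trino's info endpoint from a second terminal, assuming the compose file publishes Trino on its default port 8080 and that curl is installed.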

Configure s3cmd with the command below (or use the minio.s3cfg configuration as is):

s3cmd --config minio.s3cfg --configure

When prompted, enter the following values:

Access Key: minio_access_key
Secret Key: minio_secret_key
Default Region [US]:
S3 Endpoint [s3.amazonaws.com]: localhost:9000
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: localhost:9000
Encryption password:
Path to GPG program [/usr/bin/gpg]:
Use HTTPS protocol [Yes]: no
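
The interactive answers above map onto a handful of keys in an s3cmd configuration file. As a rough sketch, assuming the standard s3cmd key names and writing to a hypothetical minio-example.s3cfg so that the repository's own minio.s3cfg is left untouched, an equivalent file could be created non-interactively:

```shell
# Hypothetical file name, chosen so the existing minio.s3cfg is not overwritten.
cat > minio-example.s3cfg <<'EOF'
[default]
access_key = minio_access_key
secret_key = minio_secret_key
host_base = localhost:9000
host_bucket = localhost:9000
use_https = False
EOF

# Show the settings that were written, leaving out the secret key.
grep -v '^secret_key' minio-example.s3cfg
```

Passing `--config minio-example.s3cfg` to s3cmd would then pick up these values in place of the prompts.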

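To create a bucket and upload data to Minio, type the commands below. Note that they use an iris bucket and a Parquet file, while the table created further down reads CSV files from s3a://c19/, so the Covid data needs to be uploaded to a c19 bucket in the same way: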

s3cmd --config minio.s3cfg mb s3://iris
s3cmd --config minio.s3cfg put data/iris.parq s3://iris

To list all objects in all buckets, type:

s3cmd --config minio.s3cfg la
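
The table definition further down reads from s3a://c19/, so the bucket used for the upload has to match that location. The sketch below derives the bucket name from the location string and prints the matching s3cmd commands. The CSV path is a hypothetical placeholder, not a file known to exist in this repository:

```shell
# Location used by the CREATE SCHEMA / CREATE TABLE statements in this README.
location='s3a://c19/'

# Strip the scheme, then everything from the first slash on, leaving the bucket.
bucket="${location#*://}"
bucket="${bucket%%/*}"

# Hypothetical local path; replace it with the actual CSV file.
csv_file='data/country_wise_latest.csv'

# Print the commands instead of running them, so they can be reviewed first.
echo "s3cmd --config minio.s3cfg mb s3://${bucket}"
echo "s3cmd --config minio.s3cfg put ${csv_file} s3://${bucket}"
```

Printing rather than executing keeps the sketch safe to run before Minio is up; the printed lines can be pasted into the terminal once the file path is correct.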

Access Trino with CLI and Prepare Table

Download the Trino CLI with:

wget https://repo1.maven.org/maven2/io/trino/trino-cli/352/trino-cli-352-executable.jar \
  -O trino
chmod +x trino  # Make it executable

Create the schema and the table with:

./trino --execute "
CREATE SCHEMA IF NOT EXISTS minio.c19
WITH (location = 's3a://c19/');

CREATE TABLE IF NOT EXISTS minio.c19.country_wise_latest (
  Country_or_Region VARCHAR,
  Confirmed VARCHAR,
  Deaths VARCHAR,
  Recovered VARCHAR,
  Active VARCHAR,
  New_cases VARCHAR,
  New_deaths VARCHAR,
  New_recovered VARCHAR,
  Deaths_Per_100_Cases VARCHAR,
  Recovered_Per_100_Cases VARCHAR,
  Deaths_Per_100_Recovered VARCHAR,
  Confirmed_last_week VARCHAR,
  _1_week_change VARCHAR,
  _1_week_Per_increase VARCHAR,
  WHO_Region VARCHAR
)
WITH (
  external_location = 's3a://c19/',
  format = 'CSV'
);
"

Query the newly created table with:

./trino --execute "
SHOW TABLES IN minio.c19;
SELECT * FROM minio.c19.country_wise_latest LIMIT 5;"
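
Because every column is declared as VARCHAR, numeric work needs an explicit conversion. The following sketch, which assumes the table above exists and uses a hypothetical file name region_totals.sql, sums confirmed cases and deaths per WHO region. It uses TRY_CAST so that non-numeric values (for instance a CSV header line, if the file has one and it is read as a data row) become NULL instead of failing the query:

```shell
# Write an example aggregate query to a file (hypothetical file name).
cat > region_totals.sql <<'EOF'
SELECT
  WHO_Region,
  sum(TRY_CAST(Confirmed AS BIGINT)) AS confirmed,
  sum(TRY_CAST(Deaths AS BIGINT)) AS deaths
FROM minio.c19.country_wise_latest
GROUP BY WHO_Region
ORDER BY confirmed DESC;
EOF

# Run the query only if the CLI downloaded earlier is in the current directory.
if [ -x ./trino ]; then
  ./trino --file region_totals.sql
fi
```

Keeping the query in a file avoids shell quoting issues that tend to appear once statements passed to `--execute` grow longer.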

Use the Power BI Trino connector to connect the Trino SQL engine to Power BI.

License

This project is licensed under the MIT license. See the LICENSE for details.
