A quickstart full serverless datalake IaC built upon S3 on AWS with best practices applied

KranioIO/quickstart-aws-datalake


Quickstart AWS Datalake

A quickstart project to deploy a serverless datalake on AWS using the S3 service, built entirely with Infrastructure as Code (IaC) and with data-governance best practices applied.

Getting Started


This project deploys four S3 buckets: one for logs and three datalake tiers (ingestion -> processing -> consumption). All datalake buckets are configured with:

  • server-side encryption
  • file versioning
  • lifecycle rules
  • access logging
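
As a rough sketch, one of the tier buckets described above could be declared in the serverless.yml resources section along these lines (the resource name, lifecycle rule, and log prefix here are illustrative assumptions, not the project's actual identifiers):

```yaml
resources:
  Resources:
    # Illustrative ingestion-tier bucket; keys below map to the four
    # features listed above. Names are assumptions, not the project's.
    IngestionBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${self:custom.prefix}-ingestion
        BucketEncryption:                  # server-side encryption
          ServerSideEncryptionConfiguration:
            - ServerSideEncryptionByDefault:
                SSEAlgorithm: aws:kms
        VersioningConfiguration:           # file versioning
          Status: Enabled
        LifecycleConfiguration:            # lifecycle rules (example rule)
          Rules:
            - Id: ArchiveOldVersions
              Status: Enabled
              NoncurrentVersionTransitions:
                - StorageClass: GLACIER
                  TransitionInDays: 90
        LoggingConfiguration:              # access logging
          DestinationBucketName: ${self:custom.prefix}-logs
          LogFilePrefix: ingestion/
```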

Deploy


The first important step is to change the service name in the serverless.yml file, since S3 bucket names are global and cannot be repeated. By default the project is deployed to the dev stage, but you can choose another with --stage, for example serverless deploy --stage qa.

service: YOUR-ORGANIZATION-NAME-datalake

custom:
  tags: ${file(tags.yml):${self:provider.stage}}
  prefix: ${self:service}-${self:provider.stage}

...

Note: S3 only allows lowercase names, so your organization name should follow this rule; alternatively, you can use this plugin for Serverless Framework utility functions such as fn::lower to work around this limitation.

The second step is to set up admin accounts for KMS encryption in the datalake-kms-administrators folder for the desired stage.
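
A minimal sketch of what such a per-stage administrators file might contain (the file layout, key name, and ARNs below are assumptions for illustration; check the folder's existing files for the actual format):

```yaml
# datalake-kms-administrators/dev.yml — illustrative only.
# Principals listed here would be granted administrative
# access to the datalake's KMS encryption key.
administrators:
  - arn:aws:iam::123456789012:user/data-platform-admin
  - arn:aws:iam::123456789012:role/DatalakeOps
```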

To deploy the project you need an AWS account and a CLI environment with the AWS CLI and the Serverless Framework configured; run aws configure to set up your credentials.

To deploy your datalake just run: export AWS_PROFILE="your-account-profile" && serverless deploy
