- Introduction
- Architecture
- Prerequisites
- Tools and services
- Usage
- Cost Estimate
- Clean up
- Reference
- Security
- License
This project provides an example of using Amazon Macie to discover sensitive financial data stored in an Amazon S3 bucket. The S3 objects are automatically tagged, and an S3 bucket lifecycle policy is applied to transition the objects into Amazon Glacier.
- An Amazon Macie job is used to scan an Amazon S3 bucket for objects containing sensitive financial information (credit card numbers, account numbers, etc.).
- An Amazon EventBridge rule is used to capture the Amazon Macie findings.
- Amazon EventBridge then sends the findings into an Amazon Kinesis Data Firehose.
- The Amazon Kinesis Data Firehose is used to batch the findings and aggregate them into an Amazon S3 results bucket.
- An Amazon S3 event notification is used to trigger an AWS Lambda function when new results are found in the bucket.
- The AWS Lambda function adds the Macie finding severity to the S3 object as a new tag. The function also updates the bucket lifecycle policy to automatically transition the object to Amazon Glacier after a configurable number of days.
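
The tagging step above can be sketched roughly as follows. This is not the project's actual Lambda code: the helper names `tag_request_for` and `apply_tag` are hypothetical, the severity ordering and "at or above the threshold" comparison are assumptions, and the sketch handles a single EventBridge finding event rather than a Firehose batch. The field names (`detail.severity.description`, `detail.resourcesAffected`) follow the Macie finding format.

```python
# Sketch only: helper names and the threshold comparison are assumptions,
# not taken from this project's Lambda function.
SEVERITY_ORDER = {"Low": 1, "Medium": 2, "High": 3}

# EventBridge event pattern that matches Macie findings.
MACIE_EVENT_PATTERN = {"source": ["aws.macie"], "detail-type": ["Macie Finding"]}


def tag_request_for(event, tag_key="Severity", threshold="High"):
    """Return (bucket, key, tag_set) for one finding event.

    Returns None when the finding severity is below the threshold.
    """
    detail = event["detail"]
    severity = detail["severity"]["description"]
    if SEVERITY_ORDER[severity] < SEVERITY_ORDER[threshold]:
        return None
    resources = detail["resourcesAffected"]
    bucket = resources["s3Bucket"]["name"]
    key = resources["s3Object"]["key"]
    return bucket, key, [{"Key": tag_key, "Value": severity}]


def apply_tag(s3_client, bucket, key, tag_set):
    """Write the tag set to the object using a boto3 S3 client.

    put_object_tagging replaces any tags already on the object.
    """
    s3_client.put_object_tagging(
        Bucket=bucket, Key=key, Tagging={"TagSet": tag_set}
    )
```

Keeping the decision logic (`tag_request_for`) separate from the AWS call (`apply_tag`) makes the threshold behaviour easy to check locally without credentials.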
- Python 3, installed
- AWS Command Line Interface (AWS CLI) version 2, installed
- AWS Serverless Application Model (SAM), installed
- Docker Desktop, installed
- AWS SAM - The AWS Serverless Application Model (SAM) is an open-source framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings.
- AWS Lambda - AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes.
- Amazon Macie - Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS.
- Amazon Kinesis Data Firehose - Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services.
- Amazon EventBridge - Amazon EventBridge is a serverless event bus service that you can use to connect your applications with data from a variety of sources.
Parameter | Type | Default | Description |
---|---|---|---|
TagKey | String | Severity | Tag key to use when tagging S3 object finding severity |
SeverityThreshold | String | High | Scoring threshold to tag S3 objects |
SourceBucketName | String | None | Optional S3 bucket containing potentially sensitive content (if not provided, a bucket will be created) |
SourceBucketRetention | Number | 0 | If a source bucket is created, the default object retention in days. Set to zero to disable. |
GlacierTransitionInDays | Number | 365 | Number of days until objects are transitioned to Glacier |
ExpireObjectsInDays | Number | 1825 | Number of days until objects permanently expire |
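
To illustrate how `TagKey`, `GlacierTransitionInDays`, and `ExpireObjectsInDays` could fit together, here is a minimal sketch of a tag-filtered S3 lifecycle rule. The function name `build_lifecycle_rule` and the rule ID format are hypothetical, and filtering on the severity tag is an assumption about how the deployed function scopes the rule. The dictionary layout matches what the S3 lifecycle API accepts.

```python
# Sketch only: the function name, rule ID format, and tag-based filter
# are illustrative assumptions, not this project's implementation.
def build_lifecycle_rule(tag_key="Severity", tag_value="High",
                         glacier_days=365, expire_days=1825):
    """Build one S3 lifecycle rule for objects carrying the given tag."""
    # Sanity check: expiring objects before the transition would make
    # the Glacier transition pointless.
    if expire_days <= glacier_days:
        raise ValueError("expire_days must be greater than glacier_days")
    return {
        "ID": f"archive-{tag_key}-{tag_value}".lower(),
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [{"Days": glacier_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_days},
    }


# Shape expected by boto3's put_bucket_lifecycle_configuration.
lifecycle_configuration = {"Rules": [build_lifecycle_rule()]}
```

With boto3, this dictionary would be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`. That call replaces the bucket's entire lifecycle configuration, so any existing rules need to be merged in first.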
```bash
git clone https://github.com/aws-samples/aws-financial-data-discovery-samples
cd aws-financial-data-discovery-samples
sam build
sam deploy --guided
```
Please refer to the Amazon Macie Pricing page for details.
Deleting the CloudFormation stack will remove the Lambda functions, Kinesis Data Firehose, and EventBridge rule. Ensure the S3 buckets are empty before attempting to remove them.
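
One possible way to empty a bucket before deleting the stack is sketched below. It assumes boto3 is installed and credentials are configured; the helper names `empty_bucket` and `empty_bucket_by_name` are hypothetical. This permanently deletes data, so double-check the bucket name before running it.

```python
# Sketch only: helper names are illustrative. Deletion is irreversible.
def empty_bucket(bucket):
    """Delete all contents of a bucket.

    `bucket` is expected to behave like a boto3 `s3.Bucket` resource.
    """
    # Remove every object version and delete marker (versioned buckets).
    bucket.object_versions.delete()
    # Remove anything still listed as a current object.
    bucket.objects.all().delete()


def empty_bucket_by_name(name):
    """Look up the bucket by name with boto3 and empty it."""
    import boto3  # third-party; imported lazily so empty_bucket has no dependencies

    empty_bucket(boto3.resource("s3").Bucket(name))
```

For example, `empty_bucket_by_name("my-results-bucket")` would clear a bucket with that placeholder name, after which the stack deletion can remove it.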
This solution is inspired by the original AWS Big Data Blog post.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.