🐘 Postgres2Parquet AWS Lambda Function

This repository contains Python code for an AWS Lambda function that connects to an Amazon RDS PostgreSQL database, executes a SQL query, and writes the query result to Amazon S3 in Parquet format. The function is designed to be triggered by an AWS Lambda event.

This function can also be configured to run locally and write the Parquet file to a local folder.

📂 Files

lambda_function.py: Contains the Lambda function's code.
query.sql: Contains the SQL query to be executed.
local_lambda_runner.py: Script for running the Lambda function locally.
test_event.json: Sample AWS Lambda event for running the Lambda function locally.
test_lambda_function.py: Unit tests for the Lambda function.
Pipfile: Dependency list for Pipenv.
Pipfile.lock: Dependency lock file for Pipenv.
.github/workflows/deploy-to-lambda.yml: GitHub Actions workflow for automating Lambda function deployment.
.github/workflows/run-tests.yml: GitHub Actions workflow for automating unit tests execution in PRs.

🚀 Deploy Parameters

In the GitHub repository Actions secrets, make sure to configure the following secret variables:

Parameter	Description
AWS_ACCESS_KEY_ID	AWS access key ID.
AWS_SECRET_ACCESS_KEY	AWS secret access key.
ENVIRONMENT	Name of the environment (`production`).
DB_NAME_PROD	Name of the production database.
DB_USER	Database username.
DB_PASSWORD_PROD	Database password for production.
DB_HOST_PROD	Database host.
DB_PORT_PROD	Database port.
FILE_NAME	Name of the output file.
S3_PATH	Storage path for Amazon S3 (production).

These are used by the GitHub Actions deployment workflow to authenticate with AWS and deploy and publish the Lambda function.

⚙️ Setup

To use this Lambda function, follow these steps:

AWS Lambda Configuration:
- Create an AWS Lambda function and configure it to trigger based on your use case.
- Attach an execution role granting database read access and S3 write permissions.
- Attach the pre-built AWS Lambda Layer with Pandas and AWSWrangler
AWS IAM User for Lambda Deployment: Create an AWS IAM user with 3rd Party Access with the following inline policy. Make sure to replace the placeholders with your region, account number, and Lambda function name. Also, make sure to create the user's access key ID and secret access key and add them to your GitHub repository Secrets.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "lambda:UpdateFunctionCode",
                "lambda:UpdateFunctionConfiguration",
                "lambda:GetFunctionConfiguration",
                "lambda:PublishVersion"
            ],
			"Resource": "arn:aws:lambda:{REGION}:{ACCOUNT_NUMBER}:function:{LAMBDA_FUNCTION_NAME}"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "lambda:PublishLayerVersion",
            "Resource": "arn:aws:lambda:*:{ACCOUNT_NUMBER}:layer:*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": [
                "lambda:GetLayerVersion",
                "lambda:GetLayerVersionPolicy"
            ],
            "Resource": "arn:aws:lambda:*:{ACCOUNT_NUMBER}:layer:*:*"
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "lambda:ListLayerVersions",
                "lambda:ListLayers"
            ],
            "Resource": "*"
        }
    ]
}

GitHub Repo Secrets: Set required environment variables in your GitHub repository Actions secrets, including database connection details (e.g., DB_NAME_PROD, DB_USER, DB_PASSWORD_PROD, etc.), output file name (FILE_NAME), storage paths (S3_PATH), and AWS Configuration (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY).
SQL Query: Create your SQL query to be executed by the Lambda function and save it in query.sql.
GitHub Actions Deployment: Automate deployment using GitHub Actions. Push changes to the main branch or trigger events to execute the deployment workflow defined in .github/workflows/deploy-to-lambda.yml.

🛠 Usage

Once triggered, the Lambda function connects to the specified PostgreSQL database, executes the SQL query, and writes the result to the chosen storage location (S3 or local). Monitor execution and view errors in the AWS Lambda console.

❌ Error Handling

The Lambda function includes error handling, logging exceptions, and returning a 500 status code with an error message in the response if an error occurs.

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.

🤝 Contributing

You are welcome to contribute to this project by submitting a pull request. If you have any suggestions or problems, please open an issue. Thank you!

💖 Support

Your support keeps this project going!

⭐️ Star: Show your appreciation by giving this project a star.
☕️ Buy Me a Coffee: Contribute by buying a virtual coffee.
💼 Sponsor This Project: Consider sponsoring for ongoing support.

Making a difference, one line of code at a time...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🐘 Postgres2Parquet AWS Lambda Function

📂 Files

🚀 Deploy Parameters

⚙️ Setup

🛠 Usage

❌ Error Handling

📝 License

🤝 Contributing

💖 Support

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github		.github
.gitignore		.gitignore
LICENSE		LICENSE
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
lambda_function.py		lambda_function.py
local_lambda_runner.py		local_lambda_runner.py
query.sql		query.sql
test_event.json		test_event.json
test_lambda_function.py		test_lambda_function.py

License

WildTrackAI/AWS_Lambda_Postgres2Parquet

Folders and files

Latest commit

History

Repository files navigation

🐘 Postgres2Parquet AWS Lambda Function

📂 Files

🚀 Deploy Parameters

⚙️ Setup

🛠 Usage

❌ Error Handling

📝 License

🤝 Contributing

💖 Support

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages