This project demonstrates how to connect to a database service called Pipeline Cloud using Python. It includes two approaches: a vanilla connection using `pyodbc` and a more advanced connection using SQLAlchemy.
To utilise this connection method, you will require a 'headless user' (service account) that is linked to your primary Azure MFA user account. For more details on this process, speak with your Bonterra account manager.
- Docker
- Docker Compose
- Poetry
- ODBC Driver 18 for SQL Server
Poetry is a dependency management tool for Python that simplifies managing project dependencies and virtual environments. Unlike `pip`, Poetry automatically creates and manages a virtual environment for your project, ensuring that dependencies are isolated and consistent across different environments. This makes it easier to maintain and share your project.
- Install Poetry: Follow the official installation guide to install Poetry on your system.
- Install Dependencies: Run `poetry install` to install all dependencies specified in `pyproject.toml`.
- Activate Virtual Environment: Use `poetry shell` to activate the virtual environment.
- Run a Script: Use `poetry run python <script_name>.py` to run a script within the virtual environment.
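Commands like `poetry run precheck` (used later in this README) work because Poetry maps script names to Python functions in `pyproject.toml`. A hypothetical fragment showing that mapping might look like this (the module paths and function names are illustrative, not the repository's actual file):

```toml
# Illustrative [tool.poetry.scripts] section - actual entries may differ
[tool.poetry.scripts]
precheck = "pipelinecloud.precheck:main"
connect-basic = "pipelinecloud.dbconnection_basic:main"
connect-sqlalchemy = "pipelinecloud.dbconnection_sqlalchemy:main"
```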
To connect to the Pipeline Cloud database, you'll need to create a `.env` file in the root directory (the same level as `.env.example`). This file will store the necessary environment variables for your database connection. If you're using a Bonterra-hosted Azure SQL instance, you will receive emails from NoReply_PipelineCloud@everyaction.com containing the server name, database name, and a secret link to fetch your application ID, private/public key, and certificate thumbprint. Store the private key in a file named `pipelinecloud_privatenopass.pem` in the root directory.
| Key | Sample Value | Description |
|---|---|---|
| `DATABASE_SERVER` | `esv30ddbms001.database.windows.net` or `esv30pdbms002.database.windows.net` | The server address of the database. Verify the correct server in your Azure MFA user connection string (e.g., via DBeaver, Azure Data Studio). |
| `DATABASE_NAME` | `Pipeline_*****` | The name of the database. This value is typically provided in an email from NoReply_PipelineCloud@everyaction.com during the provisioning stage of your Pipeline Cloud instance. |
| `TENANT_ID` | `798d7834-694a-41b4-b6cb-e5448f079f6b` | The tenant ID for Azure authentication. This is usually a fixed value; use the sample value provided. |
| `APPLICATION_ID` | | The application ID for Azure authentication. Obtain this from the secret link provided in the email. This value is unique to your EveryAction Pipeline Cloud instance. |
| `CLIENT_CERT_PATH` | `pipelinecloud_privatenopass.pem` | Path to the client certificate PEM file. Obtain your private key from the secret link provided by the EveryAction team. This value is unique to your EveryAction Azure MFA account. |
| `CLIENT_CERT_THUMBPRINT` | | Thumbprint of the client certificate. This is provided in the secret link email and is used to verify the certificate. This value is unique to your EveryAction Azure MFA user account. |
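Putting the variables together, a populated `.env` file might look like this (the application ID and thumbprint are placeholders you must replace with your own values):

```env
DATABASE_SERVER=esv30ddbms001.database.windows.net
DATABASE_NAME=Pipeline_*****
TENANT_ID=798d7834-694a-41b4-b6cb-e5448f079f6b
APPLICATION_ID=<your-application-id>
CLIENT_CERT_PATH=pipelinecloud_privatenopass.pem
CLIENT_CERT_THUMBPRINT=<your-certificate-thumbprint>
```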
- Private Key File: Ensure that `pipelinecloud_privatenopass.pem` is stored securely in the root directory and not within the `/pipelinecloud/` app directory.
- Environment File Location: The `.env` file should be located in the same directory as `.env.example` and the PEM file.
```
.
├── .env.example
├── pipelinecloud_privatenopass.pem
├── pyproject.toml
├── README.md
└── pipelinecloud
    ├── dbconnection_basic.py
    └── dbconnection_sqlalchemy.py
```
- `.env.example`: Template for environment variables.
- `pipelinecloud_privatenopass.pem`: Your private key file (store in the root directory).
- `pipelinecloud/`: Contains the Python scripts for database connection.
- Clone the Repository:

  ```shell
  git clone <repository-url>
  cd <repository-directory>
  ```

- Set Up Environment Variables:
  - Copy `.env.example` to `.env` and fill in your credentials.
  - Ensure `pipelinecloud_privatenopass.pem` is in the root directory.
- Install Dependencies:

  ```shell
  poetry install
  ```
- Run the Scripts Using Poetry:
  - Precheck: Verify that your environment variables and PEM file are set up correctly.

    ```shell
    poetry run precheck
    ```

  - Basic Database Connection: Run the basic connection script using `pyodbc`.

    ```shell
    poetry run connect-basic
    ```

  - SQLAlchemy Database Connection: Run the advanced connection script using SQLAlchemy.

    ```shell
    poetry run connect-sqlalchemy
    ```

  - Run All: Execute the precheck and both connection methods sequentially.

    ```shell
    poetry run full-check
    ```
- Run the Scripts Directly (Without Poetry):
  - Precheck: `python pipelinecloud/precheck.py`
  - Basic Database Connection: `python pipelinecloud/dbconnection_basic.py`
  - SQLAlchemy Database Connection: `python pipelinecloud/dbconnection_sqlalchemy.py`
Note: When running scripts directly, ensure that your Python environment is set up correctly and all dependencies are installed. You may need to activate the virtual environment created by Poetry using `poetry shell`.
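As an illustration of the kind of validation a precheck performs, here is a minimal sketch. This is hypothetical code, not the repository's actual `precheck.py`; the variable names match the `.env` keys documented above:

```python
from pathlib import Path

# The environment variables this project's .env is expected to define
REQUIRED_VARS = [
    "DATABASE_SERVER", "DATABASE_NAME", "TENANT_ID",
    "APPLICATION_ID", "CLIENT_CERT_PATH", "CLIENT_CERT_THUMBPRINT",
]

def precheck(env: dict) -> list:
    """Return a list of problems; an empty list means the setup looks complete."""
    problems = [f"missing variable: {v}" for v in REQUIRED_VARS if not env.get(v)]
    cert = env.get("CLIENT_CERT_PATH")
    if cert and not Path(cert).is_file():
        problems.append(f"missing PEM file: {cert}")
    return problems
```

In practice you would call `precheck(dict(os.environ))` after loading the `.env` file and print any problems it reports.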
This project uses SQLAlchemy in conjunction with `pyodbc` to connect to an Azure SQL Database using Azure Active Directory (AD) tokens. This approach leverages the strengths of both libraries to provide a secure and efficient way to interact with the database.
- Authentication: `pyodbc` is used to handle the authentication process with Azure AD tokens. This is necessary because the ODBC driver for SQL Server supports Azure AD authentication but requires specific handling of the access token.
- Database Interactions: SQLAlchemy provides a high-level ORM (Object-Relational Mapping) interface, allowing you to interact with the database using Python objects and methods. This makes it easier to manage database operations and transactions.
- Token Preparation: The Azure AD access token is prepared using Python's `struct` module. This involves converting the token into a format that `pyodbc` can use to authenticate with the SQL Server.
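The token-packing step can be sketched as follows. The function name is illustrative; the length-prefixed UTF-16-LE layout is the format the SQL Server ODBC driver expects for access tokens:

```python
import struct

def build_token_struct(access_token: str) -> bytes:
    # The ODBC driver expects the raw token as UTF-16-LE bytes,
    # prefixed with its byte length as a 4-byte little-endian integer.
    token_bytes = access_token.encode("utf-16-le")
    return struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)
```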
- Connection Setup: A connection string is constructed without the `Authentication` attribute. Instead, the access token is passed directly to `pyodbc` using the `attrs_before` parameter, which bypasses the need for the `Authentication` attribute in the connection string.
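A minimal sketch of how the connection arguments might be assembled (the helper name and the `Encrypt` option are illustrative; `1256` is the documented `SQL_COPT_SS_ACCESS_TOKEN` pre-connect attribute ID used for Azure AD access tokens):

```python
# Pre-connect attribute ID under which the packed access token is passed
SQL_COPT_SS_ACCESS_TOKEN = 1256

def build_connect_args(server: str, database: str, token_struct: bytes):
    # Note: no Authentication attribute - the token in attrs_before replaces it
    conn_str = (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};Database={database};Encrypt=yes;"
    )
    return conn_str, {SQL_COPT_SS_ACCESS_TOKEN: token_struct}
```

You would then connect with `pyodbc.connect(conn_str, attrs_before=attrs)`, where `attrs` is the returned mapping.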
- SQLAlchemy Engine Creation: The SQLAlchemy engine is created using the connection established by `pyodbc`. This engine is then used to perform database operations, such as executing SQL queries and managing transactions.
- Reflection and Raw SQL: SQLAlchemy's reflection feature is used to dynamically load table definitions from the Azure SQL database, allowing you to interact with tables without manually defining models. Raw SQL queries can be executed using SQLAlchemy's `text()` construct, providing flexibility for complex queries.
- Security: By using Azure AD tokens, this implementation ensures secure authentication without the need for storing passwords.
- Flexibility: The combination of `pyodbc` and SQLAlchemy allows for flexible and powerful database interactions, leveraging the best features of both libraries.
- Compatibility: This approach is compatible with Azure SQL Database and can be adapted for use with other Azure services that support AD authentication.
Below are examples of how to query the database using different methods:
```python
from sqlalchemy import MetaData, Table
from sqlalchemy.orm import sessionmaker

def reflect_table(engine, table_name):
    # Load the table definition from the live database schema
    metadata = MetaData()
    return Table(table_name, metadata, autoload_with=engine)

def query_table(engine, table_name):
    Session = sessionmaker(bind=engine)
    session = Session()
    table = reflect_table(engine, table_name)
    query = session.query(table).limit(10)
    results = query.all()
    for row in results:
        print(row)
    session.close()
```
```python
from sqlalchemy import text
from sqlalchemy.orm import sessionmaker

def execute_raw_query(engine):
    Session = sessionmaker(bind=engine)
    session = Session()
    # List all base tables in the database
    raw_query = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE';"
    result = session.execute(text(raw_query))
    tables = result.fetchall()
    for table in tables:
        print(table[0])
    session.close()
```
This setup ensures that your application can securely connect to Azure SQL Database using modern authentication methods, while still benefiting from the powerful features of SQLAlchemy. The hybrid approach with reflection and raw SQL execution provides flexibility and ease of use, especially when dealing with a large number of tables or complex queries.
Docker is a platform that allows you to package applications and their dependencies into a container, ensuring consistency across different environments. This setup helps verify a successful connection without being affected by your local environment. You can install Docker from the official Docker website.
- Build and Run the Docker Container:
  - First, ensure Docker is installed and running on your system.
  - Build and run the Docker container using Docker Compose:

    ```shell
    docker-compose up --build
    ```

    This command will build the Docker image and start the container, running the application with the default script.
- Switch Scripts: By default, the application runs `dbconnection_basic.py`. To run `dbconnection_sqlalchemy.py`, modify the `CMD` in the `Dockerfile`:

  ```dockerfile
  # Change the CMD line to run the SQLAlchemy script
  CMD ["poetry", "run", "connect-sqlalchemy"]
  ```
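For orientation, the overall shape of such a Dockerfile might be (this is an illustrative sketch, not the repository's actual `Dockerfile`; the base image and install steps are assumptions):

```dockerfile
# Illustrative sketch only - the repository's actual Dockerfile may differ
FROM python:3.11-slim
# ... install ODBC Driver 18 for SQL Server and Poetry here ...
WORKDIR /app
COPY . .
RUN poetry install
CMD ["poetry", "run", "connect-basic"]
```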
- Naming the Docker Container: Note that `docker-compose up` does not accept a `--name` flag (that flag belongs to `docker run`). To give the container a fixed name, set the `container_name` key for the service in `docker-compose.yml`:

  ```yaml
  services:
    app:  # replace with your actual service name
      container_name: pipelinecloud-app
  ```
- Mac: Install via Homebrew:

  ```shell
  brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
  brew update
  ACCEPT_EULA=Y brew install msodbcsql18
  ```

- Windows: Download and install from the Microsoft ODBC Driver for SQL Server page.
- Docker: The ODBC driver is included in the Docker image setup.
Both scripts include a placeholder function for querying the database. Replace the example query with your specific SQL or SQLAlchemy query to fetch data and log it to the console.
- Ensure all required environment variables are set in your `.env` file.
- Activate the Poetry environment using `poetry shell` and ensure dependencies are installed with `poetry install`.
- Ensure ODBC Driver 18 is installed on your system. Refer to the installation instructions above.
- Check your network settings and ensure the database server is accessible from your environment.
This repository provides a skeleton for connecting to the Pipeline Cloud service from EveryAction using Python. It is designed to be simple to run locally with your credentials to verify a connection can be made.
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. For more details, see https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.