In this workshop bundle, we explore and demonstrate several integration patterns between MongoDB Atlas and GCP data products, specifically BigQuery, Vertex AI, and Looker. This repository contains the code samples and steps a customer needs to try out the solutions on their own.
Patterns | Description |
---|---|
MongoDB Collection level change stream to Pub/Sub | This pattern demonstrates how collection-level changes in MongoDB are monitored and published to a GCP Pub/Sub topic using Python code. |
MongoDB Database level change stream to Pub/Sub | This pattern demonstrates how database-level changes in MongoDB are monitored and published to a GCP Pub/Sub topic using Python code. |
Configuring BigQuery Pub/Sub native connector | This pattern demonstrates consuming messages with a defined schema from a GCP Pub/Sub topic into a BigQuery table using the native connector. |
Configuring Dataflow template for Pub/Sub to BigQuery | This pattern demonstrates streaming data from GCP Pub/Sub to a BigQuery table with a Dataflow job that uses the "PubSub Subscription to BigQuery" template. |
Configuring BigQuery Kafka connector for MongoDB (either Confluent or Apache Kafka) | This pattern demonstrates the data flow from MongoDB to a BigQuery table using the Confluent Kafka connector. |
MongoDB to BigQuery sync using the Spark connector | This pattern demonstrates the data flow from MongoDB to a BigQuery table using the Spark connector. |
Real time integration of Vertex AI endpoint with change stream | This pattern demonstrates creating and deploying a model on Vertex AI and integrating it in real time with a MongoDB change stream. |
Reverse batch sync from BigQuery to MongoDB | This pattern demonstrates the data flow from a BigQuery table to a MongoDB collection with a Dataflow job that uses the "BigQuery to MongoDB" template. |
- Create an Account:
- If you don't already have one, create an account with your MongoDB cloud service provider. For this example, we'll use MongoDB Atlas.
- Log in:
- Log in to MongoDB Atlas with your existing account credentials.
- Create Cluster:
- Create Database and Collection:
- Create the database and the collection in the cluster.
Note: For the scope of this workshop, create the collection "sample_analytics.Customer" with the document structure below:
{ "_id": "1", "username": "CSCollectionleveltoGCPpubsub", "name": "Derek Curtis", "address": "565 Hodge Motorway Suite 101 Wendyberg, FL 57099", "email": "qgibson@hotmail.com" }
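If you prefer to create the sample document from code instead of the Atlas UI, the following is a minimal sketch using PyMongo. It assumes the connection string is stored in an environment variable named `MONGODB_URI` (a hypothetical name, not defined by this repository); the helper names are also illustrative. MongoDB creates the database and collection implicitly on the first write.

```python
import os

# The sample document from this workshop, inserted into sample_analytics.Customer.
SAMPLE_CUSTOMER = {
    "_id": "1",
    "username": "CSCollectionleveltoGCPpubsub",
    "name": "Derek Curtis",
    "address": "565 Hodge Motorway Suite 101 Wendyberg, FL 57099",
    "email": "qgibson@hotmail.com",
}


def upsert_sample_customer(collection):
    """Insert the sample document, or replace it if _id "1" already exists."""
    collection.replace_one(
        {"_id": SAMPLE_CUSTOMER["_id"]}, SAMPLE_CUSTOMER, upsert=True
    )


def get_customer_collection(uri=None):
    """Return the workshop collection.

    MONGODB_URI is an assumed environment variable name.
    PyMongo is imported lazily so the helpers above work without it.
    """
    from pymongo import MongoClient

    client = MongoClient(uri or os.environ["MONGODB_URI"])
    return client["sample_analytics"]["Customer"]
```

Calling `upsert_sample_customer(get_customer_collection())` is safe to repeat, because the upsert replaces the existing document rather than failing on a duplicate `_id`.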
- Create an Account:
- If you don't already have one, create a Google Cloud (GCP) account.
- Log in:
- Log in to the account with your credentials.
- Enable APIs:
- Click APIs & Services in the left panel, then enable the required APIs.
- Detailed steps for creating a GCP account can be found here
Use the gcloud CLI to authenticate to your Google Cloud project. Before starting the workshop, make sure that the gcloud CLI is installed and up to date, and that the client is properly authenticated:
- Log in to gcloud and select the project using the gcloud init command:
gcloud init
- Authenticate with application default credentials:
gcloud auth application-default login
Before you start using this application, ensure that you have the necessary software and libraries installed. Follow these steps to set up your development environment:
Make sure you have Python installed on your system. Follow the steps described here. The pip installation steps are described here.
Poetry Setup:
pip3 install poetry
poetry init
brew install pyenv
poetry env use <replace with your python version>
poetry shell
poetry install
To work with MongoDB databases, you'll need to install the PyMongo library. Use the following command to install it:
pip3 install pymongo
For managing environment variables and configuration settings in your Python application, you'll need the python-dotenv library. Install it with the following command:
pip3 install python-dotenv
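As an illustration of how python-dotenv might be used here, the sketch below loads a `.env` file and validates the settings a pattern would need. The variable names `MONGODB_URI`, `GCP_PROJECT_ID`, and `PUBSUB_TOPIC_ID` are assumptions for this example; check each project's folder for the names it actually expects.

```python
import os

# Hypothetical setting names; the real projects may use different ones.
REQUIRED_SETTINGS = ("MONGODB_URI", "GCP_PROJECT_ID", "PUBSUB_TOPIC_ID")


def read_settings(env):
    """Pick the required settings out of a mapping, failing early if any are missing."""
    missing = [key for key in REQUIRED_SETTINGS if not env.get(key)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_SETTINGS}


def load_settings():
    """Load a .env file from the working directory, then read the process environment."""
    from dotenv import load_dotenv  # imported lazily; needs python-dotenv

    load_dotenv()  # by default, does not override variables that are already set
    return read_settings(os.environ)
```

Failing at startup with a list of missing names tends to be easier to debug than a connection error later in the run.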
If your application interacts with Google Cloud Pub/Sub for messaging, install the google-cloud-pubsub library with the following command:
pip3 install google-cloud-pubsub
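To show how PyMongo and google-cloud-pubsub fit together in the change stream patterns, here is a minimal sketch, not the repository's actual `main.py`. The function names and the `limit` parameter are illustrative. Change events contain BSON types such as timestamps, so this sketch serializes them with `default=str`, which is a simplification; a real pipeline may need a schema-aware encoding.

```python
import json


def encode_change(change):
    """Serialize a change event to UTF-8 JSON bytes (non-JSON types become strings)."""
    return json.dumps(change, default=str).encode("utf-8")


def forward_changes(changes, publish, limit=None):
    """Send each change event to publish(); return how many were forwarded."""
    count = 0
    for change in changes:
        publish(encode_change(change))
        count += 1
        if limit is not None and count >= limit:
            break
    return count


def run(uri, project_id, topic_id):
    """Watch sample_analytics.Customer and publish every change to a Pub/Sub topic."""
    from google.cloud import pubsub_v1  # imported lazily; needs google-cloud-pubsub
    from pymongo import MongoClient

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    collection = MongoClient(uri)["sample_analytics"]["Customer"]

    def publish(data):
        # publish() returns a future; result() blocks until the message is accepted.
        publisher.publish(topic_path, data=data).result()

    # full_document="updateLookup" includes the current document on updates.
    with collection.watch(full_document="updateLookup") as stream:
        forward_changes(stream, publish)
```

For the database-level pattern, the same loop should work by calling `watch()` on the database object (`MongoClient(uri)["sample_analytics"].watch()`) instead of on a single collection.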
To provide a certificate bundle for secure communication over HTTPS and SSL/TLS-encrypted protocols, install the certifi library:
pip3 install certifi
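One common use of certifi with PyMongo is to pass its CA bundle path when connecting, which can help on systems where the default certificate store is missing or outdated. The sketch below assumes a `MONGODB_URI` environment variable (a hypothetical name); `tls_options` and `connect` are illustrative helpers, not part of this repository.

```python
import os


def tls_options(ca_file):
    """Build MongoClient keyword arguments that enable TLS with a given CA bundle."""
    return {"tls": True, "tlsCAFile": ca_file}


def connect(uri=None):
    """Connect to MongoDB using certifi's CA bundle and verify the connection."""
    import certifi  # imported lazily; needs certifi and pymongo
    from pymongo import MongoClient

    client = MongoClient(
        uri or os.environ["MONGODB_URI"], **tls_options(certifi.where())
    )
    client.admin.command("ping")  # raises if the server cannot be reached
    return client
```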
Ensure that you have Python and pip installed on your system before running these commands. These libraries and tools are essential for the proper functioning of the application.
Now, you're all set to use the application with the required dependencies in place.
For each project, run main.py in its respective folder.