
Contains Workshops for MongoDB and BigQuery integrations


mongodb-partners/MongoDb-BigQuery-Workshops


Overview

In this workshop bundle, we explore and demonstrate several integration patterns between MongoDB Atlas and GCP data products, specifically BigQuery, Vertex AI, and Looker. This repository contains the code samples and steps customers need to try out the solutions on their own.

Patterns and descriptions:

  • MongoDB Collection level change stream to Pub Sub: Monitors changes on a single MongoDB collection and publishes them to a GCP Pub/Sub topic using Python code.
  • MongoDB Database level change stream to Pub Sub: Monitors changes across a MongoDB database and publishes them to a GCP Pub/Sub topic using Python code.
  • Configuring BigQuery PubSub native connector: Consumes messages with a defined schema from a GCP Pub/Sub topic into a BigQuery table using the native connector.
  • Configuring Dataflow template for PubSub to BQ: Streams data from GCP Pub/Sub to a BigQuery table with a Dataflow job that uses the "PubSub Subscription to BigQuery" template.
  • Configuring BigQuery Kafka connector for MongoDB (either Confluent or Apache Kafka): Moves data from MongoDB to a BigQuery table using the Confluent Kafka connector.
  • MongoDB to BigQuery sync using the Spark connector: Moves data from MongoDB to a BigQuery table using the Spark connector.
  • Real time integration of Vertex AI endpoint with change stream: Creates and deploys a model on Vertex AI and integrates it in real time with a MongoDB change stream.
  • Reverse batch sync from BigQuery to MongoDB: Moves data from a BigQuery table to a MongoDB collection with a Dataflow job that uses the "BigQuery to MongoDB" template.

Pre-requisite

Create MongoDB Cluster with Database and Collections

  • Create an Account:

    • If you don't already have one, create an account with your MongoDB cloud service provider. For this example, we'll use MongoDB Atlas.
  • Log in:

    • Log in to MongoDB Atlas with your account credentials.
  • Create Cluster:

    • Create a MongoDB Atlas cluster.
    • Add the required IP addresses to the cluster's IP access list (whitelist) so that they can reach the cluster.
  • Create Database and Collection:

    • Create the Database and the Collection in the Cluster.

    Note: For this workshop, create the collection "sample_analytics.Customer" (database sample_analytics, collection Customer) with the following document structure:

     {
       "_id": "1",
       "username": "CSCollectionleveltoGCPpubsub",
       "name": "Derek Curtis",
       "address": "565 Hodge Motorway Suite 101 Wendyberg, FL 57099",
       "email": "qgibson@hotmail.com"
     }
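The sample document above can also be created from Python rather than through the Atlas UI. The following is a minimal sketch, assuming PyMongo is installed and that you pass in your own Atlas connection string; the helper names `missing_fields` and `insert_customer` are hypothetical and not part of the workshop code.

```python
# Fields taken from the sample document shown above.
REQUIRED_FIELDS = ("_id", "username", "name", "address", "email")

SAMPLE_CUSTOMER = {
    "_id": "1",
    "username": "CSCollectionleveltoGCPpubsub",
    "name": "Derek Curtis",
    "address": "565 Hodge Motorway Suite 101 Wendyberg, FL 57099",
    "email": "qgibson@hotmail.com",
}


def missing_fields(doc):
    """Return the expected fields that are absent from doc, in declaration order."""
    return [field for field in REQUIRED_FIELDS if field not in doc]


def insert_customer(uri, doc):
    """Upsert doc into sample_analytics.Customer (hypothetical helper).

    uri is your own Atlas connection string; it is an assumption of this sketch.
    """
    from pymongo import MongoClient  # imported lazily so the checks above need no driver

    missing = missing_fields(doc)
    if missing:
        raise ValueError(f"document is missing fields: {missing}")

    client = MongoClient(uri)
    try:
        collection = client["sample_analytics"]["Customer"]
        # replace_one with upsert=True makes the call safe to repeat.
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    finally:
        client.close()
```

Calling `insert_customer(uri, SAMPLE_CUSTOMER)` with a valid connection string should leave exactly one document with `_id` "1" in the collection, however many times it is run.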
    

Create GCP Account

  • Create an Account:
    • If you don't already have one, create a Google Cloud (GCP) account.
  • Log in:
    • Log in to the account with your credentials.
  • Enable APIs:
    • Click APIs & Services in the left panel, then enable the required APIs.
  • Detailed steps for creating GCP Account can be found here

Authenticate to Google Cloud project

Use the gcloud CLI to authenticate to your Google Cloud project. Before starting the workshop, make sure that the gcloud CLI is installed and up to date, and that the client is properly authenticated:

  • Log in to gcloud and select the project: gcloud init
  • Set up Application Default Credentials: gcloud auth application-default login
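Before running the samples, it can help to confirm that Application Default Credentials are actually present. The sketch below is an illustration only: it checks the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and the default file locations that gcloud normally writes to, and it assumes the gcloud configuration directory has not been relocated (for example through `CLOUDSDK_CONFIG`). The function names are hypothetical.

```python
import os
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def adc_candidates(env, home):
    """List the places where a credentials file is expected, most specific first."""
    candidates = []
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    appdata = env.get("APPDATA")  # normally set on Windows only
    if appdata:
        candidates.append(Path(appdata) / "gcloud" / ADC_FILENAME)
    candidates.append(Path(home) / ".config" / "gcloud" / ADC_FILENAME)
    return candidates


def find_adc(env=None, home=None):
    """Return the first existing credentials file, or None if none is found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    for path in adc_candidates(env, home):
        if path.is_file():
            return path
    return None
```

If `find_adc()` returns `None`, running `gcloud auth application-default login` again is the likely fix.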

Software to be Installed to Run the Application

Before you start using this application, ensure that you have the necessary software and libraries installed. Follow these steps to set up your development environment:

1. Python

Make sure Python and pip are installed on your system. Installation steps for Python are described here, and installation steps for pip are described here.

Poetry Setup:

pip3 install poetry
poetry init
brew install pyenv
poetry env use <replace with your python version>
poetry shell
poetry install

2. PyMongo

To work with MongoDB databases, you'll need to install the PyMongo library. Use the following command to install it:

pip3 install pymongo

3. python-dotenv

For managing environment variables and configuration settings in your Python application, you'll need the python-dotenv library. Install it with the following command:

pip3 install python-dotenv
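As an illustration of how python-dotenv is typically used, the sketch below loads a `.env` file and collects the settings a sample might need. The variable names `MONGODB_URI`, `GCP_PROJECT_ID`, and `PUBSUB_TOPIC_ID` are assumptions made for this example; check each project's code for the names it actually reads.

```python
import os

# Hypothetical variable names; the workshop projects may use different ones.
REQUIRED_KEYS = ("MONGODB_URI", "GCP_PROJECT_ID", "PUBSUB_TOPIC_ID")


def load_settings(env=None):
    """Collect the required settings from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}


def load_settings_from_dotenv(path=".env"):
    """Read key=value pairs from a .env file into os.environ, then validate them."""
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv(dotenv_path=path)
    return load_settings()
```

Keeping the `.env` file out of version control avoids committing the Atlas connection string by accident.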

4. Google Cloud Pub/Sub

If your application interacts with Google Cloud Pub/Sub for messaging, install the google-cloud-pubsub library with the following command:

pip3 install google-cloud-pubsub
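To show how this library fits the change stream patterns, here is a minimal sketch that forwards collection-level change events to a Pub/Sub topic. It is not the workshop's own implementation: the function names are hypothetical, the topic is assumed to exist already, and values that JSON cannot represent (such as ObjectId or timestamps) are simply converted to strings.

```python
import json


def encode_change(change):
    """Serialize a change event to UTF-8 JSON bytes.

    default=str turns values JSON cannot encode into their string form,
    which is a simplification chosen for this sketch.
    """
    return json.dumps(change, default=str, sort_keys=True).encode("utf-8")


def stream_to_pubsub(collection, project_id, topic_id):
    """Publish every change on a PyMongo collection to a Pub/Sub topic.

    Blocks until the change stream is closed or an error is raised.
    """
    from google.cloud import pubsub_v1  # provided by google-cloud-pubsub

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    # updateLookup asks MongoDB to include the full document on updates.
    with collection.watch(full_document="updateLookup") as stream:
        for change in stream:
            future = publisher.publish(topic_path, data=encode_change(change))
            future.result()  # wait for the publish to be acknowledged
```

Passing a PyMongo database object instead of a collection should give the database-level variant, since databases expose a `watch` method as well.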

5. Certifi

To provide a certificate bundle for secure communication over HTTPS and SSL/TLS-encrypted protocols, install the certifi library:

pip3 install certifi
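A common reason to install certifi alongside PyMongo is to give the driver an up-to-date CA bundle when connecting to Atlas over TLS. The sketch below shows one way to do that; `tls_options` and `connect` are hypothetical helper names, and whether a given workshop project connects this way is an assumption.

```python
def tls_options(ca_file):
    """Build the TLS keyword arguments passed to MongoClient."""
    return {"tls": True, "tlsCAFile": ca_file}


def connect(uri):
    """Open a TLS connection to MongoDB using certifi's CA bundle."""
    import certifi
    from pymongo import MongoClient

    client = MongoClient(uri, **tls_options(certifi.where()))
    client.admin.command("ping")  # raises if the server cannot be reached
    return client
```

This can help on systems where the default certificate store is missing or outdated and the connection fails with a certificate verification error.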

Ensure that you have Python and pip installed on your system before running these commands. These libraries and tools are essential for the proper functioning of the application.

Now, you're all set to use the application with the required dependencies in place.

Steps to run each project

For each project, run main.py in its respective folder.
