
Deploying EDP Rest APIs Connector

This tutorial shows you how to configure Cloud Build to deploy the Education API connector on top of the Data Platform Foundation.

The following diagram provides a high-level view of the resources created and managed here:

API Education Data Platform

After you provision the Data Platform Foundation, create a Cloud Storage bucket in the drop-off project to store the Terraform state file, and note the following values for the next steps:

  • Drop-off Project ID (PREFIX-drp).
  • Name of the Drop-off Bucket (PREFIX-drp-cs-0).
  • Composer Service Account Email (PREFIX-orc-cmp-0@PREFIX-orc.iam.gserviceaccount.com).
  • Region in which the EDP resources were created.
  • The endpoint name for the Cloud Function responsible for API ingestion (PREFIX).
  • Grant the 'Storage Object Creator' role in the Orchestration project (PREFIX-orc) to the Cloud Build service account that will be created in the Drop-off project.

You will use the information above to create a trigger in the Drop-off project, as described in the following steps.

Trigger for Education API

  1. First, select the Drop-off project (PREFIX-drp)

Project List

  2. Once in the project context, go to Cloud Build Triggers

Cloud Build

  3. Select the option CREATE TRIGGER

Create Trigger

  4. Give the trigger a name and select the option for trigger invocation

Create Trigger

  5. Under 'Source', click the Repository box and select the option CONNECT NEW REPOSITORY

  6. Select the Service Source and click Continue

Create Trigger

  7. Authenticate with the Service Source, select the repository, check the box accepting the conditions, and click CONNECT

Create Trigger

  8. Add a file filter so that the build is triggered only by modifications in the folders of the API pipeline (see the sketch after step 10)

Create Trigger

  9. Under Configuration, select the option Cloud Build configuration file (yaml or json) and provide the relative path to the build file

Create Trigger

  10. Under Advanced, select the option ADD VARIABLE and create the following substitution variables for the Cloud Storage load:
| Variable | Example value | Description |
| --- | --- | --- |
| _BUCKET_CONFIG | tfstate-bucket | Bucket name for the Terraform tfstate |
| _BUCKET_ID | PREFIX-drp-cs-0 | Bucket name of the Drop-off environment |
| _COMPOSER_IAM_EMAIL | PREFIX-orc-cmp-0@PREFIX-orc.iam.gserviceaccount.com | Composer service account email |
| _DATAFLOW_COMPOSER_BUCKET | REGION-PREFIX-orc-cmp-0-RANDOM_ID-bucket | Name of the Composer bucket |
| _PREFIX | api | Cloud Function name prefix |
| _PROJECT_ID | PREFIX-drp | ID of the Drop-off project |
| _REGION | us-west1 | Region of the deployment |
| _INSTANCES | 5 | Cloud Function maximum instances |

Create Trigger
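For reference, the file filter, configuration file path, and variables from steps 8-10 map onto the Cloud Build trigger resource roughly as in the sketch below. The trigger name, the gcp_api/ folder, and the cloudbuild.yaml file name are assumptions based on the repository layout referenced later in this guide; adjust them to your repository.

```json
{
  "name": "edp-api-connector-trigger",
  "filename": "gcp_api/cloudbuild.yaml",
  "includedFiles": [
    "gcp_api/**"
  ],
  "substitutions": {
    "_BUCKET_CONFIG": "tfstate-bucket",
    "_BUCKET_ID": "PREFIX-drp-cs-0",
    "_COMPOSER_IAM_EMAIL": "PREFIX-orc-cmp-0@PREFIX-orc.iam.gserviceaccount.com",
    "_DATAFLOW_COMPOSER_BUCKET": "REGION-PREFIX-orc-cmp-0-RANDOM_ID-bucket",
    "_PREFIX": "api",
    "_PROJECT_ID": "PREFIX-drp",
    "_REGION": "us-west1",
    "_INSTANCES": "5"
  }
}
```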

  11. Click on CREATE

  12. Go to Cloud Build Settings and get the Cloud Build service account email

Create Trigger

  13. Go to the Orchestration Project (PREFIX-orc)

Create Trigger

  14. Select IAM & Admin

Create Trigger

  15. Click GRANT ACCESS, enter the Cloud Build service account email under 'New principals', and select the role 'Storage Admin (roles/storage.admin)'

  16. Go back to the Drop-off project (PREFIX-drp)

Project List

  17. Click GRANT ACCESS, enter the Cloud Build service account email under 'New principals', and select the following roles:
  • 'Storage Admin (roles/storage.admin)'
  • 'Service Account User (roles/iam.serviceAccountUser)'
  • 'Service Account Creator (roles/iam.serviceAccountCreator)'
  • 'Cloud Functions Admin (roles/cloudfunctions.admin)'
  • 'Security Admin (roles/iam.securityAdmin)'

    Create Trigger
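As a reference, once these grants are in place, the relevant bindings on the Drop-off project would look roughly like the sketch below when the IAM policy is viewed as JSON. The PROJECT_NUMBER@cloudbuild.gserviceaccount.com address is the default Cloud Build service account format; substitute the email you copied from the Cloud Build Settings page.

```json
{
  "bindings": [
    {
      "role": "roles/storage.admin",
      "members": ["serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com"]
    },
    {
      "role": "roles/iam.serviceAccountUser",
      "members": ["serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com"]
    },
    {
      "role": "roles/iam.serviceAccountCreator",
      "members": ["serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com"]
    },
    {
      "role": "roles/cloudfunctions.admin",
      "members": ["serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com"]
    },
    {
      "role": "roles/iam.securityAdmin",
      "members": ["serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com"]
    }
  ]
}
```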

How to run

After creating the Cloud Build trigger in your Drop-off project, you need to configure the gcp_api/config.json file according to your parameters.

You need to set up all the variables according to your project information before running the pipelines (instructions in the following section).

As soon as you commit the updated config.json file to your GitHub repository, Cloud Build is triggered and all the files needed for this deployment are loaded automatically into your Education Data Platform in GCP.

Cloud Function (API)

The API connector in EDP is meant to consume other APIs and generate files (JSON or CSV) inside a bucket in the GCP environment. The connector is built as a Cloud Function and is flexible to use, allowing several endpoints to be consumed in the same run.

To use the function, update the configuration file (config.json) in the config folder of the Drop-off bucket (PREFIX-drp-cs-0) with the following parameters (see the sketch after the list):

  • "project_id": "Project ID of the Dropoff Project PREFIX-drp";
  • "bucket": "Name of the Dropoff Bucket ID PREFIX-drp-cs-0 that contains the configuration file with the name config.json";
  • "api_path": "Folder name where the function will save the returned APIs consumed in JSON or CSV format;
  • "api_uri": "http or https address of the function (trigger)";
  • "endpoints": "List of objects containing the necessary information for the correct functioning of the function";
  • "location": GCP location for multi regional resources on BigQuery (if applicable to the region);
  • "region": GCP region.

Within the endpoints list, each object must have the following structure (see the sketch after the list):

  • "url": "API / Endpoint address to be consumed";
  • "type": "Output file format (JSON or CSV)";
  • "blob": "Path inside the bucket to generate the files (by default use a name of your choice plus '/load' for correct integration with the bigquery load routines"; The name of the folder where the file is stored will be the name of the BigQuery table
  • "project_id_bq": "Id of project that contains the BigQuery tables - Landing Raw Data (PREFIX-dwh-lnd)"
  • "dataset_name": "Name of the dataset where the moodle tables will be stored in the Landing Raw Data (PREFIX-dwh-lnd)"
  • "retention_data": "Time in seconds for the table partition data retention"

Cloud Composer

Cloud Composer is used as the orchestration solution for the API data ingestion pipelines, alongside other auxiliary process DAGs.

After deploying the API connector artifacts, you will find the following DAGs deployed on Cloud Composer in the Orchestration project (PREFIX-orc):

  • API_pipeline: calls the REST APIs parameterized in the config files and loads the JSON or CSV files into the drop-off bucket;
  • bq_load: loads the JSON or CSV files ingested by the API_pipeline DAG into the BigQuery landing zone.