Google-provided Cloud Dataflow template pipelines for solving simple in-Cloud data tasks

dunzoit/DataflowTemplates

 
 

Google Cloud Dataflow Template Pipelines

These Dataflow templates are an effort to solve simple but large in-Cloud data tasks, including data import/export/backup/restore and bulk API operations, without a development environment. The technology under the hood that makes these operations possible is the Google Cloud Dataflow service combined with a set of Apache Beam SDK templated pipelines.

Google is providing this collection of pre-implemented Dataflow templates as a reference and as an easily customizable starting point for developers who want to extend their functionality.

Running the PubSubDLQToBigquery

Steps

  • Create separate subscriptions for the topics you want to ingest into BigQuery.
  • Create a table in BigQuery. Its schema should match that of prod-data-platform:error_reporting.fact_pubsubdlq_errors. In addition to the raw payload, this job also ingests the Pub/Sub attributes and the actual message bytes.
  • --inputSubscriptions expects a comma-separated list of subscriptions.
  • The --jobName value can be changed.
  • Run the job for 30 to 60 minutes and then stop it; otherwise, high Dataflow and BigQuery costs may be incurred.
  • Ideally, run the backend events DLQ job separately because of the higher volume there.
  • The command below includes every non-backend-events DLQ. Add more subscriptions if needed.

Run this command:

    mvn clean compile exec:java \
      -Dmaven.clean.failOnError=false \
      -Dexec.mainClass=com.google.cloud.teleport.templates.PubSubToBigQuery \
      -Dexec.cleanupDaemonThreads=false \
      -Dexec.args=" \
        --region=asia-south1 \
        --project=prod-data-platform \
        --inputSubscriptions=projects/prod-data-platform/subscriptions/dead-letter-postgres-othersources-test-subscip,projects/prod-data-platform/subscriptions/dead-letter-postgres-load-othersources-test-subscr,projects/prod-data-platform/subscriptions/dead-letter-espresso-postgres-transform-test-subscip,projects/prod-data-platform/subscriptions/dead-letter-store-realtime-test-subscr,projects/prod-data-platform/subscriptions/dead-letter-user-frauddetection-realtime-sub,projects/prod-data-platform/subscriptions/dead-letter-userappui-sessiondata-test-subscr,projects/prod-data-platform/subscriptions/dead-letter-usertasks-runnertask-test-subscr,projects/prod-data-platform/subscriptions/dead-letter-usertasktask-realtime-test-subscip,projects/prod-data-platform/subscriptions/dead-letter-pubsub-transform-mongo-stream-other-test-sbscr,projects/prod-data-platform/subscriptions/dead-letter-pubsub-bq-load-mongo-stream-others-testsbscr,projects/prod-data-platform/subscriptions/dead-letter-bq-load-espresso-postgres-test-sbscr,projects/prod-data-platform/subscriptions/dead-letter-freshchat-user-csat-test-sub,projects/prod-data-platform/subscriptions/dead-letter-dbz-extract-mongo-stream-others-test-sbscr \
        --jobName=pubsub-dlq-to-bq-errors \
        --outputTableSpec=prod-data-platform:error_reporting.fact_pubsubdlq_errors \
        --stagingLocation=gs://dunzo-de-dataflow-temp-jobs/staging \
        --runner=DataflowRunner"
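Because --inputSubscriptions takes a single comma-separated value, adding or removing subscriptions by hand is easy to get wrong. The following is a minimal sketch of one way to assemble that value from short subscription names. The helper name build_input_subscriptions and the SUBSCRIPTIONS variable are hypothetical and not part of this repository; the sketch assumes every subscription lives in the same project.

```shell
#!/bin/sh
# Sketch only: build_input_subscriptions is a hypothetical helper, not part of
# the repository. It expands short subscription names into full resource paths
# (projects/PROJECT/subscriptions/NAME) and joins them with commas, without
# any whitespace, as --inputSubscriptions expects.
# Usage: build_input_subscriptions PROJECT SUBSCRIPTION [SUBSCRIPTION ...]
build_input_subscriptions() {
  project="$1"
  shift
  list=""
  for name in "$@"; do
    path="projects/${project}/subscriptions/${name}"
    if [ -z "$list" ]; then
      list="$path"              # first entry: no leading comma
    else
      list="${list},${path}"    # later entries: comma-separated
    fi
  done
  printf '%s\n' "$list"
}

# Example with two of the subscriptions from the command above.
SUBSCRIPTIONS="$(build_input_subscriptions prod-data-platform \
  dead-letter-store-realtime-test-subscr \
  dead-letter-freshchat-user-csat-test-sub)"
echo "--inputSubscriptions=${SUBSCRIPTIONS}"
```

The resulting value can then be substituted for the long --inputSubscriptions argument inside -Dexec.args, which keeps the list of DLQ subscriptions readable and makes it simpler to run the backend events DLQ as a separate job.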
