tutorial to trigger dataflow jobs using cloud scheduler #1396
The CircleCI tests are failing because the document doesn't use the frontmatter that it needs. To see what you need to include at the top, see a published document, like this one:
A few suggestions for adding detail. In step-by-step tutorials like this, it's good to be as explicit as possible and list out even steps that might be "obvious".
![Set up your cloud scheduler](set_up_the_cloud_scheduler.png)

If you use Terraform, here is one example to define a scheduler.
Tutorials usually avoid giving multiple paths. Should the user use the Console (above), or should they use Terraform? If TF, please give the command they would use to deploy this into their project.
Sure. I will stick with the TF solution here.
I'm OK with the Terraform approach, but there's a lot of setup involved which I think we need to cover.
I've forked your branch, templatized the TF a bit more and added the basic steps here:
https://github.com/jpatokal/community/tree/zhong-cloud-scheduler-dataflow-tutorial
Specific commits:
jpatokal@488f91e
5136fd2 d268bba
I don't have the right to commit directly into this PR, but maybe you can pull the commit above from my repo?
Thanks! I have merged your changes into my branch.
Thanks for the review, @jpatokal. @zhongchen, I'll wait until you have resolved these review comments before I begin the editorial and production review.
@jpatokal, let me know when the review comments are resolved to your satisfaction.
@ToddKopriva Can you start providing some feedback as well?
@zhongchen Apologies for the delay, the notifications for this went into my personal acct (oops). See comments above.
http_target {
  http_method = "POST"
  uri = "https://dataflow.googleapis.com/v1b3/projects/${var.project_id}/locations/${var.region}/templates:launch?gcsPath=gs://zhong-gcp/templates/dataflow-demo-template"
I'm getting 403 PERMISSION_DENIED from Scheduler when invoking this? (URL updated to my own bucket, of course.) There's a Dataflow Step INFO log for "dataflow.jobs.create" with authorizationInfo: granted: true, which would imply that DF is OK, but no further logs.
I updated the doc to use Terraform to create the SA and set the permission. I tried to run the step in a cloud console and it worked for me.
Tested, and the Cloud Build SA cannot create/modify/act as other SAs or schedule jobs by default. Please direct the user to https://cloud.google.com/cloud-build/docs/securing-builds/configure-access-for-cloud-build-service-account#granting_a_role_using_the_iam_page and ask them to add these three roles:
- Service Accounts > Service Account Admin
- Service Accounts > Service Account User
- Cloud Scheduler > Cloud Scheduler Admin
But even after granting all these, assigning the extra roles to the successfully created new SA fails:
Step #0 - "Terraform init": Error: Batch "iam-project-hybrid-prom modifyIamPolicy" for request "Create IAM Members roles/dataflow.admin serviceAccount:scheduler-dataflow-demo@hybrid-prom.iam.gserviceaccount.com for "project \"hybrid-prom\""" returned error: Error retrieving IAM policy for project "hybrid-prom": googleapi: Error 403: The caller does not have permission, forbidden
...which makes no sense to me since roles/iam.serviceAccountAdmin includes iam.serviceAccounts.getIamPolicy. terraform graph also shows that the dependency is correctly recognized and the mods only made after the SA exists:
"[root] google_project_iam_member.cloud-scheduler-dataflow" -> "[root] google_service_account.cloud-scheduler-demo"
"[root] google_project_iam_member.cloud-scheduler-gcs" -> "[root] google_service_account.cloud-scheduler-demo"
Step #0 - "Terraform init": google_service_account.cloud-scheduler-demo: Creating...
Step #0 - "Terraform init": google_service_account.cloud-scheduler-demo: Creation complete after 2s [id=projects/hybrid-prom/serviceAccounts/scheduler-dataflow-demo@hybrid-prom.iam.gserviceaccount.com]
Step #0 - "Terraform init": google_project_iam_member.cloud-scheduler-dataflow: Creating...
Step #0 - "Terraform init": google_project_iam_member.cloud-scheduler-gcs: Creating...
[errors]
What am I missing? Any ideas?
- Cloud Scheduler Admin
- Dataflow Admin
- Service Account User
- Project IAM Admin
Maybe you are missing the Project IAM Admin role, which is needed to manage IAM bindings? I reduced the roles to the above four and it worked for me.
Can you try it as well?
A-ha! Project IAM Admin was the missing role, it works now!
Two more tweaks:
- You need Service Account Admin to create the new SA
- You don't need Dataflow Admin (because it's the new SA that kicks off the job)
Also, in the gcloud builds command you need to use $BUCKET_NAME (not $BUCKET), otherwise it tries to access gs://gs://bucket.
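For reference, the role set this thread converges on for the Cloud Build service account (Service Account Admin, Service Account User, Cloud Scheduler Admin, Project IAM Admin) could be granted from the command line roughly as sketched below. This is only an illustration, not the tutorial's own script: the role IDs are my mapping of the display names above, the `PROJECT_ID` and `PROJECT_NUMBER` values are placeholders, and the member address assumes the default Cloud Build service account. The script only prints the commands so that they can be reviewed before running.

```shell
# Sketch: print (not run) the gcloud commands that would grant the roles
# discussed in this thread to the default Cloud Build service account.

# Placeholder values (assumptions); substitute your own project ID and number.
PROJECT_ID="${PROJECT_ID:-my-project}"
PROJECT_NUMBER="${PROJECT_NUMBER:-123456789012}"

print_role_bindings() {
  # Assumes the default Cloud Build service account address format.
  member="serviceAccount:${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com"
  # Role IDs assumed to match, in order: Service Account Admin,
  # Service Account User, Cloud Scheduler Admin, Project IAM Admin.
  for role in \
    roles/iam.serviceAccountAdmin \
    roles/iam.serviceAccountUser \
    roles/cloudscheduler.admin \
    roles/resourcemanager.projectIamAdmin
  do
    echo "gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=${member} --role=${role}"
  done
}

print_role_bindings
```

Piping the output to `sh` would execute the bindings; Dataflow Admin is left out on purpose, because it is the newly created service account that launches the job.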
I actually removed `bucket_name` completely. Currently `bucket` represents the bucket name. In the scripts, I have added back the `gs://` prefix for all the bucket paths.
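One way to guard against the doubled `gs://gs://` prefix mentioned above is to normalize the value before building paths. The helper below is a hypothetical sketch (the name `to_gcs_uri` is not from the tutorial's scripts): it accepts either a bare bucket name or a full `gs://` URI and always returns a single prefix.

```shell
# Hypothetical helper (not part of the tutorial's scripts): accept either
# "my-bucket" or "gs://my-bucket" and always emit exactly one gs:// prefix.
to_gcs_uri() {
  name="${1#gs://}"            # strip a leading gs:// if one is present
  printf 'gs://%s\n' "$name"   # add the prefix back exactly once
}

to_gcs_uri "my-bucket"         # prints gs://my-bucket
to_gcs_uri "gs://my-bucket"    # prints gs://my-bucket
```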
…zhongchen/community into zhong-cloud-scheduler-dataflow-tutorial
Working now! Zhong, please add in the two last tweaks, and @ToddKopriva, you'll be good to go.
Thanks, @jpatokal. @zhongchen, I'll wait until you have made the suggested change, and then I'll begin the editorial and production process.
…zhongchen/community into zhong-cloud-scheduler-dataflow-tutorial
@ToddKopriva I have addressed all the comments.
…latform#1396)

* tutorial to trigger dataflow jobs using cloud scheduler
* format the tutorial to fix circle ci checks
* change the title format
* address comments
* Templating and step-by-step instructions
* Enable APIs
* Add template compilation
* add the architecture diagram
* rename the build script
* address comments
* minor fixes
* address comments
* add cloudbuild sa setup
* add project iam admin role
* add dummy logic for dataflow job
* update sa setup
* first edit pass during readthrough
* second edit pass

Co-authored-by: zhong <zhongchen@google.com>
Co-authored-by: Todd Kopriva <43478937+ToddKopriva@users.noreply.github.com>
Co-authored-by: Jani Patokallio <jani@google.com>
Port the Medium tutorial to the community.
https://jira.gcp.solutions/browse/PUB-2764