Support for Dataflow Flex job "update-by-replacement" #8408

Closed
scotthew1 opened this issue Feb 8, 2021 · 13 comments

Comments

@scotthew1

scotthew1 commented Feb 8, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment. If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If the issue is assigned to a user, that user is claiming responsibility for the issue. If the issue is assigned to "hashibot", a community member has claimed the issue already.

Description

In #5991, support was added to google_dataflow_job to "update-by-replacement" when certain criteria are met. Could the same be done for the newer google_dataflow_flex_template_job?

New or Affected Resource(s)

  • google_dataflow_flex_template_job

References

b/359623870

@n-oden

n-oden commented May 19, 2021

Some testing with hashicorp/terraform-provider-google-beta#3069 suggests to me that this is not fully baked yet:

  • if a flex template job is cancelled/drained from the console, the provider still sees the job as available for update rather than needing to be created from scratch
  • if the new job created by an update action fails, the provider does not recognize this, and the resource stays in "Still modifying..." indefinitely

cc: @andremarianiello

@andremarianiello

@n-oden

> if a flex template job is cancelled/drained from the console, the provider still sees the job as available for update rather than needing to be created from scratch

This issue seems to occur when the old job is in the "cancelling" state. The provider does not consider this to be a terminal state, so the job is treated as an existing resource for as long as it is cancelling. Once the job has finished cancelling, the provider recognizes it as a missing resource, and applying at this point will create the job again. This is the behavior I saw with both v3.67.0 and v3.68.0, so I don't see how this problem would be related to hashicorp/terraform-provider-google-beta#3069. I think this issue deserves its own ticket.

> if the new job created by an update action fails, the provider does not recognize this, and the resource stays in "Still modifying..." indefinitely

This I was able to reproduce, and luckily it isn't hard to fix. I will make a PR with the changes.
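
A minimal sketch of the kind of fix described here, not the provider's actual code: while waiting for the replacement job after an update, a failed state has to end the wait with an error instead of being polled indefinitely. The state names are values of Dataflow's JobState enum; `wait_for_updated_job`, `get_state`, `JobUpdateError`, and the way the states are grouped are hypothetical and chosen only for illustration.

```python
# Dataflow JobState values; how they are grouped here is an assumption
# made for illustration, not the provider's definition.
RUNNING_STATES = {"JOB_STATE_RUNNING"}
FAILED_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


class JobUpdateError(Exception):
    """Raised when the replacement job ends up in a failed state."""


def wait_for_updated_job(get_state, max_polls=20):
    """Poll get_state() until the job runs, fails, or polls run out.

    get_state is a hypothetical callable returning the current job state.
    A real wait loop would also sleep between polls.
    """
    for _ in range(max_polls):
        state = get_state()
        if state in RUNNING_STATES:
            return state
        if state in FAILED_STATES:
            # Without this branch the loop would keep polling a job that
            # can never become healthy ("Still modifying..." forever).
            raise JobUpdateError(f"replacement job entered {state}")
    raise TimeoutError("replacement job did not settle in time")


# Simulated state sequence for a replacement job that fails on startup.
states = iter(["JOB_STATE_PENDING", "JOB_STATE_PENDING", "JOB_STATE_FAILED"])
try:
    wait_for_updated_job(lambda: next(states))
    outcome = "running"
except JobUpdateError as err:
    outcome = str(err)
print(outcome)
```

The bounded `max_polls` is a second safeguard in this sketch: even for a state the loop does not classify, the wait ends with a timeout rather than hanging.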

@andremarianiello

@n-oden I have made hashicorp/terraform-provider-google-beta#3279 to address your second bullet point.

@n-oden

n-oden commented May 24, 2021

Thank you! W/R/T the first issue I noted, I think there's a state-handling issue at this level as well: even if you wait for the cancelled/drained job to fully terminate, the provider does not see that the job needs to be re-created.

@andremarianiello

Can you provide the steps you performed to trigger that issue? In my testing I did the following:

  1. Run terraform apply to create a brand-new job. Wait for the job to be running.
  2. Cancel the running job in the console.
  3. While the job is cancelling, terraform apply reports no changes, so it does nothing.
  4. After the job has fully terminated, terraform apply refreshes state, sees that the job is in a terminal state, and sets the resource ID to "". This removes the resource from the state, so a new job is created from scratch.
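
The steps above can be sketched as a toy model of the refresh logic. This is an illustration under stated assumptions, not the provider's implementation: the set of terminal states is assumed, and `Resource`, `refresh`, `plan`, and the job ID `job-123` are hypothetical names.

```python
from dataclasses import dataclass

# States treated as terminal in this sketch (an assumption). Note that
# JOB_STATE_CANCELLING and JOB_STATE_DRAINING are deliberately absent.
TERMINAL_STATES = {
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
}


@dataclass
class Resource:
    """Hypothetical stand-in for a resource entry in Terraform state."""
    job_id: str


def refresh(resource, job_state):
    """Clear the ID when the job is terminal, so the next plan recreates it."""
    if job_state in TERMINAL_STATES:
        resource.job_id = ""
    return resource


def plan(resource):
    """Return the action a plan would propose for this resource."""
    return "create" if resource.job_id == "" else "no changes"


res = Resource(job_id="job-123")
while_cancelling = plan(refresh(res, "JOB_STATE_CANCELLING"))  # step 3
after_cancelled = plan(refresh(res, "JOB_STATE_CANCELLED"))    # step 4
print(while_cancelling, "->", after_cancelled)
```

In this model the gap between steps 3 and 4 comes entirely from "cancelling" not being in the terminal set, which matches the behavior described for the old job.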

Are you doing something different?

@n-oden

n-oden commented May 25, 2021

@andremarianiello to my aggravation this doesn't seem to be happening consistently, so right now I can just promise that the next time I run into it I'll post an update here.

@n-oden

n-oden commented May 25, 2021

Opened #9227 to track the update-while-terminating issue.

@n-oden

n-oden commented Jun 14, 2021

@andremarianiello I think I caught it in the act at last. The specific scenario seems to be:

  • the original version of the job should be running
  • update the container_spec_gcs_path value to a GCS path that does not yet exist (for example, the spec file is built by a CI/CD process and that process hasn't finished yet)
  • apply terraform: it will attempt to update the job but will fail, because the GCS path for the spec file is invalid and the flex template launch API returns a 404 error
  • put a valid spec file into the location specified by container_spec_gcs_path (i.e., the CI/CD process finally caught up and put the spec file there)
  • run terraform plan: it will see no changes to be made
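
One way the last step could come about, sketched as a toy model and not taken from the provider source: if a failed update still records the new container_spec_gcs_path in state, a later plan compares state against config and finds nothing to do. All function names and the `gs://bucket/...` paths below are hypothetical.

```python
class LaunchError(Exception):
    """Stand-in for the 404 returned when the spec file is missing."""


def launch(spec_path, existing_paths):
    """Hypothetical launch call: fails if the spec file does not exist."""
    if spec_path not in existing_paths:
        raise LaunchError(f"404: {spec_path} not found")


def apply_buggy(state, config, existing_paths):
    """Writes the new path to state before knowing the launch succeeded."""
    state["container_spec_gcs_path"] = config["container_spec_gcs_path"]
    launch(config["container_spec_gcs_path"], existing_paths)


def apply_fixed(state, config, existing_paths):
    """Only records the new path once the launch has succeeded."""
    launch(config["container_spec_gcs_path"], existing_paths)
    state["container_spec_gcs_path"] = config["container_spec_gcs_path"]


def plan(state, config):
    """Compare recorded state with desired config."""
    return "no changes" if state == config else "update"


def scenario(apply_fn):
    """Apply with a missing spec file, then plan again."""
    state = {"container_spec_gcs_path": "gs://bucket/v1.json"}
    config = {"container_spec_gcs_path": "gs://bucket/v2.json"}
    try:
        apply_fn(state, config, existing_paths={"gs://bucket/v1.json"})
    except LaunchError:
        pass  # the apply fails, as in the scenario above
    return plan(state, config)


buggy_plan = scenario(apply_buggy)
fixed_plan = scenario(apply_fixed)
print(buggy_plan, fixed_plan)
```

With the ordering in `apply_fixed`, the failed apply leaves state untouched, so the follow-up plan still proposes the update once the spec file exists.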

@andremarianiello

@n-oden Is this a different issue? It doesn't sound like either of the issues you described initially. Regardless, it sounds like failed updates are updating the terraform state, which is an issue I ran into and fixed while working on the changes in GoogleCloudPlatform/magic-modules#4847, which is merged and released in v3.71.0. What version of the provider are you using?

@n-oden

n-oden commented Jun 14, 2021

Interesting. I saw the behavior with Terraform Cloud, and we don't have the provider version pinned in versions.tf, so it should have been the latest one? Annoyingly, Terraform Cloud does not seem to put that information anywhere in its log output.

@github-actions github-actions bot added the forward/review and service/dataflow labels Oct 5, 2023
@ggtisc ggtisc self-assigned this Aug 13, 2024
@ggtisc
Collaborator

ggtisc commented Aug 13, 2024

The other issues were closed or removed. It needs to be reviewed whether this enhancement is still planned.

@ggtisc ggtisc removed their assignment Aug 13, 2024
@ggtisc ggtisc removed the forward/review label Aug 13, 2024
@roaks3
Collaborator

roaks3 commented Sep 26, 2024

It looks like this was mostly resolved by hashicorp/terraform-provider-google-beta#3069 (and upstreamed by GoogleCloudPlatform/magic-modules#4677), then patched with GoogleCloudPlatform/magic-modules#4847 and GoogleCloudPlatform/magic-modules#4845.

Closing this as fixed.

@roaks3 roaks3 closed this as completed Sep 26, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2024
7 participants