[BUG] Slack Terraform CI automation timeouts #1238

Open
2 tasks done
smoya opened this issue Jun 5, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@smoya
Member

smoya commented Jun 5, 2024

Describe the bug.

@Shurtu-gal did an excellent job with automating the creation and maintainability of AsyncAPI Slack channels and user groups. See #1072

However, we hit a blocker: the Terraform manifest fails because requests to the Slack API time out.

The TF provider is not optimized at all. I suspect this code runs once for each managed user group whenever TF refreshes its state: https://github.com/pablovarela/terraform-provider-slack/blob/master/slack/resource_usergroup.go#L108-L128, so we are potentially calling the usergroups.list API method once per user group we have.

Expected behavior

I believe we could do some work on the provider repo (it's written in Go and seems easy to read and understand at a glance) to implement caching or whatever mechanism we decide on. But in the short term, I can't see how to fix it.
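
To make the caching idea concrete, here is a minimal Go sketch. It is not the provider's actual code: `UserGroup`, `listFunc`, and `groupCache` are hypothetical names, and `listFunc` stands in for whatever function wraps a single call to Slack's `usergroups.list`. The cache fetches the full list once and answers every per-group lookup from memory.

```go
package main

import (
	"fmt"
	"sync"
)

// UserGroup is a hypothetical, trimmed-down stand-in for a Slack user group.
type UserGroup struct {
	ID     string
	Handle string
}

// listFunc stands in for one call to Slack's usergroups.list (assumption).
type listFunc func() ([]UserGroup, error)

// groupCache fetches the full list at most once and serves lookups from memory.
type groupCache struct {
	once sync.Once
	list listFunc
	byID map[string]UserGroup
	err  error
}

func newGroupCache(list listFunc) *groupCache {
	return &groupCache{list: list}
}

// Get returns the group with the given ID. The list function is called only
// on first use; a failure on that first call is remembered and returned to
// every later caller.
func (c *groupCache) Get(id string) (UserGroup, bool, error) {
	c.once.Do(func() {
		groups, err := c.list()
		if err != nil {
			c.err = err
			return
		}
		c.byID = make(map[string]UserGroup, len(groups))
		for _, g := range groups {
			c.byID[g.ID] = g
		}
	})
	if c.err != nil {
		return UserGroup{}, false, c.err
	}
	g, ok := c.byID[id]
	return g, ok, nil
}

func main() {
	calls := 0
	cache := newGroupCache(func() ([]UserGroup, error) {
		calls++ // count how often the fake "API" is hit
		return []UserGroup{
			{ID: "S1", Handle: "wg_marketing"},
			{ID: "S2", Handle: "example_group"}, // made-up handle
		}, nil
	})
	for _, id := range []string{"S1", "S2", "S1"} {
		g, ok, err := cache.Get(id)
		fmt.Println(g.Handle, ok, err)
	}
	fmt.Println("list calls:", calls)
}
```

Three lookups trigger a single list call here. A real provider would also need to invalidate the cache after creating, updating, or deleting a group, which this sketch leaves out.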

Screenshots

iTerm2_f9He0j7z

How to Reproduce

terraform apply with the proper Slack Token configured (ask @derberg or me)

🥦 Browser

None

👀 Have you checked for similar open issues?

  • I checked and didn't find similar issue

🏢 Have you read the Contributing Guidelines?

Are you willing to work on this issue ?

None

@derberg
Member

derberg commented Jun 10, 2024

There are probably more issues. We just merged a PR with a new WG:

https://github.com/asyncapi/community/actions/runs/9446779045/job/26017154335

@smoya
Member Author

smoya commented Jun 10, 2024

https://github.com/asyncapi/community/actions/runs/9446779045/job/26017154335

I'm not sure this is the issue, but both the channel and the group have the same handle, wg_marketing, and that's completely incompatible, as mentioned in the header comment of the file:

The handle should be unique and not in use by a member, channel, or another group.

@Shurtu-gal
Contributor

Shurtu-gal commented Jun 10, 2024

The issue is invalid YAML. If a string contains a colon, it needs to be double-quoted.
It can be seen here:

description: The group is dedicated to leveraging marketing strategies to achieve two key objectives: promoting AsyncAPI adoption and highlighting community achievements. By strategically showcasing AsyncAPI capabilities and celebrating community successes, the group drives both user growth and community engagement. It shares a vision of close collaboration between AsyncAPI community and sponsors.

cc: @smoya @derberg
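
As a rough illustration of the quoting rule, the Go sketch below flags single-line `key: value` entries whose unquoted value contains `: `, which YAML cannot parse as a plain scalar. `needsQuoting` is a hypothetical helper, and it is a heuristic for simple one-line mappings only, not a YAML parser: for example, it does not handle trailing comments or multi-line values.

```go
package main

import (
	"fmt"
	"strings"
)

// needsQuoting reports whether a single-line "key: value" entry has an
// unquoted value that contains ": ". Heuristic only, not a YAML parser.
func needsQuoting(line string) bool {
	trimmed := strings.TrimSpace(line)
	if strings.HasPrefix(trimmed, "#") {
		return false // whole-line comment
	}
	_, value, found := strings.Cut(trimmed, ": ")
	if !found {
		return false
	}
	value = strings.TrimSpace(value)
	if value == "" {
		return false
	}
	// Quoted scalars, block scalars (| or >), and flow collections ({ or [)
	// may legitimately contain ": ".
	switch value[0] {
	case '"', '\'', '|', '>', '{', '[':
		return false
	}
	return strings.Contains(value, ": ")
}

func main() {
	bad := `description: two key objectives: promoting AsyncAPI adoption`
	good := `description: "two key objectives: promoting AsyncAPI adoption"`
	fmt.Println(needsQuoting(bad))  // flagged: second ": " inside a plain scalar
	fmt.Println(needsQuoting(good)) // not flagged: value is double-quoted
}
```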

@smoya
Member Author

smoya commented Jun 10, 2024

The issue is invalid YAML. If a string contains a colon, it needs to be double-quoted. It can be seen here:

description: The group is dedicated to leveraging marketing strategies to achieve two key objectives: promoting AsyncAPI adoption and highlighting community achievements. By strategically showcasing AsyncAPI capabilities and celebrating community successes, the group drives both user growth and community engagement. It shares a vision of close collaboration between AsyncAPI community and sponsors.

cc: @smoya @derberg

Yup, fix is here #1251

@derberg
Member

derberg commented Jun 11, 2024

Oh thanks. I suggest we add a workflow like https://github.com/asyncapi/community/blob/master/.github/workflows/validate-maintainers.yml#L12-L43 with a JSON Schema that we validate against, as these issues will pop up regularly, and with JSON Schema you can cover lots of validation cases, even pattern validation.

@derberg
Member

derberg commented Jun 11, 2024

Regarding timeouts:

Do workflows have an option to react to a failure? Can we parse the error in such a step, figure out that it is a timeout, and retry?

Other than that, the minimum we can do is post the error in Slack saying that someone needs to rerun the job. We already support such things; we can have a custom message that tags certain people, for example.

@smoya
Member Author

smoya commented Jun 11, 2024

Do workflows have an option to react to a failure? Can we parse the error in such a step, figure out that it is a timeout, and retry?

You can retry as many times as you want, but it will keep failing. As stated in the description of the issue:

so potentially we are calling the usergroups.list API method on each usergroup we have.

We have more user groups than the API rate limit allows per minute (20 calls). See
Google Chrome_OioCRhAF

The issue is that the TF provider seems to make one call to that API per group instead of a single call to fetch all of them (still to be confirmed, but I'm 95% convinced).
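
If the per-group calls cannot be avoided, one possible stopgap is to pace them so they stay under the limit. The Go sketch below shows that idea and nothing more: `runThrottled` is a hypothetical helper, not something the provider offers, and the 20 calls per minute figure is the limit quoted above. The sleep function is injected so the pacing can be checked without actually waiting.

```go
package main

import (
	"fmt"
	"time"
)

// runThrottled executes calls one by one and waits between consecutive calls
// so that they start no more often than perMinute times per minute.
// sleep is injected (time.Sleep in real use) so the pacing can be tested.
func runThrottled(perMinute int, calls []func() error, sleep func(time.Duration)) error {
	if perMinute <= 0 {
		return fmt.Errorf("perMinute must be positive, got %d", perMinute)
	}
	interval := time.Minute / time.Duration(perMinute)
	for i, call := range calls {
		if i > 0 {
			sleep(interval)
		}
		if err := call(); err != nil {
			return fmt.Errorf("call %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	// 25 hypothetical per-group refresh calls against a 20-per-minute limit.
	calls := make([]func() error, 25)
	for i := range calls {
		calls[i] = func() error { return nil }
	}
	var waited time.Duration
	fakeSleep := func(d time.Duration) { waited += d } // record instead of sleeping
	if err := runThrottled(20, calls, fakeSleep); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("total wait:", waited)
}
```

Pacing trades failures for a slower run: 25 calls at 20 per minute need 24 gaps of 3 seconds each. Fetching the list once remains the cheaper fix if the provider can be changed.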

@derberg
Member

derberg commented Jun 12, 2024

You can retry as many times as you want, but it will keep failing. As stated in the description of the issue:

Sorry, that wasn't clear to me. So basically it means the automation will always fail at the moment?

By the way, it fails for a different reason here: https://github.com/asyncapi/community/actions/runs/9464298763/job/26071337562

And what about the GitHub teams automation?

@smoya
Member Author

smoya commented Jun 13, 2024

By the way, it fails for a different reason here: https://github.com/asyncapi/community/actions/runs/9464298763/job/26071337562

I don't understand that error. In fact, I can't reproduce the same state as in our CI even though the tfstate file is the same. That's weird... @Shurtu-gal any idea? I expect terraform plan on the master branch to produce the same plan as in the link @derberg shared, but that's not the case in my local env.

Examples of things my terraform plan says:

Terraform planned the following actions, but then encountered a problem:

  # module.channels.slack_conversation.channels["01_introductions"] will be updated in-place
  ~ resource "slack_conversation" "channels" {
      + action_on_destroy                  = "archive"
      + action_on_update_permanent_members = "none"
      + adopt_existing_channel             = true
        id                                 = "C023GJWH33K"
        name                               = "01_introductions"
        # (10 unchanged attributes hidden)
    }

  # module.channels.slack_conversation.channels["02_general"] will be updated in-place
  ~ resource "slack_conversation" "channels" {
      + action_on_destroy                  = "archive"
      + action_on_update_permanent_members = "none"
      + adopt_existing_channel             = true
        id                                 = "C34F2JV0U"
        name                               = "02_general"
        # (10 unchanged attributes hidden)
    }

@smoya
Member Author

smoya commented Jun 13, 2024

bounty/candidate

@Shurtu-gal
Contributor

@smoya checked for various stuff:

@derberg you would need to check both the bot token in the secret and maybe the app as well.

@smoya
Member Author

smoya commented Sep 18, 2024

It seems there is a fork of the terraform provider that handles timeouts when creating groups. See pablovarela/terraform-provider-slack#223 (comment)


3 participants