Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing root_parent_id results in Management Groups not being deployed #190

Closed
lpmulligan opened this issue Oct 14, 2021 · 10 comments · Fixed by #221
Closed

Changing root_parent_id results in Management Groups not being deployed #190

lpmulligan opened this issue Oct 14, 2021 · 10 comments · Fixed by #221
Assignees
Labels
bug Something isn't working

Comments

@lpmulligan
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Versions

terraform: 1.0.8

azure provider: 2.80.0

module: 1.0.0

Description

I'm trying to deploy an Enterprise Scale LZ deployment within an existing EA structure. I've been assigned owner to a Management Group 3 levels down from the Tenant Root Group. I would like to deploy a stock ES LZ within that management group. Changing root_parent_id from:

root_parent_id = data.azurerm_client_config.core.tenant_id
root_id        = "myorg"
root_name      = "My Organization"

deploy_demo_landing_zones = true

to

root_parent_id = "ELZDemo"
root_id        = "myorg"
root_name      = "My Organization"

deploy_demo_landing_zones = true

Results in all the
resource "azurerm_management_group" "level_1", resource "azurerm_management_group" "level_2", etc resources being skipped.

Manually creating the Management Groups under "ELZDemo" by hand allows the core management terraform apply to run successfully.

Steps to Reproduce

  1. Set root_parent_id = data.azurerm_client_config.core.tenant_id in main.tf
  2. Run terraform plan
  3. You get MG groups created with 191 resources
  4. Set root_parent_id = "ELZDemo"
  5. You get no MG groups with 180 resources.

Screenshots

N/A

Additional context

N/A

@jtracey93
Copy link
Collaborator

Hey @lpmulligan,

Thanks for raising an issue 👍

I have just tested this myself and it worked okay for me:
image

As you can see highlighted in orange.

My TF file looked like this:

# We strongly recommend using the required_providers block to set the
# Azure Provider source and version being used.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 2.77.0"
    }
  }
}

provider "azurerm" {
  features {}
}

# You can use the azurerm_client_config data resource to dynamically
# extract connection settings from the provider configuration.

data "azurerm_client_config" "core" {}

# Call the caf-enterprise-scale module directly from the Terraform Registry
# pinning to the latest version

module "enterprise_scale" {
  source  = "Azure/caf-enterprise-scale/azurerm"
  version = "1.0.0"

  providers = {
    azurerm              = azurerm
    azurerm.connectivity = azurerm
    azurerm.management   = azurerm
  }

  root_parent_id = "jt-102021-landingzones"
  root_id        = "190"
  root_name      = "GH Issue 190 Test"

  deploy_demo_landing_zones = true
}

Are you able to share your TF file/s and a simple diagram of your Management Group hierarchy? (please redact any sensitive information)

Look forward to hearing from you

Thanks

Jack

@jtracey93 jtracey93 added bug Something isn't working waiting for response labels Oct 14, 2021
@jtracey93 jtracey93 self-assigned this Oct 14, 2021
@lpmulligan
Copy link
Author

@jtracey93. Thanks for double-checking this for me.

I did try this deployment in two separate Azure tenants to make sure I didn't miss something. The second tenant being my MSDN one. I got the same results; hence, I opened up this issue.

A few things:

  1. I reran things again in my MSDN subscription and things did work as expected. I think I may have forgotten
    to add deploy_demo_landing_zones = true during my testing. Sorry about. Adding deploy_demo_landing_zones = true may not be the reason it worked. See Fix to allow null es_root_parent_id input #4 below.

  2. In my EA subscription, things continue to not work.

  • Here's a screenshot of my MG layout.
    MG Groups

You'll notice that I don't have access to all the MGs in the tree like you do...nor like I do in my MSDN sub!

  • My code is similar to yours. Here's the module block
module "enterprise_scale" {
  source  = "Azure/caf-enterprise-scale/azurerm"
  version = "1.0.0"

  providers = {
    azurerm              = azurerm
    azurerm.connectivity = azurerm
    azurerm.management   = azurerm
  }

  root_parent_id = "ELZDemo"
  root_id        = "eslz"
  root_name      = "Enterprise-Scale LZ"

  deploy_demo_landing_zones = true
}
  1. Some interesting developments
  • Setting my module to this...exactly like yours
module "enterprise_scale" {
  source  = "Azure/caf-enterprise-scale/azurerm"
  version = "1.0.0"

  providers = {
    azurerm              = azurerm
    azurerm.connectivity = azurerm
    azurerm.management   = azurerm
  }

  root_parent_id = "jt-102021-landingzones"
  root_id        = "190"
  root_name      = "GH Issue 190 Test"

  deploy_demo_landing_zones = true
}

Results in the TF plan saying it's going to create the MG groups as expected even though the parent MG "jt-102021-landingzones" doesn't exist.

However, if I change root_parent_id = "jt-102021-landingzones" to root_parent_id = "ELZDemo" the TF plan skips the creation of the MGs.

  1. Sorry for waiting until the end, but I think I've narrowed down the problem.
  • If I set root_parent_id = "ELZDemo", it doesn't work. However, if I set root_parent_id = "eslz-demo" it does. So I'm thinking there's something up with how the module identifies the root_parent_id MG and there's no hyphen in there.

Sincerely,

Larry

@jtracey93
Copy link
Collaborator

Hey @lpmulligan,

Thanks for the excellent write up, really useful. Let me review and investigate further and we'll come back to you 👍

Cheers

Jack

@jtracey93
Copy link
Collaborator

Quick update have tested the hyphen theory out further:
image

Gives me the 191 resources:
image

Code:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 2.77.0"
    }
  }
}

provider "azurerm" {
  features {}
}

module "enterprise_scale" {
  source  = "Azure/caf-enterprise-scale/azurerm"
  version = "1.0.0"

  providers = {
    azurerm              = azurerm
    azurerm.connectivity = azurerm
    azurerm.management   = azurerm
  }

  root_parent_id = "tf190level3"
  root_id        = "eslz"
  root_name      = "Enterprise-Scale LZ"

  deploy_demo_landing_zones = true
}

@jtracey93
Copy link
Collaborator

jtracey93 commented Oct 14, 2021

Also just tried the same hierarchy above with an SPN that only has RBAC as below:

  • Reader on a subscription (used in provider block) outside of "tf190" hierarchy
  • Owner on "tf190level3"

Still got a correct plan of 191 resources

@jtracey93
Copy link
Collaborator

Deployed with above SPN:
image

@jtracey93
Copy link
Collaborator

Hey @lpmulligan,

So from digging through the module a little deeper tonight we think we may of found the reason why this is happening.

These 2 lines in locals.management_group.tf seem to be of importance for this scenario and its actually looking like its due to the casing used for the input of root_parent_id:

On the line, from the first link, we are building all of the maps for the Management Groups to be created by the module, but we are evaluating if the parent_management_group_id (that we create above in the same file, ready for deployment later) is the same (case sensitive as its Terraform) as root_parent_id.

But in the second link we have some logic that says if the parent_management_group_id is longer than 0 characters in length, which it is, then lowercase the string and replace any non-alphanumeric characters with -'s.

So when combined your root_parent_id of ELZDemo turns into elzdemo and when they are compared, remembering they are case sensitive, they don't match. Which then causes the if statement (first link) to kick in and make the map empty for level_1, which then has a cascading effect down through the other levels.

So what are we going to do...?

Good question, we need to do some further testing of some potential fixes and we also need to just check "why" we implemented the logic to start with an we will then make a PR to fix.

Give us a few days and we will come back to you 👍

Hope that all makes sense, and feel free to reach out with any further questions or queries

Thanks

Jack

(also thanks to @krowlandson for the late night discussion to review my thinking 👍)

@lpmulligan
Copy link
Author

Well shoot. I thought I had it. I think it's something in my environment not related to the code. I just had a co-worker change the MG ID from ELZDemo to eslz-demo and it works. However, your test shows that my hypothesis was wrong which is fine. Something's up with my setup...I don't know where but feel free to close this issue. Thanks for your assistance.

@jtracey93
Copy link
Collaborator

You certainly led us down the right path, so thank you 👍

We will leave this open so we can track the bug fix 👍

If you have another issue, feel free to open a another issue on the repo. Happy to help 👍

@krowlandson
Copy link
Contributor

Well shoot. I thought I had it. I think it's something in my environment not related to the code. I just had a co-worker change the MG ID from ELZDemo to eslz-demo and it works. However, your test shows that my hypothesis was wrong which is fine. Something's up with my setup...I don't know where but feel free to close this issue. Thanks for your assistance.

@lpmulligan - just to add that it may have worked if you just set root_parent_id = "elzdemo" (i.e. all lowercase).

As per @jtracey93's follow up, we have found the point where this is going wrong for you in the code and will work on getting this resolved in a future release. We first need to work out the impact of the fix though, as we may cause unexpected changes if someone has inadvertently used the incorrect casing somewhere within their config and the following is automatically "correcting" this:

parent_management_group_id = try(length(value.parent_management_group_id) > 0, false) ? replace(lower(value.parent_management_group_id), "/[^a-z0-9]/", "-") : local.root_parent_id

We need to be careful to understand whether changing this will cause deployments with incorrect config to fail, or to redeploy Management Groups.

krowlandson pushed a commit that referenced this issue Nov 25, 2021
krowlandson pushed a commit that referenced this issue Nov 26, 2021
* Update Library Templates (automated)

* Update Policy Assignment and `management` logic for new `Deploy_ASCDF_Config` policy (previously `Deploy_ASC_Config`)

* Add control for oss databases (fixes #131)
**BREAKING CHANGE**

* Update planned values for OPA tests

* Update logic for `parent_management_group` field on Management Group configuration (fixes #190)

Co-authored-by: github-actions <action@github.com>
Co-authored-by: Kevin Rowlandson <kevin.rowlandson@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators Dec 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants