
om CLI will use an incorrect guid when running configure-director if two vSphere clusters under an AZ have the same cluster name. #557

Open
ystros opened this issue Aug 24, 2021 · 3 comments

Comments

@ystros
Contributor

ystros commented Aug 24, 2021

Overview

On a vSphere environment, each AZ can have multiple clusters defined underneath it. Three properties together define a cluster's uniqueness: cluster, resource_pool, and host_group. You can therefore have multiple clusters that use the same cluster name, as long as their resource_pool or host_group differs. For example:

az-configuration:
- name: puff-first-az
  iaas_configuration_name: default
  clusters:
  - cluster: ops_manager_cluster
    drs_rule: MUST
    host_group: ""
    resource_pool: ""
  - cluster: ops_manager_cluster
    drs_rule: MUST
    host_group: ""
    resource_pool: puff1

The om CLI attempts to add the guid property to each cluster by using the /api/v0/staged/director/availability_zones Ops Manager API endpoint. This ensures that the payload sent to the update AZ API endpoint is matched up with the existing AZ and cluster definitions. This is necessary because these fields are locked once BOSH and its associated products are deployed, and Ops Manager rejects deletions or modifications of the AZs and clusters with an error like:

Cannot modify the cluster 'ops_manager_cluster' in the availability zone 'puff-first-az' of a deployed product

However, the logic the om CLI uses to look up an existing cluster considers only the cluster property, which may not be unique within a given AZ:

om/api/director_service.go, lines 485 to 488 (commit ca9f0f8):

if cluster.Name == existingCluster.Name {
	cluster.GUID = existingCluster.GUID
	break
}

In examples like the above, this results in om reusing the same guid for two different clusters. The Ops Manager API does not currently prevent this (story to fix here: https://www.pivotaltracker.com/story/show/179348373). Once in this state, any attempt to modify the AZ definition, either in the Ops Manager UI or with the om CLI, results in the previously mentioned 'Cannot modify the cluster ...' error.

Once the API is updated to properly prevent using the same GUID for two different clusters, the om CLI will begin returning an error if this state is reached.
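For illustration, here is a minimal sketch of a lookup that matches on all three identity properties (cluster, resource_pool, host_group) instead of only the cluster name. The Cluster type, clusterKey, and assignGUIDs are hypothetical simplifications written for this example; they are not om's actual types, and this is not necessarily the change made in PR #559.

```go
package main

import "fmt"

// Cluster is a hypothetical, simplified stand-in for a cluster entry in an
// AZ definition; only the fields relevant to matching are modeled.
type Cluster struct {
	GUID         string
	Name         string // the "cluster" property
	ResourcePool string
	HostGroup    string
}

// clusterKey is the identity triple: two clusters in the same AZ are the
// same cluster only if all three properties match.
type clusterKey struct {
	name, resourcePool, hostGroup string
}

func keyOf(c Cluster) clusterKey {
	return clusterKey{c.Name, c.ResourcePool, c.HostGroup}
}

// assignGUIDs copies the GUID of each existing cluster onto the desired
// cluster with the same (name, resource_pool, host_group) triple. Desired
// clusters with no match keep whatever GUID they already have (empty for a
// brand-new cluster), so two distinct clusters never share one GUID.
func assignGUIDs(desired, existing []Cluster) {
	byKey := make(map[clusterKey]string, len(existing))
	for _, e := range existing {
		byKey[keyOf(e)] = e.GUID
	}
	for i := range desired {
		if guid, ok := byKey[keyOf(desired[i])]; ok {
			desired[i].GUID = guid
		}
	}
}

func main() {
	// Mirrors the puff-first-az example: one deployed cluster, plus a new
	// cluster with the same name but a different resource_pool.
	existing := []Cluster{{GUID: "guid-1", Name: "ops_manager_cluster"}}
	desired := []Cluster{
		{Name: "ops_manager_cluster"},
		{Name: "ops_manager_cluster", ResourcePool: "puff1"},
	}
	assignGUIDs(desired, existing)
	for _, c := range desired {
		fmt.Printf("%s/%s -> %q\n", c.Name, c.ResourcePool, c.GUID)
	}
	// Output:
	// ops_manager_cluster/ -> "guid-1"
	// ops_manager_cluster/puff1 -> ""
}
```

Keying a map on the full triple keeps the lookup linear in the number of clusters and makes the uniqueness rule explicit in one place.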

Reproduction steps

  1. Configure Ops Manager using om configure-director --config director-config.yml
  2. Apply Changes
  3. Update director-config.yml to add a new cluster to the AZ with the same cluster name, but a different resource_pool or host_group property than the original cluster.
  4. Use om configure-director again to update the config in Ops Manager
  5. Use om staged-director-config to get the latest config from Ops Manager. You will see the same guid defined for both clusters.
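Step 5 can be checked mechanically. The sketch below assumes the clusters have already been parsed out of the om staged-director-config output into a hypothetical, simplified Cluster type (the parsing is not shown) and reports any GUID shared by more than one cluster. duplicateGUIDs is an illustrative helper, not part of om.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a hypothetical, simplified view of one cluster entry as it
// might be read back from `om staged-director-config` output.
type Cluster struct {
	GUID         string
	Name         string
	ResourcePool string
	HostGroup    string
}

// duplicateGUIDs returns every non-empty GUID assigned to more than one of
// the given clusters, sorted for stable output. A non-empty result means
// the clusters are in the broken state described in this issue.
func duplicateGUIDs(clusters []Cluster) []string {
	counts := make(map[string]int)
	for _, c := range clusters {
		if c.GUID != "" {
			counts[c.GUID]++
		}
	}
	var dups []string
	for guid, n := range counts {
		if n > 1 {
			dups = append(dups, guid)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	staged := []Cluster{
		{GUID: "guid-1", Name: "ops_manager_cluster"},
		{GUID: "guid-1", Name: "ops_manager_cluster", ResourcePool: "puff1"},
	}
	fmt.Println(duplicateGUIDs(staged)) // [guid-1]
}
```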

Workaround

There is no known workaround, other than using different cluster names (which is likely not possible since these are defined at the vSphere layer and would require vSphere configuration changes). Adding guid to the director config YML file does not seem to help, since the code to look up and assign guid always runs as part of the om configure-director command.

@cf-gitbot
Member

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

@jaristiz
Contributor

jaristiz commented Sep 8, 2021

Hi @ystros

There is an existing PR, #559, with the change. Could you check whether it fixes your problem?

@dtimm
Contributor

dtimm commented Jun 13, 2022

@ystros Hey Brian,

Did #559 work to resolve this issue for you?

markstokan added a commit that referenced this issue Nov 9, 2022
* cherry-picking code from forked repo
* #559

Co-authored-by: Mark Stokan <stokanm@vmware.com>
Co-authored-by: claire tinati <ctinati@vmware.com>
4 participants