Improve Error Message with Reservation Validation #3174

Merged


arajmane-g
Contributor

@arajmane-g arajmane-g commented Oct 28, 2024

About the Change

We already validate reservation usage in GKE clusters. However, the error message can be improved.

With this PR, the error message provides more information: which properties mismatch, the values configured on the reservation and on the node pool, the relevant node pool settings, and their defaults. This lets users see what is wrong and correct the blueprint accordingly.

This PR also adds README instructions on using reservations.

Tests

Manually tested different scenarios:

  • no reservation: terraform plan operation succeeds
  • any reservation: terraform plan operation succeeds
  • valid specific reservation: terraform plan operation succeeds
  • specific reservation with mismatches: terraform plan fails as expected; the error message lists the properties that mismatched, the values configured on the reservation and on the node pool, the relevant node pool settings, and their defaults.
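
As a rough sketch of the first two scenarios, the node pool settings might look like the fragment below. It assumes the module accepts the GKE reservation affinity values NO_RESERVATION and ANY_RESERVATION for consume_reservation_type; the "no reservation" test may equally have been run by leaving reservation_affinity out entirely. The machine type is only illustrative.

```yaml
    settings:
      machine_type: g2-standard-4  # illustrative value, not taken from the tests
      reservation_affinity:
        # Assumed value: consume any matching reservation that is available.
        # Use NO_RESERVATION instead to keep the pool off reservations.
        consume_reservation_type: ANY_RESERVATION
```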

For a concrete example, consider a scenario where the reservation has 1 nvidia-l4 accelerator, 0 disks, and machine type g2-standard-4, whereas the node pool config from the blueprint is as shown below:

  - id: g2_pool
    source: modules/compute/gke-node-pool
    use: [gke_cluster, gke_service_account]
    settings:
      machine_type: g2-standard-24
      local_ssd_count_nvme_block: 16
      reservation_affinity:
        consume_reservation_type: SPECIFIC_RESERVATION
        specific_reservations:
        - name: specific-reservation-1

The error message will look like the following, since the machine type, local SSD, and guest accelerator settings all mismatch.

Check if your reservation is configured correctly:
- A reservation with the name must exist in the specified project and one of
the specified zones

- Its consumption type must be "specific"

- The reservation has {"nvidia-l4":1} accelerators and the node pool has
{"nvidia-l4":2}. Check the relevant node pool setting: "guest_accelerator".
When unspecified, for the machine_type=g2-standard-24, the default is
guest_accelerator=[{"count":2,"type":"nvidia-l4"}].


- The reservation has {"NVME":0,"SCSI":0} local SSDs and the node pool has
{"NVME":16,"SCSI":0}. Check the relevant node pool settings:
{local_ssd_count_ephemeral_storage, local_ssd_count_nvme_block}. When
unspecified, for the machine_type=g2-standard-24 the defaults are:
{local_ssd_count_ephemeral_storage=0, local_ssd_count_nvme_block=0}.


- The reservation has "g2-standard-4" machine type and the node pool has
"g2-standard-24". Check the relevant node pool setting: "machine_type"
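
For the example above, a node pool that lines up with the reservation could look roughly like the sketch below. It assumes that, as with g2-standard-24, the module derives guest_accelerator and the local SSD counts from machine_type when they are unspecified, so that g2-standard-4 resolves to one nvidia-l4 and no local SSDs.

```yaml
  - id: g2_pool
    source: modules/compute/gke-node-pool
    use: [gke_cluster, gke_service_account]
    settings:
      # Match the reservation's machine type. guest_accelerator is left unset on
      # the assumption that it defaults to the single nvidia-l4 of g2-standard-4.
      machine_type: g2-standard-4
      # local_ssd_count_nvme_block is omitted because the reservation has no
      # local SSDs (the default is assumed to be 0).
      reservation_affinity:
        consume_reservation_type: SPECIFIC_RESERVATION
        specific_reservations:
        - name: specific-reservation-1
```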

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@arajmane-g arajmane-g marked this pull request as draft October 28, 2024 10:23
@arajmane-g arajmane-g changed the title Improve Error Message with Reservation Validation [DNR] [DNM] Improve Error Message with Reservation Validation Oct 28, 2024
@arajmane-g arajmane-g force-pushed the improve_error_msg branch 4 times, most recently from 87cdd67 to eb1c985 Compare October 29, 2024 11:46
@arajmane-g arajmane-g changed the title [DNR] [DNM] Improve Error Message with Reservation Validation Improve Error Message with Reservation Validation Oct 29, 2024
@arajmane-g arajmane-g marked this pull request as ready for review October 29, 2024 12:12
@arajmane-g arajmane-g added the release-module-improvements Added to release notes under the "Module Improvements" heading. label Oct 29, 2024
@arajmane-g arajmane-g merged commit 0281993 into GoogleCloudPlatform:develop Nov 5, 2024
14 of 61 checks passed
@arajmane-g arajmane-g deleted the improve_error_msg branch November 5, 2024 09:59
@rohitramu rohitramu mentioned this pull request Nov 20, 2024
Labels
release-module-improvements Added to release notes under the "Module Improvements" heading.
3 participants