
Add enable-maintenance-reservation flag in slurm to control reservation for scheduled maintenance #2987

Merged

Conversation

harshthakkar01
Contributor

@harshthakkar01 harshthakkar01 commented Aug 30, 2024

This PR adds an enable_maintenance_reservation flag at the nodeset level. Slurm reads the nodeset config and decides whether or not to reserve nodes for scheduled maintenance.

Note that this ignores running jobs: if a job overlaps with the maintenance window, Slurm creates the maintenance reservation anyway.

For example:

```yaml
  - id: debug_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    use: [network]
    settings:
      node_count_dynamic_max: 0
      node_count_static: 1
      machine_type: n2-standard-2
      enable_placement: false # the default is: true
      allow_automatic_updates: false
      enable_maintenance_reservation: true

  - id: debug_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use:
    - debug_nodeset
    settings:
      partition_name: debug
      exclusive: false # allows nodes to stay up after jobs are done
      is_default: true

  - id: debug2_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    use: [network]
    settings:
      node_count_dynamic_max: 0
      node_count_static: 1
      machine_type: n2-standard-2
      enable_placement: false # the default is: true
      allow_automatic_updates: false
      enable_maintenance_reservation: false

  - id: debug2_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use:
    - debug2_nodeset
    settings:
      partition_name: debug2
      exclusive: false # allows nodes to stay up after jobs are done
      is_default: true
```
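Once a blueprint like the one above is deployed, a quick way to confirm the flag took effect is to inspect reservations from the Slurm controller. The `scontrol` commands below are standard Slurm; the node name shown is illustrative, and output depends on whether a maintenance window is actually scheduled for the underlying VMs.

```shell
# On the Slurm controller, list all current reservations.
# Nodesets with enable_maintenance_reservation: true should appear in a
# reservation ahead of any scheduled maintenance window; nodesets with the
# flag set to false should not.
scontrol show reservation

# Inspect a specific node's state; nodes inside an active maintenance
# reservation are marked with the MAINT/RESERVED state flags.
scontrol show node debug-debugnodeset-0   # node name is illustrative
```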

Submission Checklist

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@harshthakkar01 harshthakkar01 changed the title Add enable-maintenance-reservation flag in slurm to control reservation Add enable-maintenance-reservation flag in slurm to control reservation for scheduled maintenance Aug 30, 2024
@harshthakkar01 harshthakkar01 self-assigned this Sep 4, 2024
@harshthakkar01 harshthakkar01 added the release-key-new-features Added to release notes under the "Key New Features" heading. label Sep 4, 2024
@harshthakkar01 harshthakkar01 marked this pull request as ready for review September 4, 2024 23:03
@mr0re1 mr0re1 assigned harshthakkar01 and unassigned mr0re1 Sep 5, 2024
@harshthakkar01 harshthakkar01 force-pushed the slurm-maintenance branch 3 times, most recently from e078713 to 1c8ba1f on September 5, 2024 18:27
@harshthakkar01 harshthakkar01 merged commit 9a781df into GoogleCloudPlatform:develop Sep 6, 2024
8 of 52 checks passed
@harshthakkar01 harshthakkar01 deleted the slurm-maintenance branch September 6, 2024 00:41
@tpdownes tpdownes mentioned this pull request Oct 2, 2024
Labels
release-key-new-features Added to release notes under the "Key New Features" heading.