hpc-enterprise-slurm gpu fix #1691

Merged
merged 2 commits into from
Aug 16, 2023

Member

@cboneti cboneti commented Aug 16, 2023

This fixes the partition and node configurations for the GPU machines in the hpc-enterprise-slurm.yaml blueprint.
Specifically, it sets the following for the node groups:
Sockets: 2
CoresPerSocket: 24

and the following for the partitions:
DefMemPerGPU: 160000
DefMemPerCPU: null

The rationale is that many GPU users would prefer not to specify the number of CPUs per job.
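
As a rough illustration of what these values imply, the sketch below models the node and partition settings in Python. It is a simplified model and not Slurm's actual scheduling logic: the class and function names are hypothetical, and it assumes only that Slurm memory defaults are expressed in megabytes and that a per-GPU default applies when a job requests GPUs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeConf:
    """Hypothetical container for the node settings named in this PR."""
    sockets: int = 2
    cores_per_socket: int = 24

    @property
    def physical_cores(self) -> int:
        # Sockets x CoresPerSocket gives the physical core count per node.
        return self.sockets * self.cores_per_socket


@dataclass(frozen=True)
class PartitionConf:
    """Hypothetical container for the partition settings named in this PR."""
    def_mem_per_gpu_mb: Optional[int] = 160000  # DefMemPerGPU
    def_mem_per_cpu_mb: Optional[int] = None    # DefMemPerCPU: null


def default_job_memory_mb(part: PartitionConf, gpus: int, cpus: int = 1) -> Optional[int]:
    """Simplified model: default memory for a job that gives no memory request."""
    if part.def_mem_per_gpu_mb is not None and gpus > 0:
        # The default scales with the GPU count, so no CPU count is needed.
        return gpus * part.def_mem_per_gpu_mb
    if part.def_mem_per_cpu_mb is not None:
        return cpus * part.def_mem_per_cpu_mb
    # No default configured for this kind of request.
    return None


if __name__ == "__main__":
    print(NodeConf().physical_cores)                        # 48
    print(default_job_memory_mb(PartitionConf(), gpus=2))   # 320000
```

Under this model, a user who asks only for GPUs gets a memory default that scales with the GPU count, which matches the stated rationale of not requiring a CPU count per job.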

Member

@tpdownes tpdownes left a comment

Approved. Please do the following before merging:

  • add an appropriate label
  • allow the integration test to complete

@tpdownes tpdownes assigned cboneti and unassigned tpdownes Aug 16, 2023
@cboneti cboneti added the release-improvements Added to release notes under the "Improvements" heading. label Aug 16, 2023
@cboneti cboneti assigned cboneti and unassigned cboneti Aug 16, 2023
@tpdownes tpdownes self-requested a review August 16, 2023 19:29
@tpdownes
Member

@cboneti can you please rebase off the current state of develop? I'd like to see the "PR-test-hpc-enterprise-slurm" test pass. I think you branched off it before #1679 was merged.

@cboneti
Member Author

cboneti commented Aug 16, 2023

> @cboneti can you please rebase off the current state of develop? I'd like to see the "PR-test-hpc-enterprise-slurm" test pass. I think you branched off it before #1679 was merged.

I have rebased it and triggered the enterprise Slurm test again. Thanks.

@cboneti cboneti added the release-bugfix Added to release notes under the "Bug fixes" heading. label Aug 16, 2023
@cboneti cboneti merged commit cf36120 into GoogleCloudPlatform:develop Aug 16, 2023
@cboneti cboneti deleted the enterprise-gpu-fix branch August 16, 2023 22:18
Labels
release-bugfix Added to release notes under the "Bug fixes" heading. release-improvements Added to release notes under the "Improvements" heading.