Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add #SBATCH -c 1 to Harvard Cannon GCHP run scripts and integration tests; Also allow scheduler none in integration tests #2182

Merged

Conversation

yantosca
Copy link
Contributor

@yantosca yantosca commented Mar 6, 2024

Name and Institution (Required)

Name: Bob Yantosca
Institution: Harvard + GCST

Confirm you have reviewed the following documentation

Describe the update

This PR does the following:

  1. Adds #SBATCH -c 1 to Harvard Cannon integration test scripts and GCHP run scripts. This is a necessary hotfix to allow GCHP run and integration test jobs to be submitted from within a SLURM interactive session. A recent update to SLURM on the Harvard Cannon introduced this new behavior ("It's a feature, not a bug!")

  2. Implements the feature request raised by @lizziel in issue [FEATURE REQUEST] Require scheduler entry in integration tests #2147. We now allow integration tests to use -s none to set the scheduler to "none". In this case, this will invoke compile-only tests. The user will be queried whether to accept or reject this. Also the usage message has been updated to show the potential values of the -s argument.

Expected changes

  1. This will allow GCHP jobs to be submitted to the SLURM scheduler on the Harvard Cannon cluster.
  2. Integration tests will now run compile-only tests when called with -s none.

Related Github Issue(s)

yantosca added 2 commits March 6, 2024 15:42
A recent SLURM update on the Harvard Cannon cluster now requires MPI
jobs (such as GCHP) to specify the "#SBATCH -c 1" (which sets the
SLURM_TRES_PER_TASK=cpu:1.  Otherwise, an error will occur when
submitting the batch job from an interactive session, as the value of
SLURM_TRES_PER_TASK will be copied from the interactive session, which
will prevent the batch job from starting.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
test/integration/GCHP/integrationTestExecute.sh
- Add "#SBATCH -c 1" tag for SLURM scheduler.  This is needed after a
  recent SLURM update on the Harvard Cannon cluster.

test/integration/GCClassic/integrationTest.sh
test/integration/GCHP/integrationTest.sh
- Add IF block to ask user if they would like to proceed with
  compile-only tests when scheduler = "none"
- Update usage message to denote which schedulers are allowed
  ("SLURM|LSF|none")
- Remove "#SBATCH -c 1" from integrationTestExecute.sh when
  scheduler is "LSF"

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
@yantosca yantosca added topic: Benchmarking and Testing Related to CI, integration tests, or scientific benchmarking no-diff-to-benchmark This update will not change the results of fullchem benchmark simulations category: Bug Fix Fixes a previously-reported bug labels Mar 6, 2024
@yantosca yantosca added this to the 14.3.1 milestone Mar 6, 2024
@yantosca yantosca requested review from lizziel and msulprizio March 6, 2024 21:01
@yantosca yantosca self-assigned this Mar 6, 2024
@msulprizio msulprizio changed the base branch from main to dev/no-diff-to-benchmark March 7, 2024 14:23
@yantosca
Copy link
Contributor Author

yantosca commented Mar 7, 2024

All GCClassic integration tests passed:

==============================================================================
GEOS-Chem Classic: Execution Test Results

GCClassic #df7ad84 GEOS-Chem submod update: Merge PR #2177 (ObsPack bug fixes )
GEOS-Chem #d95a1bd48 Add "#SBATCH -c 1" to GCHP integration test scripts (for SLURM)
HEMCO     #f807e1a Update HEMCO version number to 3.8.0 in preparation for release

Using 24 OpenMP threads
Number of execution tests: 26

Submitted as SLURM job: 22670182
==============================================================================

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  All execution tests passed!  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Furthermore, all integration tests were zero-diff w/r/t 14.3.0, except

  • APM (known parallelization issue)
  • RRTMG (numerical noise in RRTMG collection)

Copy link
Contributor

@lizziel lizziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! My feature request also asked for scheduler to be a required arguement when running integration tests. Otherwise you can forget to add it and then wonder why only compile tests were run. We can discuss that more on the feature request page. If we put it in it does not need to be part of this PR.

@yantosca
Copy link
Contributor Author

yantosca commented Mar 7, 2024

Looks good! My feature request also asked for scheduler to be a required arguement when running integration tests. Otherwise you can forget to add it and then wonder why only compile tests were run. We can discuss that more on the feature request page. If we put it in it does not need to be part of this PR.

Thanks @llzziel. So I can add another if block for scheduler == none and then error out as the default. Sorry I missed that!

test/integration/GCClassic/integrationTest.sh
test/integration/GCHP/integrationTest.sh
- Remove the default 'scheduler="none"' setting.  We now rely on
  the value passed via the arg list
- Add note that scheduler is converted to uppercase (this was always
  the default behavior but not documented)
- Add if block to exit with usage message if scheduler is omitted.
- Add comment to state that compile-only tests will run when
  scheduler is "none".

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
@yantosca
Copy link
Contributor Author

yantosca commented Mar 7, 2024

@lizziel, @msulprizio: We now exit if scheduler is not passed, see commit b93fe8c.

@yantosca yantosca merged commit 874f840 into dev/no-diff-to-benchmark Mar 7, 2024
@yantosca yantosca deleted the bugfix/harvard-cannon-add-sbatch-c-1 branch March 7, 2024 18:14
yantosca added a commit that referenced this pull request Mar 7, 2024
This was done in order to bring the "#SBATCH -c 1" updates into the
PR #2178 branch.  Otherwise we would not be able to run GCHP integration
tests.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Bug Fix Fixes a previously-reported bug no-diff-to-benchmark This update will not change the results of fullchem benchmark simulations topic: Benchmarking and Testing Related to CI, integration tests, or scientific benchmarking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE REQUEST] Require scheduler entry in integration tests
3 participants