Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various CCPP PRs. Improve and update CI. #541

Merged
merged 20 commits into from
Apr 28, 2021

Conversation

MinsukJi-NOAA
Copy link
Contributor

@MinsukJi-NOAA MinsukJi-NOAA commented Apr 22, 2021

PR Checklist

  • Ths PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes in contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsiblity to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

This PR includes:

Issue(s) addressed

NCAR/ccpp-physics#641
#539

Testing

Regression testing on tier-1 platforms using the newly generated baselines:

  • hera.intel
  • hera.gnu
  • orion.intel - note: skip for this PR due to maintenance
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3

Testing of the actions workflow runs can be seen here and here.

Note also that @climbfuji compared the new baselines (20210427) on Hera for Intel and GNU to the previous ones (20210426) for the final PR. Only the tests described in NOAA-EMC/fv3atm#291 (comment) are different, as expected.

Dependencies

* Better ec2 instance control in aux.yaml
* Automatically check if PR branch is up to date with authoritative repos
* Build/test matrix change to allow more than one test type in the future
* Add cleanup of docker container/image before start testing
* Delete files related to skip-ci and automatic cancellation of jobs
* run-ci
@MinsukJi-NOAA MinsukJi-NOAA added enhancement New feature or request Waiting for Reviews The PR is waiting for reviews from associated component PR's. labels Apr 23, 2021
@BrianCurtis-NOAA
Copy link
Collaborator

Are you doing something unique to check for run-ci in the commit message? Would using something like
if: "contains(github.event.head_commit.message, 'run-ci')"
only run the jobs if they see the commit message with run-ci ?

@MinsukJi-NOAA
Copy link
Contributor Author

MinsukJi-NOAA commented Apr 23, 2021

Are you doing something unique to check for run-ci in the commit message? Would using something like
if: "contains(github.event.head_commit.message, 'run-ci')"
only run the jobs if they see the commit message with run-ci ?

github.event.head_commit.message context is only available for push event, and not for pull_request event. I guess I can maybe simplify the push part of the if-elif-fi statement. I will check.

@aerorahul
Copy link
Contributor

a quick glance at the python script:

  • standard indents are 4 spaces, not 2.
  • this script will fail coding norms when using pycodestyle
  • autopep8 -i script.py will fix all coding norm issues.

* Add job dependency in aux.yml
* Rename repo-check.sh to repo_check.sh
* run-ci
@aerorahul
Copy link
Contributor

I see unlimited while loops in the python script. What happens if the it never completes or fails to return a valid response?

@MinsukJi-NOAA
Copy link
Contributor Author

MinsukJi-NOAA commented Apr 23, 2021

I see unlimited while loops in the python script. What happens if the it never completes or fails to return a valid response?

CI job will have to be manually canceled on a case-by-case basis. It is hard to put a time limit that will automatically stop the loops because it all depends on the number and status of jobs in the queue and in progress. If not manually stopped, it will be cut off by the time limit of an actions job, which is 6 hours by default. I guess I could put in the six hours in the script. If you have a suggestion, please let me know.

@aerorahul
Copy link
Contributor

I see unlimited while loops in the python script. What happens if the it never completes or fails to return a valid response?

CI job will have to be manually canceled on a case-by-case basis. It is hard to put a time limit that will automatically stop the loops because it all depends on the number and status of jobs in the queue and in progress. If you know of a way, please let me know.

I don't know at the moment, but I have not thought about this.
It does however defeat the purpose of a CI/CD system if manual intervention is necessary.
I can think on it, but go ahead with the PR.

@MinsukJi-NOAA
Copy link
Contributor Author

I don't know at the moment, but I have not thought about this.
It does however defeat the purpose of a CI/CD system if manual intervention is necessary.
I can think on it, but go ahead with the PR.

I agree it's not a perfect system. I have some ideas, and they will be experimented with, tested, and implemented in the future.

@MinsukJi-NOAA
Copy link
Contributor Author

MinsukJi-NOAA commented Apr 23, 2021

@DusanJovic-NOAA , @BrianCurtis-NOAA, @aerorahul , could you please help me conduct one more testing by forking my repository (MinsukJi-NOAA:feature/CI-update) and making 2 pull requests per person? 1st pull request with run-ci in the commit message, and 2nd pull request without run-ci. Thanks.

@aerorahul
Copy link
Contributor

@MinsukJi-NOAA
Have you looked at some of these actions?
https://github.com/marketplace/actions/wait-for-check

@MinsukJi-NOAA
Copy link
Contributor Author

Thanks for testing @DusanJovic-NOAA. It was successful: https://github.com/MinsukJi-NOAA/ufs-weather-model/actions

* Revert back parameters used to test in a personal repository
* Change number of instances from 2 (testing) to 6 (ufs-weather-model)
@MinsukJi-NOAA
Copy link
Contributor Author

This PR will not pass the CI test because it contains changes for the CI test scripts.

@MinsukJi-NOAA MinsukJi-NOAA changed the title Improve and update CI Various CCPP PRs: change vertical dimension and long name of dku/dkt, UGWP bugfixes, LTP bugfixes, sfcsub update, framework update. Improve and update CI. Apr 27, 2021
@MinsukJi-NOAA MinsukJi-NOAA changed the title Various CCPP PRs: change vertical dimension and long name of dku/dkt, UGWP bugfixes, LTP bugfixes, sfcsub update, framework update. Improve and update CI. Various CCPP PRs. Improve and update CI. Apr 27, 2021
@MinsukJi-NOAA MinsukJi-NOAA added the Baseline Updates Current baselines will be updated. label Apr 27, 2021
@BrianCurtis-NOAA
Copy link
Collaborator

This PR will not pass the CI test because it contains changes for the CI test scripts.

Im just curious, if the CI changes are in this PR, does it not use these files when it launches the CI?

@MinsukJi-NOAA
Copy link
Contributor Author

MinsukJi-NOAA commented Apr 28, 2021

This PR will not pass the CI test because it contains changes for the CI test scripts.

Im just curious, if the CI changes are in this PR, does it not use these files when it launches the CI?

The workflow triggered by pull_request (build_test.yml) will use the new file. However, the workflow it triggers (aux.yml) uses the file in the base repository. This is for security reasons as it has the write permission to the repository as well as access to secrets. The CI test does not work because the two workflows work in tandem.

@BrianCurtis-NOAA
Copy link
Collaborator

This PR will not pass the CI test because it contains changes for the CI test scripts.

Im just curious, if the CI changes are in this PR, does it not use these files when it launches the CI?

The workflow triggered by pull_request (build_test.yml) will use the new file. However, the workflow it triggers (aux.yml) uses the file in the base repository. This is for security reasons as it has the write permission to the repository as well as access to secrets. The CI test does not work because the two workflows work in tandem.

Thanks for the info. Much appreciated.

@climbfuji
Copy link
Collaborator

Since the CI tests cannot pass as @MinsukJi-NOAA explained and all regression tests except orion.intel passed (which was under maintenance for most of the day), we can consider these PRs as ready to merge. I did start auto-bl on orion, and given how late it is we can just wait to tomorrow = Tuesday morning to do the merges.

@BrianCurtis-NOAA
Copy link
Collaborator

Machine: orion
Compiler: intel
Job: BL
Repo location: /work/noaa/nems/emc.nemspara/autort/pr/621511026/20210427194528/ufs-weather-model
Please make changes and add the following label back:
orion-intel-BL

@climbfuji
Copy link
Collaborator

Machine: orion
Compiler: intel
Job: BL
Repo location: /work/noaa/nems/emc.nemspara/autort/pr/621511026/20210427194528/ufs-weather-model
Please make changes and add the following label back:
orion-intel-BL

I checked the orion log, one test failed while creating the baseline due to an MPI timeout (a system issue as so often on orion). Let's proceed with the merge, in the meanwhile I will try to complete this manually on orion.

@MinsukJi-NOAA
Copy link
Contributor Author

CCPP pointers are not updated. Let me try again.

@climbfuji
Copy link
Collaborator

CCPP pointers are not updated. Let me try again.

You need to update your fv3atm submodule, it needs to point to hash 28888f0 in the NOAA-EMC repository.

@MinsukJi-NOAA
Copy link
Contributor Author

OK. It's done and ready for merge.

@MinsukJi-NOAA MinsukJi-NOAA added the Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. label Apr 28, 2021
Copy link
Collaborator

@climbfuji climbfuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fv3atm submodule pointer is correct

@climbfuji
Copy link
Collaborator

@DusanJovic-NOAA please review and merge if ok - thank you!

@DusanJovic-NOAA DusanJovic-NOAA merged commit 2c34b82 into ufs-community:develop Apr 28, 2021
@climbfuji
Copy link
Collaborator

Note. I was able to create the 20210427 baseline for this PR on Orion retrospectively and I started to verify against it, but then the /work filesystem died again. At least the baseline is there (if it is ever needed).

pjpegion pushed a commit to NOAA-PSL/ufs-weather-model that referenced this pull request Apr 4, 2023
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
* Remove echo from script

* Add path to result files

* Add a new WE2E test for inline post in nco mode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Baseline Updates Current baselines will be updated. enhancement New feature or request Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. Waiting for Reviews The PR is waiting for reviews from associated component PR's.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants