
DAS-2214 Add NSIDC IceSAT2 Regression Tests #92

Merged: 44 commits into main, Sep 26, 2024

Conversation

@flamingbear flamingbear commented Aug 29, 2024

Description

This PR adds a new regression test to the repository.

This is a stop-gap to allow NSIDC to test the basic ICESat-2 collections against Harmony services (primarily the Trajectory Subsetter).

This PR adds:

  • a set of regression tests in the test/nsidc-icesat2/ directory
  • a shared_utils directory so that common basic functionality can be shared across tests. Right now a number of basic routines are copied into each test's directory; they can be migrated out slowly (or all at once in an IP sprint).
  • GitHub workflow infrastructure to test and build the nsidc-icesat2-image

Jira Issue ID

DAS-2214

Local Test Steps

This can only be tested in UAT, so in order to test locally you must clone and check out this branch.

Build the image from the test directory:

make nsidc-icesat2-image

Set your UAT environment variables:

# UAT
export HARMONY_HOST_URL=https://harmony.uat.earthdata.nasa.gov
export AWS_ACCESS_KEY_ID=<UATAWSKEY>
export AWS_SECRET_ACCESS_KEY=<UATAWSSECRETKEY>
export EDL_USER=<edl username>
export EDL_PASSWORD=<edl password>

Run the notebook against UAT

./run_notebooks.sh nsidc-icesat2

Verify it ran successfully.

The output should look like:

Running regression tests
Test suite nsidc-icesat2 starting
running test with ghcr.io/nasa/regression-tests-nsidc-icesat2:latest
Waiting for nsidc-icesat2.
Input Notebook:  nsidc-icesat2/NSIDC-ICESAT2_Regression.ipynb
Output Notebook: /workdir/output/nsidc-icesat2/Results.ipynb
Working directory: nsidc-icesat2
Executing:   0%|          | 0/18 [00:00<?, ?cell/s]Executing notebook with kernel: python3
Executing: 100%|██████████| 18/18 [09:26<00:00, 31.50s/cell]
Test suite nsidc-icesat2 succeeded
Tests completed (passed)

and includes Test suite nsidc-icesat2 succeeded

PR Acceptance Checklist

  • Acceptance criteria met
  • Tests added/updated (if needed) and passing
  • Documentation updated (if needed)

When passed a build arg shared_utils, include this directory in the regression
test image.
Also adds first shapefile for Iceland
This is not technically necessary but will be good for datatree testing.
Updates the notebook for the first suite of tests.
(I'm 99.9% sure this is outdated and unneeded, but deleting it in its own commit
so it can be resurrected if needed)
ATL04 data doesn't work in DataTree because it's unalignable.
I added a new routine, compare_results_to_reference_file_new.

This needs to be replaced when trajectory subsetter switches over to use this,
but I don't want to do that with an unreleased xarray version from main.
and skeleton of ATL08 Temporal test
(ATL08 showed problems and cannot be run at this time.)
- Adds version file.
- Clarifies shared_utils readme
- removes and reorders utility functions.
This is a workaround because xarray datatree is not released to the public
yet. It can be removed as soon as xarray includes the `open_groups` command by
default.
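The alignment problem described above can be sidestepped by comparing results group by group instead of loading the whole file into one aligned tree. A minimal sketch of that idea, using plain dicts as hypothetical stand-ins for HDF5 groups (the real code would use h5py/netCDF4, or xarray's `open_groups` once released):

```python
# Sketch only: plain dicts stand in for HDF5 groups; helper names are made up.

def walk_groups(group, path=""):
    """Yield (path, value) pairs for every leaf in a nested group mapping."""
    for name, value in group.items():
        child_path = f"{path}/{name}"
        if isinstance(value, dict):
            yield from walk_groups(value, child_path)
        else:
            yield child_path, value

def compare_groups(result, reference):
    """Compare two nested group mappings leaf by leaf, avoiding the
    whole-file alignment step (the part that fails for ATL04 in DataTree)."""
    return dict(walk_groups(result)) == dict(walk_groups(reference))

granule = {"gt1l": {"heights": [1.0, 2.0]}, "orbit_info": {"rgt": 55}}
assert compare_groups(granule, {"gt1l": {"heights": [1.0, 2.0]}, "orbit_info": {"rgt": 55}})
```

Because each group is visited independently, groups with incompatible dimensions never need to be aligned against each other.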
@flamingbear flamingbear marked this pull request as ready for review August 29, 2024 19:22
@lyonthefrog (Contributor) left a comment:
I ran the test instructions and the tests passed! My comments are mainly a few questions and a couple minor typos. I'll approve once the questions are resolved.

Contributor:
If we have this file, should we eventually add a reference to this and remove the shared functions from the existing utility files in the other tests? If so, this makes sense to me since it's good to not re-write the same code in multiple places. But I do worry a little bit about what the best approach would be if one of these functions needed to be edited in the future, since changes could affect tests across a few different services/teams. Would it become a case of pinging the Harmony service provider channel to announce that the shared utilities file has been updated so folks can check that it didn't break anything of theirs?

Member Author:
You nailed it. This is a concern and why I waffled on this a bit. But, there are some basic things that all tests need to do and we probably just need to curate the functions in the shared utils carefully.

@lyonthefrog (Contributor) commented Sep 5, 2024:
I'm thinking an issue that might be encountered is using the comparison function for different file formats, since I know I hit that when trying to add that function to the sdps-egi-tests repository. But it would be nice to have. Would there be any one "owner" of these utils, or set of rules for what should and should not be added? But I'm likely overthinking this and perhaps the answer is that we trust the folks using this repository to be aware that any change may affect the rest - i.e., just be careful, as you said.

Member Author:
Yup, I think custom comparisons should live in their own test directory. But yes, this is just a be careful spot I think.
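To make that "be careful" contract concrete, a shared comparison helper can keep tolerances explicit and report exactly which variables differ. A minimal sketch under assumed names (the repository's actual compare_results_to_reference_file may look different):

```python
# Hypothetical shape of a shared comparison helper; not the repository's code.
import numpy as np

def compare_variables(results, reference, rtol=1e-7, atol=0.0):
    """Return the names of variables that differ between two dicts of arrays.

    Tolerances are explicit keyword arguments, so a test suite with stricter
    (or looser) needs can override them without editing the shared module.
    """
    mismatches = []
    for name in sorted(set(results) | set(reference)):
        if name not in results or name not in reference:
            mismatches.append(name)  # variable missing on one side
        elif not np.allclose(results[name], reference[name],
                             rtol=rtol, atol=atol, equal_nan=True):
            mismatches.append(name)
    return mismatches
```

Returning the list of mismatched names (rather than a bare boolean) makes failures in a notebook much easier to diagnose.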

@lyonthefrog lyonthefrog self-requested a review September 5, 2024 15:08
flamingbear and others added 2 commits September 9, 2024 14:05
Co-authored-by: Amy Steiker <47193922+asteiker@users.noreply.github.com>
@owenlittlejohns (Member) left a comment:
Looks like you fixed up the pre-commit things that had failed while I was working through this review - nice!

This is nice work, and I like the extraction of things into a shared utility module (although feel your pain about the conditional shenanigans you had to go through in the Dockerfile).

I haven't given this the big thumbs up because I'm cautious about the sizes of some of the reference files. Are there tests that would give NSIDC enough confidence while using smaller spatial/temporal regions?

Oh - also: the tests ran nicely, and I also checked the HOSS image still worked with the updates to the Dockerfile. Nice!

Member:
This is 57MB - is git lfs enabled for this repository? (Similarly, the ATL04 temporal range file is 43.5 MB, and there's another couple at 35 or 25 MB)

150 MB isn't too bad, but this is exactly the sort of thing I'm worried about with dumping a bunch of different data provider tests in the same repository; if all new regression test suites add this much, then the repository is going to get out of hand quickly.

(Edit: I'm not sure exactly how long it took to pull the latest changes locally, but it was in the 10s of seconds or so - but that might also include net2cog related stuff)

Member Author:
git lfs is not enabled for the repository. It may have been at one point, but I could be confusing it with a different repo.

The two large files were discussed with Mikala and there really isn't a way to make them much smaller. "I'm struggling to get ATL03 output much below 60 MB. The granules are just really huge to begin with."

I do think reference data is going to be a problem with this repository. I haven't thought of a better way, but I'm not sure I want to enable git LFS yet. I haven't used it much, and to make it efficient I think we would have to rewrite the git history to remove the files currently stored. I am trying to think of alternatives, like a data Docker image from a different location, but haven't stumbled onto anything yet. In the past I've looked at git-annex and DVC.org. I'm open to options, but I don't really want to add it to this PR. A fresh clone took 51 seconds on my machine.

> 150 MB isn't too bad, but this is exactly the sort of thing I'm worried about with dumping a bunch of different data provider tests in the same repository; if all new regression test suites add this much, then the repository is going to get out of hand quickly.

True, but this is not supposed to be done for all providers; this was a stop-gap to get NSIDC to production, but I can't find the actual link for that.

Member:
Rambling stream-of-consciousness ahoy:

One thought Vinny suggested a while back - could the reference files go somewhere like S3? Then the notebook could either do some in-cloud access, or download the reference files when being run, maybe? Problem with in-cloud access: these notebooks then need to run in us-west-2, which will be problematic as we look to lean on running things in GH (truly rambling: CodeDeploy in Harmony AWS account associated with this repository?). Problem with downloading when running: slow at runtime, rather than slow once when compiling a Docker image.


Another (lower-tech) thought - what's the size of the source file? ATL03 is indeed pretty beefy, but some of the granules are up in the several GB range, while others are more like 300 MB. I wanted to check the ATL03 granule, but don't have access to view CMR records in the NSIDC_CUAT provider, so couldn't.

This led me to:

  1. If these tests are going to be automated, we need to make sure that whichever EDL user is running them (credentials probably set by secrets in a couple of GH environments) has the requisite permissions to view/access/download data and metadata records.

  2. The tests currently only define granules for non-production accounts. Are there production equivalents? Don't we need those? (When I couldn't get at the UAT granule for ATL03, I was going to rip the equivalent from production, but that isn't defined, so I was defeated again!)

Sorry - that's probably about 4 comments rolled into one.

@flamingbear (Member Author) commented Sep 10, 2024:
I did forget the authentication that would be required (I had a note at one point that I would add the test user to NSIDC's ACLs, I think).

As for the size, I did go back and forth with Mikala about the size of the reference files, and I can go back to her, but maybe you should if this is a blocker.

Member Author:
I did ask Mikala for clarification on why these were the smallest reference files she felt comfortable making, and we will see what she says.

@flamingbear (Member Author):
After talking with @asteiker, @owenlittlejohns, and @mikala-nsidc, I will update this PR, configuring the repository to use GitHub's LFS to keep these large files separate from the code.

flamingbear commented Sep 11, 2024

To complete:

  • update the repo with Git LFS and store the reference files there.
  • update the pinned version of xarray to the latest release
  • clean up Docker requirements related to installing git for the previous pin

Clean up Docker hack to install git.

sets default values to false for all other images.
flamingbear commented Sep 25, 2024

@owenlittlejohns This is ready for re-review with questions at the bottom

Things I have done:

  • verified that this set of commits added all of the reference files.
    92a42e6 | * DAS-2214: Adds ATL08 shapefile reference file
    123bc10 | * DAS-2214: Adds ATL09 ATL10 shapefile reference files
    80ae48b | * DAS-2214: Adds ATL13 BBox reference file.
    3bd883b | * DAS-2214: Add reference files Subset bounding box ATL10 and ATL12
    23d6557 | * DAS-2214: Adds ATL06 shapefile reference file.
    ab651f3 | * DAS-2214: Adds reference file for ATL04 temporal range tests
    12edb03 | * DAS-2214: Adds regression files for NSIDC BBox Subset tests.

  • reverted them all (6b23cd2)

  • tracked NSIDC's regression test directory with gitlfs.
    ❯ git lfs track "test/nsidc-icesat2/reference_files/*.h5"

  • added the reference files back to the proper location, and committed them and the .gitattributes file that tells git to use LFS for this directory.

  • I am 99% sure that when this is merged, new clones will not have to fetch all of the binary files in their history, and eventually GitHub will clean it up.

  • I have updated the actions so that lfs is a new option to the build-all-images matrix.

  • I also updated the actions to include the build_arg for shared-utils when I found that it wasn't included in my flamingbear fork.

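As a sanity check on the steps above: a file tracked by Git LFS is committed as a small text pointer whose first line names the LFS spec version, so you can confirm the tracking took effect with a few lines of Python (a sketch, not part of this PR; the helper name is made up):

```python
# Sketch: detect whether a checked-in file is a Git LFS pointer rather than
# raw data. LFS pointer files begin with the spec-version line below.
from pathlib import Path

LFS_POINTER_PREFIX = "version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """True if the file at `path` looks like an LFS pointer file."""
    try:
        first_line = Path(path).read_text(errors="ignore").splitlines()[0]
    except IndexError:
        return False  # empty file: not a pointer
    return first_line.startswith(LFS_POINTER_PREFIX)
```

Running this against test/nsidc-icesat2/reference_files/*.h5 in a clone made without LFS installed should report pointers, while a normal clone with LFS would have the full binaries.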
Lastly, I deleted my fork, and when I recreated it I couldn't test these actions anymore because of a login failure. Maybe you can fork and test yours.

In the actions:

You can examine the differences between the NSIDC image and any other image, and you will see LFS being used to check out the repository.

Checkout regression tests repository for build-images (all other images)
Run actions/checkout@v3
Syncing repository: flamingbear/harmony-regression-tests
Getting Git version info
Temporarily overriding HOME='/home/runner/work/_temp/9f3bf3d5-b39a-4e61-9403-3ad7f60a95d1' before making global git config changes
Adding repository directory to the temporary git global config as a safe directory
/usr/bin/git config --global --add safe.directory /home/runner/work/harmony-regression-tests/harmony-regression-tests
Deleting the contents of '/home/runner/work/harmony-regression-tests/harmony-regression-tests'
Initializing the repository
Disabling automatic garbage collection
Setting up auth
Fetching the repository
Determining the checkout info
Checking out the ref
/usr/bin/git log -1 --format='%H'


Checkout regression test repository for build-images (nsidc-icesat2)
Run actions/checkout@v3
Syncing repository: flamingbear/harmony-regression-tests
Getting Git version info
Temporarily overriding HOME='/home/runner/work/_temp/7161e340-c13e-4205-a677-e1b79c3ca60b' before making global git config changes
Adding repository directory to the temporary git global config as a safe directory
/usr/bin/git config --global --add safe.directory /home/runner/work/harmony-regression-tests/harmony-regression-tests
Deleting the contents of '/home/runner/work/harmony-regression-tests/harmony-regression-tests'
Initializing the repository
Disabling automatic garbage collection
Setting up auth
***** /usr/bin/git lfs install --local
Updated Git hooks.
**** Git LFS initialized.
Fetching the repository
Determining the checkout info
***** Fetching LFS objects
Checking out the ref
/usr/bin/git log -1 --format='%H'

You should also be able to see --build_arg shared_util=true for the nsidc-icesat2 docker build, but not for any of the other images; theirs should be false.

@flamingbear (Member Author):
Lastly, I did fix my fork build, and you can examine the images and actions: https://github.com/flamingbear/harmony-regression-tests/actions/runs/11034514374. I built HOSS and NSIDC-ICESAT2.

You can run the images by setting up your UAT environment and then overriding the default images:

export HARMONY_HOST_URL=https://harmony.uat.earthdata.nasa.gov
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export REGRESSION_TESTS_NSIDC_ICESAT2_IMAGE=ghcr.io/flamingbear/regression-tests-nsidc-icesat2:latest
export REGRESSION_TESTS_HOSS_IMAGE=ghcr.io/flamingbear/regression-tests-hoss:latest
./run_notebooks.sh hoss nsidc-icesat2

You should see them running with the new images:

 ./run_notebooks.sh hoss nsidc-icesat2
Running regression tests
Test suite hoss starting
running test with ghcr.io/flamingbear/regression-tests-hoss:latest
Test suite nsidc-icesat2 starting
running test with ghcr.io/flamingbear/regression-tests-nsidc-icesat2:latest

They should run and pass against UAT.

README.md Outdated Show resolved Hide resolved
@owenlittlejohns (Member) left a comment:

I think the git-lfs changes all make sense. I re-cloned the repo (which was a bit slow, because the commits haven't been squashed yet), and the tests continue to run after rebuilding the image (although, they did take 37 minutes, which I'm a bit wary about - maybe I hit UAT at a bad time?).

The example runs of the workflow in your fork also look good.

Thanks for persisting with this!

@flamingbear (Member Author):
> (although, they did take 37 minutes, which I'm a bit wary about - maybe I hit UAT at a bad time?)

I think the nsidc-icesat2 tests are a bit slow, and since nothing is running in parallel it just adds onto the total time.

@flamingbear flamingbear merged commit 2c83754 into main Sep 26, 2024
1 check passed
@flamingbear flamingbear deleted the mhs/DAS-2214/nsidc-regressions branch September 26, 2024 03:22
@flamingbear (Member Author):
@owenlittlejohns Yup, I checked and the nsidc-icesat2 suite runs in about 10 minutes.

error	26-Sep-2024 09:09:17	
error	26-Sep-2024 09:09:17	Executing:   0%|          | 0/19 [00:00<?, ?cell/s]Executing notebook with kernel: python3
error	26-Sep-2024 09:09:17	
error	26-Sep-2024 09:09:17	Executing:   5%|▌         | 1/19 [00:09<02:46,  9.24s/cell]
error	26-Sep-2024 09:09:17	Executing:  26%|██▋       | 5/19 [00:14<00:34,  2.44s/cell]
error	26-Sep-2024 09:09:17	Executing:  37%|███▋      | 7/19 [00:19<00:30,  2.51s/cell]
error	26-Sep-2024 09:09:17	Executing:  47%|████▋     | 9/19 [00:20<00:18,  1.85s/cell]
error	26-Sep-2024 09:09:17	Executing:  63%|██████▎   | 12/19 [05:48<05:35, 47.96s/cell]
error	26-Sep-2024 09:09:17	Executing:  79%|███████▉  | 15/19 [06:49<02:29, 37.47s/cell]
error	26-Sep-2024 09:09:17	Executing:  95%|█████████▍| 18/19 [09:35<00:43, 43.69s/cell]
error	26-Sep-2024 09:09:17	Executing: 100%|██████████| 19/19 [09:38<00:00, 30.43s/cell]
build	26-Sep-2024 09:09:17	Test suite nsidc-icesat2 succeeded
