Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/discover-interface #61

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

cohen-seth
Copy link
Collaborator

This PR adds the changes to multiple scripts which allow the observation-inventory-utils to work on Discover (NASA). AWS and HPSS functionality are preserved despite these edits:

(1) Two new files have been added: obs_inv_utils_discover.csh, discover_interface.py.
(2) Changes were made to 5 scripts: nceplibs_bufr_cmd_handler.py, obs_inv_queries.py, obs_storage_platforms.py, search_engine.py

Files for testing the changes above have also been added:
(3) Four test yaml files have also been added: test_discover.sh, test_aws_amsua_1.yaml, test_aws_amsua_2.yaml, test_discover_amsua_1.yaml, test_discover_amsua_2.yaml.
(4) A .sh file, test_discover.sh, that runs the appropriate obs_inv_cli commands for all the new yamls for faster testing on discover.

@cohen-seth cohen-seth added enhancement New feature or request nasa-discover labels Sep 19, 2024
@cohen-seth cohen-seth self-assigned this Sep 19, 2024
Copy link
Collaborator

@jrknezha jrknezha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big PR so this is just a first pass review, once we get through some of the code parts I'll make sure to run everything as part of my final review. For now, there's a few areas that I've commented on that need to be changed to be more robust / follow existing patterns / software best practices.

@@ -23,6 +23,8 @@
from obs_inv_utils.subprocess_cmd_handler import SubprocessCmd
from obs_inv_utils import nceplibs_cmds as nc_cmds

import yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this import should be grouped above with the others, under uuid


work_dir = os.path.join(self.meta_config.work_dir, temp_uuid)
# Because s3 bucket bufr files require download and new work_dir and discover files do not they need to be handled differently.
if bucket == 'noaa-reanalyses-pds':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the appropriate way to do this. We should not be hard coding in the bucket, especially knowing we need to be able to support the private s3 buckets as well in the future.

If we have to hard code something, it should be the check for discover as that is a much more specific use case.

work_dir = os.path.join(self.meta_config.work_dir, temp_uuid)
# Because s3 bucket bufr files require download and new work_dir and discover files do not they need to be handled differently.
if bucket == 'noaa-reanalyses-pds':
print(f'Running get_bufr_file_meta for NOAA S3 Clean Bucket (aws_s3) {self.s3_bucket}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be referring to it as Clean Bucket, but NNJA. However, this should be much more generalized for any possible s3 bucket

os.remove( saved_filename )
shutil.rmtree( work_dir )

elif self.s3_bucket == None or self.s3_bucket == '':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using a different way to check directly for Discover such as platform

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a note for myself to look into: how is HPSS called and would that not be prevented with this elif statement?

@@ -0,0 +1,11 @@
s3_bucket: noaa-reanalyses-pds
s3_prefix: observations/reanalysis/gps/gpsro/%Y/%m/bufr/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the file name says amsua but it's pointing to gps data? need to make sure the file name of the file and the bufr files is consistent with the s3_prefix

python src/obs_inv_utils/obs_inv_cli.py get-obs-inventory -c src/tests/yaml_configs/test_aws_amsua_1.yaml ;
python src/obs_inv_utils/obs_inv_cli.py get-obs-count-meta-sinv -c src/tests/yaml_configs/test_discover_amsua_2.yaml ;

echo "test complete."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not actually a test but a chance to run some files, it won't actually check and confirm things are working. Since we are already using pytest for other testing, it would be appropriate to write this into a python file using pytest and assert statements for confirmation that the code is working appropriately

],
)

HpssTarballContents = namedtuple(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this actually getting used in this file or was it just there for reference making the discover ones? if just reference they can be removed now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request nasa-discover
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants