feature/discover-interface #61
Open
cohen-seth wants to merge 5 commits into NOAA-PSL:develop from cohen-seth:develop
Commits:
b741574 feature/discover-interface edits (cohen-seth)
0925fa7 feature/discover-interface edits (cohen-seth)
83d951c feature/discover-interface nceplibs_bufr_cmd_handler edits and discov… (cohen-seth)
8fc9d14 test_discover wrkdir edits (cohen-seth)
76497c2 test_discover.sh (cohen-seth)
New file (csh environment setup for NASA Discover):

```csh
#!/usr/local/bin/csh

# Load obs_inv_utils, reworked for NASA Discover module paths
# ------------------------------
# The purpose of this script is to load the Python environment and set the
# environment variables used by obs_inv_utils on NASA Discover.

# -------------------------------------------------
# Loading NOAA Tool ~ observation-inventory-utils
# -------------------------------------------------


# NASA DISCOVER
module purge
module load comp/intel/18.0.5.274
module load mpi/impi/18.0.5.274
module load aws/2
module load python/GEOSpyD/Min23.5.2-0_py3.11

# Environment variables ~ required (?) by obs_inv_utils
setenv OBS_INV_HOME_DIR $PWD
setenv OBS_INV_SRC $OBS_INV_HOME_DIR/src
# csh stops with "Undefined variable" if $PYTHONPATH is referenced while unset,
# so only prepend to it when it already exists
if ($?PYTHONPATH) then
    setenv PYTHONPATH $OBS_INV_HOME_DIR/src:$PYTHONPATH
else
    setenv PYTHONPATH $OBS_INV_HOME_DIR/src
endif
setenv PATH $NOBACKUP/workenv/NCEPLIBS-bufr/NCEPLIBS-bufr-bufr_v12.0.0/utils:$PATH


# Virtual environment
source $NOBACKUP/venvs/ObsInvEnv/bin/activate.csh

echo PYTHONPATH=$PYTHONPATH
```
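Because the script sets variables with `setenv`, it only affects the current shell when it is `source`d rather than executed. A quick way to confirm the result from Python is sketched below. `check_obs_inv_env` is a hypothetical helper, not part of obs_inv_utils, and the default tool name `debufr` is an assumption about which NCEPLIBS-bufr utility needs to resolve on `PATH`.

```python
import os
import shutil


def check_obs_inv_env(required_vars=("OBS_INV_HOME_DIR", "OBS_INV_SRC"),
                      tools=("debufr",)):
    """Return a list of problems with the current environment (empty if none).

    `tools` defaults to debufr as an assumption; pass whichever
    NCEPLIBS-bufr utilities obs_inv_utils actually invokes.
    """
    problems = []
    # Every required variable must be present and non-empty
    for var in required_vars:
        if not os.environ.get(var):
            problems.append(f"environment variable {var} is not set")
    # Every tool must resolve to an executable on PATH
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # The setup script prepends OBS_INV_SRC to PYTHONPATH; check that it did
    src = os.environ.get("OBS_INV_SRC")
    if src and src not in os.environ.get("PYTHONPATH", "").split(os.pathsep):
        problems.append(f"OBS_INV_SRC ({src}) is not on PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in check_obs_inv_env():
        print(problem)
```

Returning a list instead of raising on the first problem lets one run report everything that the setup script failed to configure.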
New file (Python interface for inventorying files on NASA Discover):

```python
# The purpose of this module is to allow obs_inv_utils to inventory files on NASA Discover.

import os
import subprocess
import re
from pathlib import Path
from collections import namedtuple, OrderedDict
import attr
from datetime import datetime

# tbl_factory is used in post_discover_cmd_result() but was never imported.
# Assumption: same import as the other obs_inv_utils command handlers.
from obs_inv_utils import inventory_table_factory as tbl_factory

nl = '\n'

# DiscoverCommand adapted from hpss_io_interface.HpssCommand
DiscoverCommand = namedtuple(
    'DiscoverCommand',
    [
        'command',
        'arg_validator',
        'output_parser'
    ],
)

# DiscoverCommandRawResponse adapted from hpss_io_interface.HpssCommandRawResponse
DiscoverCommandRawResponse = namedtuple(
    'DiscoverCommandRawResponse',
    [
        'command',
        'return_code',
        'error',
        'output',
        'success',
        'args_0',
        'submitted_at',
        'latency'
    ],
)

# DiscoverListContents adapted from aws_s3_interface.AwsS3ObjectsListContents
DiscoverListContents = namedtuple(
    'DiscoverListContents',
    [
        'prefix',
        'files_count',
        'files_meta',
        'obs_cycle_time',
        'submitted_at',
        'latency'
    ],
)

HpssTarballContents = namedtuple(
    'HpssParsedTarballContents',
    [
        'parent_dir',
        'expected_count',
        'inspected_files',
        'observation_day',
        'submitted_at',
        'latency'
    ],
)


HpssFileMeta = namedtuple(
    'HpssFileMeta',
    [
        'name',
        'permissions',
        'last_modified',
        'size'
    ],
)

# DiscoverFileMeta adapted from aws_s3_interface.AwsS3FileMeta
DiscoverFileMeta = namedtuple(
    'DiscoverFileMeta',
    [
        'name',
        'permissions',
        'last_modified',
        'size',
        'etag'
    ],
)


CMD_GET_DISCOVER_OBJ_LIST = 'list_discover'
# `ls -l --time-style=long-iso <file>` prints at least 8 whitespace-separated
# fields: permissions, links, owner, group, size, date, time, path
EXPECTED_COMPONENTS_DISCOVER_OBJ_LIST = 8

# inspect_discover_args_valid adapted from hpss_io_interface.inspect_tarball_args_valid()
def inspect_discover_args_valid(args):
    if not isinstance(args, list):
        msg = f'Args must be in the form of a list, args: {args}'
        raise TypeError(msg)
    cmd = discover_cmds[CMD_GET_DISCOVER_OBJ_LIST].command
    print(f'{nl}{nl}In inspect discover args valid: cmd: {cmd}{nl}{nl}')
    if len(args) != 1:
        msg = f'Command "{cmd}" accepts exactly 1 argument, received ' \
              f'{len(args)}.'
        raise ValueError(msg)

    arg = args[0]
    if not isinstance(arg, str):
        raise ValueError(f'Invalid file path: expected a string, got {type(arg)}')

    # Only letters, digits and - . / _ are allowed in the file path
    m = re.search(r'[^A-Za-z0-9\._\-\/]', arg)
    if m is not None:
        print('Only a-z A-Z 0-9 and - . / _ characters allowed in filepath')
        raise ValueError(f'Invalid characters found in file path: {arg}')

    return True


# inspect_discover_parser adapted from hpss_io_interface.inspect_tarball_parser
def inspect_discover_parser(response, obs_day):
    try:
        output = response.output.splitlines()
    except Exception as e:
        raise ValueError(f'Problem parsing response.output. Error: {e}')
    if not output:
        raise ValueError('Empty response.output; nothing to parse.')

    files_meta = list()
    # `ls -l` on a single file path prints exactly one line
    output_line = output[0]
    components = output_line.split()
    if len(components) < EXPECTED_COMPONENTS_DISCOVER_OBJ_LIST:
        raise ValueError(
            f'Expected at least {EXPECTED_COMPONENTS_DISCOVER_OBJ_LIST} fields '
            f'in ls output, got {len(components)}: {output_line}')

    parent_dir = os.path.dirname(components[7])
    expected_count = 1
    fn = os.path.basename(components[7])
    prefix = parent_dir
    # Cycle time from names like <prefix>.<YYYYMMDD or YYMMDD>.t<HH>z.<suffix>;
    # otherwise the obs_day passed in by the caller is kept
    obs_cycle_time = fn.split(".")[1:3]
    if len(obs_cycle_time) == 2:
        if len(obs_cycle_time[0]) == 6:
            obs_day = datetime.strptime('.'.join(obs_cycle_time), "%y%m%d.t%Hz")
        elif len(obs_cycle_time[0]) == 8:
            obs_day = datetime.strptime('.'.join(obs_cycle_time), "%Y%m%d.t%Hz")
    permissions = ''
    size = int(components[4])
    date_str = components[5]
    time_str = components[6]
    etag = ''
    filetime_str = f'{date_str} {time_str}'
    try:
        file_datetime = datetime.strptime(filetime_str, '%Y-%m-%d %H:%M')
    except Exception as e:
        msg = f'Problem parsing file timestamp: {filetime_str}, error: {e}'
        raise ValueError(msg)
    files_count = 1
    files_meta.append(
        DiscoverFileMeta(fn, permissions, file_datetime, size, etag))
    return DiscoverListContents(
        prefix,
        files_count,
        files_meta,
        obs_day,
        response.submitted_at,
        response.latency
    )

# discover_cmds adapted from hpss_io_interface.hpss_cmds
discover_cmds = {
    CMD_GET_DISCOVER_OBJ_LIST: DiscoverCommand(
        ['ls', '-l', '--time-style=long-iso'],
        inspect_discover_args_valid,
        inspect_discover_parser,
    )
}

# post_discover_cmd_result adapted from subprocess_cmd_handler.SubprocessCmdHandler.post_cmd_result
def post_discover_cmd_result(raw_response, obs_day):
    if not isinstance(raw_response, DiscoverCommandRawResponse):
        msg = 'raw_response must be of type DiscoverCommandRawResponse. It is'\
              f' actually of type: {type(raw_response)}'
        raise TypeError(msg)

    cmd_result_data = tbl_factory.CmdResultData(
        raw_response.command,
        raw_response.args_0,
        raw_response.output,
        raw_response.error,
        raw_response.return_code,
        obs_day,
        raw_response.submitted_at,
        raw_response.latency,
        datetime.utcnow()
    )

    cmd_result_id = tbl_factory.insert_cmd_result(cmd_result_data)

    return cmd_result_id


# is_valid_discover_cmd adapted from hpss_io_interface.is_valid_hpss_cmd
def is_valid_discover_cmd(instance, attribute, value):
    print(f'In is_valid_discover_cmd: value: {value}')
    if value not in discover_cmds:
        msg = f'Discover command {value} is not valid. Use one of: ' \
              f'{list(discover_cmds.keys())}'
        raise KeyError(msg)
    return True


# DiscoverCommandHandler adapted from hpss_io_interface.HpssCommandHandler
@attr.s(slots=True)
class DiscoverCommandHandler(object):

    command = attr.ib(validator=is_valid_discover_cmd)
    args = attr.ib(default=attr.Factory(list))
    cmd_obj = attr.ib(init=False)
    cmd_line = attr.ib(init=False)
    raw_resp = attr.ib(default=None)
    submitted_at = attr.ib(default=None)
    finished_at = attr.ib(default=None)

    def __attrs_post_init__(self):
        self.cmd_obj = discover_cmds[self.command]
        print(f'In __attrs_post_init__: self.args: {self.args}')
        if self.cmd_obj.arg_validator(self.args):
            # Copy so the shared base command list in discover_cmds is not mutated
            self.cmd_line = self.cmd_obj.command.copy() + list(self.args)
        print(f'cmd_line: {self.cmd_line}, args: {self.args}')


    def send(self):
        cmd_str = self.cmd_obj.command[0]

        try:
            self.submitted_at = datetime.utcnow()
            # Popen itself raises FileNotFoundError when the executable is
            # missing, so it has to be inside the try block
            proc = subprocess.Popen(
                self.cmd_line,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE
            )
            out, err = proc.communicate()
            self.finished_at = datetime.utcnow()
            print(f'return_code: {proc.returncode}, out: {out}, err: {err}')
        except FileNotFoundError as e:
            msg = f'Command: {cmd_str} was not recognized. '\
                  f'error: {e}{nl}{nl}'
            raise FileNotFoundError(msg)
        except Exception as e:
            msg = f'Error after sending command {cmd_str}, error: {e}.'
            raise ValueError(msg)

        cmd_str = ' '.join(self.cmd_obj.command)

        self.raw_resp = DiscoverCommandRawResponse(
            cmd_str,
            proc.returncode,
            err.decode('utf-8'),
            out.decode('utf-8'),
            (proc.returncode == 0),
            self.args[0],
            self.submitted_at,
            self.get_cmd_duration()
        )

        print(f'raw_resp: {self.raw_resp}')

        return proc.returncode == 0


    def can_retry_send(self):
        # For now, always return False for a retry until we
        # know what kinds of errors we get back.
        return False


    def get_raw_response(self):
        return self.raw_resp


    def get_cmd_duration(self):
        # total_seconds() also counts the days component of the timedelta
        return (self.finished_at - self.submitted_at).total_seconds()


    def parse_response(self, obs_day):
        if self.raw_resp is not None:
            return self.cmd_obj.output_parser(self.raw_resp, obs_day)
        else:
            return None
```
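`inspect_discover_parser` depends on the fixed field layout of `ls -l --time-style=long-iso` and on a `<prefix>.<date>.t<HH>z` file-naming convention, and `inspect_discover_args_valid` depends on a character whitelist. The standalone sketch below reproduces both checks so they can be exercised without Discover or the inventory database. `parse_ls_line` and `is_safe_path` are hypothetical names and the sample paths are made up; unlike the module, the sketch returns `None` for the cycle time when the name does not parse instead of raising.

```python
import os
import re
from datetime import datetime

# Same whitelist as inspect_discover_args_valid: letters, digits and . _ - /
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._\-/]")

# Cycle-time formats keyed by the length of the date token (YYYYMMDD or YYMMDD)
_CYCLE_FORMATS = {8: "%Y%m%d.t%Hz", 6: "%y%m%d.t%Hz"}


def is_safe_path(arg):
    """Return True if arg is a string made only of whitelisted characters."""
    return isinstance(arg, str) and _UNSAFE_CHARS.search(arg) is None


def parse_ls_line(line):
    """Parse one `ls -l --time-style=long-iso` line for a single file.

    Returns (name, permissions, mtime, size, cycle). Assumes the path has no
    spaces, so it is the 8th whitespace-separated field.
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}: {line!r}")
    permissions, size = fields[0], int(fields[4])
    mtime = datetime.strptime(f"{fields[5]} {fields[6]}", "%Y-%m-%d %H:%M")
    name = os.path.basename(fields[7])

    # Names like gdas.20200101.t06z.prepbufr carry the cycle in parts 1 and 2
    cycle = None
    parts = name.split(".")[1:3]
    fmt = _CYCLE_FORMATS.get(len(parts[0])) if len(parts) == 2 else None
    if fmt is not None:
        try:
            cycle = datetime.strptime(".".join(parts), fmt)
        except ValueError:
            cycle = None  # right length but not a date, e.g. "prepbufr"
    return name, permissions, mtime, size, cycle


line = "-rw-r--r-- 1 user group 1048576 2024-03-05 14:07 /tmp/obs/gdas.20200101.t06z.prepbufr.nr"
print(parse_ls_line(line))
```

Keeping the parsing in a pure function like this makes the `ls` layout assumptions testable in isolation; a directory argument (which makes `ls` print a `total` line first) is rejected with a `ValueError` instead of an `IndexError`.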
Is this actually used in this file, or was it only there as a reference while writing the Discover ones? If it is just a reference, it can be removed now.