Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Load diag data from s3 #104

Merged
merged 16 commits into from
Oct 26, 2022
Merged

Load diag data from s3 #104

merged 16 commits into from
Oct 26, 2022

Conversation

ian-noaa
Copy link
Collaborator

Add a protocol to the URIs in order to distinguish if the program should be looking on the local disk or on an S3 bucket for our data files.

Closes #80

netCDF4 1.6 and above are required to work on Apple Silicon Macs.
While we're at it, run poetry update to update the rest of our
dependencies.
We want to be able to distinguish between s3 and local files so have
decide to use URI's to indicate which storage backend to use. The S3
implementation is just a stub at this point but the file:// handling
should work.
@ian-noaa ian-noaa self-assigned this Oct 13, 2022
@ian-noaa ian-noaa linked an issue Oct 13, 2022 that may be closed by this pull request
Installing Poetry via pipx before installing Python 3.9 was resulting
in Poetry using the default version of Python (in this 3.8) instead of
the required Python 3.9. This was causing errors when using newer language
features.
@ian-noaa ian-noaa temporarily deployed to vlab October 14, 2022 01:29 Inactive
This reverts commit 43a0c46.

Turns out, we need poetry installed in order to use the cache when
setting up Python in later steps.
@ian-noaa ian-noaa temporarily deployed to vlab October 14, 2022 01:37 Inactive
Otherwise, Poetry uses the default Python 3.8 environment on the
runner.
@ian-noaa ian-noaa temporarily deployed to vlab October 14, 2022 01:47 Inactive
Each step runs in its own process in GitHub Actions so environment changes
are not preserved between steps. I may need to add this to the
`poetry run ...` commands as well. I hope it's unnecessary. Apparently
Poetry has some "well-known" issues with correctly detecting the
desired Python version. See:

* Poetry issue: python-poetry/poetry#655
* setup-python workaround: actions/setup-python#374 (comment)
@ian-noaa ian-noaa temporarily deployed to vlab October 14, 2022 02:03 Inactive
It sounds like Poetry will grab the Python env from the cache. The
cached python is incorrect so hopefully this will result in the
correct Python version.

See: actions/setup-python#374 (comment)
@ian-noaa ian-noaa temporarily deployed to vlab October 14, 2022 02:23 Inactive
@ian-noaa ian-noaa temporarily deployed to vlab October 14, 2022 02:23 Inactive
I ended up needing to use fsspec's simple cache to download the files to
local disk as xarray doesn't support reading netcdf directly from object
storage - only zarr formatted datasets.
@ian-noaa ian-noaa requested a review from esheehan-gsl October 25, 2022 19:15
@ian-noaa ian-noaa marked this pull request as ready for review October 25, 2022 19:15
Copy link
Contributor

@esheehan-gsl esheehan-gsl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks OK. If you want to merge this and swing back around for more tests, that seems reasonable. If this means we can actually run the demo from AWS instead of from a local copy, I’d love to do that.

# and a file that's not there. It raises a ValueError even for missing
# files. We raise a FileNotFoundError to make debugging easier.
if not bucket.exists(diag_file):
print(f"No such file: '{str(diag_file)}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m pretty sure we have logging configured, so this would be better as

Suggested change
print(f"No such file: '{str(diag_file)}'")
current_app.logger.error(f"No such file: '{str(diag_file)}'")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's a great catch. Thanks! I'll go ahead and switch those to logging statements.

@ian-noaa ian-noaa temporarily deployed to vlab October 25, 2022 21:42 Inactive
We want the app in a demo-able state so we'll work on testing the S3
functionality in a new PR as it will require some more involved test
setup.
@ian-noaa ian-noaa temporarily deployed to vlab October 25, 2022 21:56 Inactive
@ian-noaa ian-noaa temporarily deployed to vlab October 25, 2022 21:57 Inactive
@ian-noaa ian-noaa requested a review from esheehan-gsl October 25, 2022 22:04
@ian-noaa
Copy link
Collaborator Author

Thanks, Evan. I've gone ahead and updated the bare print() statements to logger statements. I've also resolved a merge conflict and ended up disabling the pytest I had for S3. I'll work on addressing the tests for that in my next PR.

This should let us run the demo from AWS instead of a local version of the site. I've been chatting with the VLab folks to set up permission for the API pod to access our S3 bucket in AWS.

Copy link
Contributor

@esheehan-gsl esheehan-gsl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍🏻

@github-actions
Copy link

Code Coverage

Package Line Rate Health
src.unified_graphics 92%
tests 96%
Summary 94% (241 / 256)

Minimum allowed line rate is 60%

@github-actions
Copy link

Code Coverage

Package Line Rate Branch Rate Health
js 23% 70%
js.components.Chart2DHistogram 0% 0%
js.components.ChartContainer 0% 0%
js.components.ChartElement 0% 0%
js.components.ChartHistogram 0% 0%
js.components.ChartMap 0% 0%
js.components.DiagnosticView 0% 0%
js.components.Header 0% 0%
Summary 2% (33 / 1649) 26% (7 / 27)

Minimum allowed line rate is 0%

@ian-noaa ian-noaa temporarily deployed to vlab October 26, 2022 15:24 Inactive
@ian-noaa ian-noaa temporarily deployed to vlab October 26, 2022 15:25 Inactive
@ian-noaa ian-noaa merged commit 2351766 into main Oct 26, 2022
@ian-noaa ian-noaa deleted the 80-load-diag-data-from-s3 branch October 26, 2022 15:44
@ian-noaa ian-noaa temporarily deployed to vlab October 26, 2022 15:44 Inactive
@ian-noaa ian-noaa temporarily deployed to vlab October 26, 2022 15:44 Inactive
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Load diag data from S3
2 participants