Problem with cftime coordinates on sequence_dim
#51
Comments
Okay, here is my cftime trouble example. This seems to be a tricky bug, which only shows up under certain conditions. For simplicity, I am considering the case of one chunk per netcdf file and only two netcdf files. The simplest case of a single netcdf file being converted into a single chunk of zarr is handled correctly with cftime. I needed to make an example with at least two netcdf files, even though the error shows up when storing the very first chunk/file. For my own sanity, I found out that I can make this notebook work properly, generating an acceptable zarr store, with a minor edit. @rabernat, I have also noticed that, despite an indication to the contrary in the printed output, there is no .zmetadata being saved - otherwise the zarr store looks fine. |
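For context on the missing `.zmetadata`: it is zarr's consolidated-metadata document, a single JSON file gathering every `.zgroup`/`.zarray`/`.zattrs` key so readers can skip many small object reads. A toy sketch of what consolidation produces (not zarr's actual implementation; the store contents here are invented):

```python
import json

# A minimal in-memory "store": keys are object paths, values are JSON bytes.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "time/.zarray": b'{"shape": [2], "dtype": "<i8", "chunks": [2]}',
    "time/.zattrs": b'{"calendar": "noleap", "units": "days since 2000-01-01"}',
}

def consolidate(store):
    """Gather every metadata key into one .zmetadata document,
    mimicking what zarr.consolidate_metadata writes (toy version)."""
    meta = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in (".zgroup", ".zarray", ".zattrs")
    }
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": meta}
    ).encode()

consolidate(store)
doc = json.loads(store[".zmetadata"])
assert doc["metadata"]["time/.zattrs"]["calendar"] == "noleap"
```

A reader that trusts `.zmetadata` never has to list or fetch the individual metadata keys, which is why its absence matters even when the rest of the store looks fine.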
I have a similar issue in the GPM IMERG recipe. For me
I could share a notebook where it can be reproduced, but it needs my username/password and #59. |
@davidbrochart - do you have cftime installed? |
Naomi's full traceback is
|
Yes, and when I |
@naomi-henderson, the link to your 'cftime trouble example' above (https://github.com/naomi-henderson/cmip6collect2/blob/main/NZS-cftime.ipynb) appears to be broken. (I get a 404 error when clicking it.) Has this notebook been renamed or moved? Curious to look into this issue, and would be helpful to have the minimal example as reference. Thanks! |
@cisaacstern, that notebook was in reference to a very old version of the
Naomi, I believe that everything needed to support your recipe is in place in the current master branch. If you specify |
@cisaacstern and @rabernat, I used it. But there are some new issues:
My basic recipe:
Storage Targets:
Execute recipe:
|
Glad to hear that the initial test worked. For the sake of organization, perhaps this issue should be closed, and the 3 new issues @naomi-henderson discovered opened as distinct, new Issues? |
Thanks so much @naomi-henderson for trying this out! Issues 1 and 2 are very likely related to your environment. The intermittent hanging in 1 sounds a lot like fsspec/filesystem_spec#565; this was a bug in fsspec that was surfaced in part by our work on Pangeo Forge. It has been fixed in the latest fsspec master. It would be great if you could verify this. 2 is because we are now dependent on an as-yet-unmerged xarray PR (pydata/xarray#5065), which adds the functionality we need. For development, you're probably best off creating a new environment that matches our CI, which should eliminate both problems. (I have switched from conda to mamba and am never going back.)
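The exact environment command was elided above; a hedged sketch of creating a CI-matching environment with mamba (the environment-file path here is a guess; check the repo's CI configuration for the real one):

```shell
# Hypothetical file name -- use whatever environment file the repo's CI references
mamba env create --name pangeo-forge -f ci/environment.yml
conda activate pangeo-forge
```

mamba is a drop-in replacement for conda's solver, so the same `env create` syntax applies; only the speed differs.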
Thanks for checking this. You're absolutely right that I have not bothered to update the tutorials after some recent changes. However, I hope that "outdated in many ways" is an exaggeration; I have strived to keep the API the same. The biggest change, as you noted, is the use of context managers for all openers. This allows us to keep better track of open / closed file objects and is in line with Python best practice. So instead of

```python
ds = recipe.open_input(input_key)
```

you do

```python
with recipe.open_input(input_key) as ds:
    # do something with ds
    display(ds)
    # If you want it in memory outside of the context manager, do
    ds.load()
# now the file is closed
```

(Same for the other openers.) Thanks again for your helpful comments and real-world testing. Things are moving fast, so it's great to have this input. |
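The open/close pattern described above can be sketched with plain Python (a toy stand-in, not the actual pangeo-forge classes), which also shows why `.load()` must happen inside the `with` block:

```python
from contextlib import contextmanager

class ToyDataset:
    """Stand-in for a dataset backed by an open file handle."""
    def __init__(self):
        self.closed = False
        self.in_memory = False

    def load(self):
        # Pull the data into memory so it remains usable after the file closes.
        self.in_memory = True
        return self

    def close(self):
        self.closed = True

@contextmanager
def open_input(input_key):
    # Hypothetical opener mirroring the context-manager style described above:
    # the underlying file is guaranteed to be closed when the block exits.
    ds = ToyDataset()
    try:
        yield ds
    finally:
        ds.close()

with open_input("input-0") as ds:
    ds.load()  # load while the file is still open

assert ds.closed     # the handle was closed on exit...
assert ds.in_memory  # ...but the loaded data is still available
```

If `.load()` were skipped, any lazy access after the block would hit an already-closed file, which is exactly the failure mode the context-manager style makes explicit.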
Thanks for this suggestion @cisaacstern. I think 1&2 are related (see comment above). 3 is about stale documentation. Having dedicated issues for these would be useful; however, depending on @naomi-henderson's response, they might be resolved very quickly, so possibly not needed... |
@rabernat, As usual, your comments are very clarifying, thanks! This old lady brain gets easily bogged down! Yes, "outdated in many ways" was an exaggeration, of course - that first example is very helpful! Now that I know about context managers I will go through it again and make a pull request with my suggestions. Okay, will use the new pangeo-forge environment - I had made a kernel with the old one and then just updated xarray, fsspec and pangeo-forge. I agree it is best to make a new kernel at this point. I will also give mamba a try because conda is taking way.... too.... long.... As for making a new tutorial example with CMIP6 - yes, I will give it a try. I am concerned about all of the moving parts, GFDL's AWS collection included, but will try to create something robust. |
@naomi-henderson - just checking whether things worked better with the updated environment? |
@rabernat - I have been visiting grandkids in Virginia, so I am just getting back to this now. I had updated the environment, but then had trouble with the 'https://aws-cloudnode.esgfed.org/thredds/dodsC/' OPeNDAP server not working last week, so I had been trying to use the s3:// urls directly - which have to be explicitly opened (unlike the gs:// urls). It was easy to switch. Have you decided to rework the recipe and pattern codes? If so, perhaps I should wait? EDIT: So I guess it is not a gs:// vs s3:// issue - it is a difference between the zarr stores in the s3://cmip6-pds bucket and the netcdf files in the s3://esgf-world bucket. |
I'm so happy you're able to visit your grandkids! How exciting! 😊 Sorry about your environment troubles.
This doesn't make sense to me. s3fs and gcsfs should be interchangeable here and behave identically. Could you give a more verbose example of what you mean?
I am working on that, but it's mostly an internal refactor. No reason not to try. Anyway, nothing urgent here. Just checking in. |
@rabernat,
and then, for the cmip6-pds bucket, xarray can open the zarr stores directly:
but, for the esgf-world bucket, xarray cannot use the
otherwise, just using
|
so in my test cases, I was using the opendap URLs to avoid this extra step ... |
The difference is between CDF and zarr, not s3fs and gcsfs. While I finally got my PR into xarray to use fsspec for the zarr backend, and interpret the path as necessary, this was not done for netCDF (et al) because:
Since the method of passing file-like objects works OK, making everything work has not been a high priority. |
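The asymmetry @martindurant describes can be sketched with a toy dispatcher (not xarray's actual backend code): a string path can only be resolved if a filesystem layer like fsspec is wired in, whereas an already-open file-like object passes straight through.

```python
import io

def open_target(target, path_resolver=None):
    """Toy illustration: zarr-style backends resolve string paths via a
    filesystem layer (fsspec's role), while netCDF-style backends expect
    the caller to hand over an already-open file-like object."""
    if isinstance(target, str):
        if path_resolver is None:
            raise TypeError("a string path needs a filesystem layer to resolve it")
        return path_resolver(target)  # path -> filesystem -> readable object
    if hasattr(target, "read"):
        return target                 # file-like object passes through untouched
    raise TypeError("expected a path or a file-like object")

# zarr-style: the backend resolves the path for you (bucket path is illustrative)
resolved = open_target("s3://esgf-world/some/file.nc",
                       path_resolver=lambda p: io.BytesIO())

# netCDF-style: open the file yourself first, then pass the handle
handle = io.BytesIO(b"CDF\x01")
assert open_target(handle) is handle

# without a resolver, a bare path fails -- the "extra step" discussed above
try:
    open_target("s3://esgf-world/some/file.nc")
except TypeError:
    pass
```

This is why the s3:// netcdf urls need the explicit open step while the zarr stores do not: only the zarr backend got the built-in path resolution.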
thanks @martindurant, yes - that makes sense. So I am passing file-like objects for the CDF files (or using their OPeNDAP server) for my normal workflow - I just thought it would be cleaner for the tutorial to not have the extra step. I agree it is not high priority. |
So far so good! All of my CMIP6 netcdf -> zarr test cases went through with no issues. The time-independent CMIP6 datasets are usually in a single netcdf file and are not very large, so we don't need to worry too much about the chunk sizes. I will open pull requests for my suggested changes to the documentation and tutorials once I have had time to rework the first tutorial example and contribute a new CMIP6-specific example. |
In #47, @naomi-henderson reported that cftime-based time coordinates did not work with her recipe. (Details in this notebook.)
The error occurs on `prepare_target`. Examining the relevant traceback closely, it looks like there is already a dataset in the target, but it can't be opened. The notebook has been run non-sequentially, which makes it hard to debug. @naomi-henderson, it would be great if you could turn this into a reproducible example we can use to get to the bottom of the cftime issue.