read ncml files to create multifile datasets #2697
Comments
+1 for adding this to xarray. |
Any updates regarding this? A while ago, @rabernat mentioned in pangeo-data/esgf2xarray#1 (comment) that @dopplershift was potentially interested in working on implementing this feature in xarray. I am interested in helping out with getting this feature into xarray. I tried finding Python tools that provide NcML functionality, and the ones I found seem to be outdated and unmaintained. In the meantime, I've been experimenting with some basics of NcML: https://nbviewer.jupyter.org/github/NCAR/xncml/blob/master/docs/source/tutorial.ipynb With guidance, input and feedback on what the API is expected to look like in xarray, I'd be more than happy to work on this moving forward. |
I have not thought much about APIs yet. |
I'd like to revive this issue. @andersy005 In terms of API, I think the need is not so much to create or modify NcML files, but rather to return an The THREDDS repo contains a number of unit tests that could be emulated to steer the Python implementation. My understanding is that getting this done could involve a fair amount of work, so I'd like to see who's interested in collaborating on this and maybe schedule a meeting to plan work for this year or the next. |
Thanks for reviving this @huard! FWIW, I think it's best for this sort of utility to live in its own small standalone package, which I have referred to as "xarray-mergetool" in the past. NcML could be one special case of the things it could do. It would also be very useful for intake-esm. We have also discussed this in NCAR/esm-collection-spec#12. We should have some bandwidth to work on this over the next year via the pangeo-forge project. |
This just popped up in my inbox and reminded me of the conversation I had with @rabernat a few years back at a DRAKKAR meeting in France.
I haven't really kept up with things since then, but 6+ years ago we modified one of our Python tools to abstract the IO method from the user by using NcML files as input. Then either mfdataset or the Unidata Java netCDF library was used to access local or remote data (single file, directory or aggregation). As there wasn't any native NcML parser in Python, and we had limited time, we ended up using pyjnius (https://github.com/kivy/pyjnius) to call the netCDF Java class from Python, which gave us access to the directory scan, aggregation functions, etc. from the Java library. Probably not the most efficient way, but we've been using it ever since. I don't have a huge amount of time (or expertise), but I'm happy to get involved if I can.
|
It's worth pointing out that you can create FileReferenceSystem JSON to accomplish many of the tasks we used to use NcML for:
It also has the nice feature that it makes your dataset faster to work with on the cloud because the map to the data is loaded in one shot! |
I've got a first draft that parses an NcML document and spits out an xarray.Dataset. It does not cover all the NcML syntax, but the essential elements are there. It uses xsdata (https://xsdata.readthedocs.io/en/latest/) to parse the XML, using a data model automatically generated from the NcML 2-2 schema. I've scraped test files from the netcdf-java repo (https://github.com/Unidata/netcdf-java) to create a test suite. Wondering what's the best place to host the code, tests and test data so others may give it a spin? |
Maybe a separate project in xarray-contrib would make sense?
I would be reluctant to add this into Xarray proper if we need a new external dependency for reading XML files.
|
OK, another option would be to add that to xncml. @andersy005, what do you think? |
@huard, I haven't touched the codebase in that repo for three years 😃... So, I'm happy to transfer the xncml repo to xarray-contrib org and give you and anyone who wants access to it |
@andersy005 Sounds good ! |
Hi everyone, I've hit a problem where I need to read p/s xncml is broken at the moment. Thank you. |
I'd assume that pip install git+https://github.com/xarray-contrib/xncml.git to see if that gives you something to work with, otherwise I'd wait for any of the devs to get back to you (most likely in the issue you opened on the |
Thanks @keewis that's right, looks like they are still working on the |
That's right. I just did a quick 0.1 release of xncml, most likely rough around the edges. Give it a spin. PRs most welcome. @rabernat If you're happy with it, this issue can probably be closed. |
closing, since anything still missing should be feature requests for |
This issue was motivated by a recent conversation with @jdha regarding how they are preparing inputs for regional ocean models. They are currently using ncml with netcdf-java to consolidate and homogenize diverse data sources. But this approach doesn't play well with the xarray / dask stack.
ncml is a standard developed by Unidata for use with their netCDF-java library:
In addition to describing individual netCDF files, ncml can be used to annotate modifications to netCDF metadata (attributes, dimension names, etc.) and also to aggregate multiple files into a single logical dataset. This is what such an aggregation over an existing dimension looks like in ncml:
Obviously this maps very well to xarray's concat operation. Similar aggregations can be defined that map to merge operations.
I think it would be great if we could support the ncml spec in xarray, allowing us to write code like
This idea has been discussed before in #893. Perhaps its time has finally come.
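
To illustrate the mapping between an ncml aggregation and xarray's concat described above, here is a minimal sketch. It is not the code from the original post or the API of any existing package. The file names (`jan.nc`, `feb.nc`) and the helper names `parse_join_existing` and `open_aggregation` are hypothetical, and only a single `joinExisting` aggregation is handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by NcML 2.2 documents.
NCML_NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}

# Illustrative NcML document; the file locations are placeholders.
EXAMPLE_NCML = """\
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="jan.nc"/>
    <netcdf location="feb.nc"/>
  </aggregation>
</netcdf>
"""


def parse_join_existing(ncml_text):
    """Return (dim_name, [locations]) for a joinExisting aggregation.

    Hypothetical helper: it ignores every other NcML feature
    (attribute edits, scans, nested aggregations, ...).
    """
    root = ET.fromstring(ncml_text)
    agg = root.find("nc:aggregation", NCML_NS)
    if agg is None or agg.get("type") != "joinExisting":
        raise ValueError("expected a joinExisting aggregation")
    locations = [el.get("location") for el in agg.findall("nc:netcdf", NCML_NS)]
    return agg.get("dimName"), locations


def open_aggregation(ncml_text):
    """Open the aggregated files lazily, concatenating along the NcML dimension."""
    import xarray as xr  # imported here so the parser above needs only the stdlib

    dim_name, paths = parse_join_existing(ncml_text)
    return xr.open_mfdataset(paths, combine="nested", concat_dim=dim_name)


dim_name, paths = parse_join_existing(EXAMPLE_NCML)
print(dim_name, paths)
```

The parsing step uses only the standard library; `open_aggregation` would additionally need xarray, dask, and the referenced files to exist.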