Feature request: dataset over multiple NetCDF files. #461

Open
briochemc opened this issue Oct 25, 2024 · 4 comments
Labels
bug Something isn't working documentation Improvements or additions to documentation

Comments

@briochemc
Contributor

How does one open a dataset spread over multiple files (like with xarray's open_mfdataset function)?

@Balinus
Contributor

Balinus commented Oct 25, 2024

There are two undocumented functions:

open_mfdataset(g::AbstractString; kwargs...) = open_mfdataset(_glob(g); kwargs...)

and
function merge_datasets(dslist)

I don't remember exactly how to use it, but it is something along the lines of:

YAXArrays.Datasets.open_mfdataset("/path/files") or perhaps YAXArrays.Datasets.open_mfdataset(list_of_files)? Or both.
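Going by the first method above, a string argument is passed through an internal `_glob` helper and the result is handed to the method that takes a list of files, so both call forms should end up in the same place. As a rough Python analogue (the behaviour of `_glob`, including whether it sorts its matches, is an assumption and not checked against the YAXArrays source; `expand_pattern` and the filenames are hypothetical), expanding a pattern into a file list looks like this:

```python
import glob
import os
import tempfile


def expand_pattern(pattern):
    """Expand a glob pattern into a sorted list of matching paths.

    Hypothetical helper: a Python stand-in for what the string method of
    open_mfdataset presumably does before calling the list-of-files method.
    """
    # glob returns matches in arbitrary order; sorting makes the
    # concatenation order deterministic.
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create empty placeholder files with hypothetical names.
        for name in ["2020_02_tas.nc", "2020_01_tas.nc", "notes.txt"]:
            open(os.path.join(tmp, name), "w").close()
        files = expand_pattern(os.path.join(tmp, "*.nc"))
        print([os.path.basename(f) for f in files])  # ['2020_01_tas.nc', '2020_02_tas.nc']
```

Passing an explicit list skips this step entirely, which is useful when the files cannot be described by a single pattern.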

lazarusA added the bug and documentation labels Oct 26, 2024
@Balinus
Contributor

Balinus commented Nov 13, 2024

I had to reuse some code that used open_mfdataset, and in this case it worked as expected.


@briochemc
Contributor Author

Yes, I've used it successfully too 😃

I think it ought to be documented though, right?

@Balinus
Contributor

Balinus commented Nov 19, 2024

Is there a preprocessing option that can be passed to open_mfdataset? I'd like to add a time dimension, based on the filename, to a set of more than 300 files that contain only 2D longitude-latitude data.

For example, in Python I can do:

import os

import numpy as np
import xarray as xr


def preprocessing(ds):
    # Filenames are assumed to start with YYYY_MM_
    time_str = os.path.basename(ds.encoding['source']).split('_')[0:2]  # get year and month from filename
    time_str.append('01')  # add a hardcoded day
    time_str = "-".join(time_str)  # build a single date string, e.g. "2020-01-01"
    ds['time'] = np.datetime64(time_str, 'ns')  # assign the date to dataset ds
    ds = ds.set_coords('time')  # make time a coordinate so files are concatenated along it
    return ds


ds_crps = xr.open_mfdataset(list_of_files, concat_dim='time', combine='nested', preprocess=preprocessing)
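Whatever hook YAXArrays ends up offering, the filename-parsing part of `preprocessing` can be checked on its own. A standard-library-only sketch, assuming (as the code above does) that every filename starts with `YYYY_MM_`; the helper `date_from_filename` and the example paths are hypothetical:

```python
import os
from datetime import date


def date_from_filename(path):
    """Return the first day of the month encoded as YYYY_MM_ at the start of a filename.

    Mirrors the split('_')[0:2] + '01' logic of preprocessing(); raises
    ValueError if the filename does not start with a valid year and month.
    """
    parts = os.path.basename(path).split("_")
    if len(parts) < 2:
        raise ValueError(f"cannot find a YYYY_MM_ prefix in {path!r}")
    year, month = int(parts[0]), int(parts[1])  # int() raises ValueError on non-digits
    return date(year, month, 1)  # day hardcoded to 1, as in preprocessing()


if __name__ == "__main__":
    files = ["/data/2021_03_crps.nc", "/data/2020_12_crps.nc"]  # hypothetical paths
    # Sorting by the parsed date keeps the concatenation along time ordered.
    for f in sorted(files, key=date_from_filename):
        print(date_from_filename(f).isoformat(), f)
```

Sorting the file list by this date before opening is one way to make sure the concatenated time axis is monotonic, since `combine='nested'` concatenates in the order the files are given.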
