
supporting collection of local files #144

Closed
wants to merge 2 commits into from


mariusaurus
Contributor

Earth2Studio Pull Request

Description

This PR adds support for loading data arrays from a local directory of monthly xarray-readable files.
Other changes include:

  • defining the dtype in the datasource_to_file function in data/utils.py
  • adding an async_timeout argument to the ARCO data source so that large chunks of data can be downloaded; the previous value of 10 min is kept as the default.
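The description does not show how async_timeout is wired into the ARCO source. A minimal sketch of the general pattern follows, assuming the timeout is given in seconds and bounds the whole asynchronous fetch through `asyncio.wait_for` (600 s corresponds to the previous 10 min default). `TimedFetcher` and its fake `_download` coroutine are hypothetical stand-ins, not the earth2studio API.

```python
import asyncio


class TimedFetcher:
    """Hypothetical data source illustrating a configurable async timeout.

    async_timeout is in seconds; 600 s matches the previous fixed 10 min limit.
    """

    def __init__(self, async_timeout: float = 600):
        self.async_timeout = async_timeout

    async def _download(self, n_chunks: int, seconds_per_chunk: float) -> int:
        # Stand-in for fetching remote chunks: just sleeps once per chunk.
        for _ in range(n_chunks):
            await asyncio.sleep(seconds_per_chunk)
        return n_chunks

    def fetch(self, n_chunks: int, seconds_per_chunk: float) -> int:
        # wait_for cancels the download and raises asyncio.TimeoutError
        # once async_timeout seconds have elapsed.
        return asyncio.run(
            asyncio.wait_for(
                self._download(n_chunks, seconds_per_chunk),
                timeout=self.async_timeout,
            )
        )
```

A caller fetching large chunks would raise the limit, e.g. `TimedFetcher(async_timeout=3600)`. Note that `asyncio.run` cannot be called from inside an already running event loop, such as a Jupyter kernel.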

Script to test the new data source:

from earth2studio.data import DataArrayDirectory, fetch_data
from earth2studio.models.px import SFNO
from earth2studio.utils.time import to_time_array

# Local directory of monthly xarray-readable files (path specific to the author's cluster)
source_path = "/lustre/fs4/portfolios/coreai/users/mkoch/hens_ics/era5_arco"
times = to_time_array(["2020-01-01", "2020-03-01"])
source = DataArrayDirectory(source_path)

# Load SFNO only to get the input variables and lead times it expects
package = SFNO.load_default_package()
model = SFNO.load_model(package=package)
inco = model.input_coords()

# Fetch initial conditions from the directory source
ics, coords = fetch_data(
    source=source,
    time=times,
    variable=inco["variable"],
    lead_time=inco["lead_time"],
    device="cpu",
)

print(f"{coords['time']=}")
print(f"{coords['variable']=}")
print(f"{ics.shape=}")
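The PR text does not say how a directory of monthly files is mapped to requested times. As a minimal sketch, assuming a hypothetical layout with one file per month named `YYYY-MM.nc` (the real naming scheme is not stated here), the two times in the script above would resolve to two separate files. `monthly_file_name` and `group_times_by_file` are illustrative names, not part of earth2studio.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def monthly_file_name(t: datetime) -> str:
    """Hypothetical naming scheme: one file per month, e.g. '2020-01.nc'."""
    return f"{t.year:04d}-{t.month:02d}.nc"


def group_times_by_file(root: str, times: list[datetime]) -> dict[Path, list[datetime]]:
    """Map each requested time to the monthly file assumed to contain it.

    Grouping lets each file be opened once, even when several requested
    times fall in the same month. Keys keep first-seen order.
    """
    groups: dict[Path, list[datetime]] = defaultdict(list)
    for t in times:
        groups[Path(root) / monthly_file_name(t)].append(t)
    return dict(groups)


if __name__ == "__main__":
    times = [datetime(2020, 1, 1), datetime(2020, 1, 15, 6), datetime(2020, 3, 1)]
    for path, ts in group_times_by_file("/data/era5", times).items():
        print(path, [t.isoformat() for t in ts])
```

Grouping before opening keeps file I/O proportional to the number of distinct months rather than the number of requested times.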

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

None

mariusaurus marked this pull request as ready for review on October 1, 2024 14:12
NickGeneva added the "1 - On Deck" (To be worked on next) label on Oct 8, 2024
Collaborator

DataArrayDirectory needs a unit test; see earth2studio/test/data/test_xr.py for examples.

@dallasfoster
Collaborator

Closing as a duplicate of #151.

3 participants